Got ETL? Meet the reader, processor, writer pattern, along with all the pre-built implementations, scheduling, chunking, and retry features you might need.
I think those who are drawn to Spring Batch are right to use it. Its paradigm is sensible and encourages developers to design well rather than reinvent things. It is reliable, robust, and relatively easy to use.
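For the uninitiated, the pattern can be sketched in a few lines. Everything below is illustrative, not Spring Batch's actual API: the function roles echo its ItemReader/ItemProcessor/ItemWriter contract, but the toy reader, the filtering processor, and the chunk size are my own.

```python
# A minimal sketch of the reader/processor/writer contract with chunking.
# The roles mirror Spring Batch's interfaces; the details are illustrative.

def run_chunked_step(read, process, write, chunk_size=3):
    """Read items one at a time, process each, and write them in chunks."""
    chunk = []
    while (item := read()) is not None:   # reader returns None at end of input
        processed = process(item)
        if processed is not None:         # a None result filters the item out
            chunk.append(processed)
        if len(chunk) >= chunk_size:
            write(chunk)                  # write a full chunk downstream
            chunk = []
    if chunk:
        write(chunk)                      # flush the final partial chunk

# Toy reader over an in-memory sequence; a real job would read a file or table.
items = iter(range(7))
written = []
run_chunked_step(
    read=lambda: next(items, None),
    process=lambda n: n * n if n % 2 == 0 else None,  # keep squares of evens
    write=written.append,
)
print(written)  # chunks of processed items
```

The chunking is the part worth noticing: writes are batched for efficiency while reads and processing stay per-item, which is the same trade-off Spring Batch's chunk-oriented steps make.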
I have found, however, that people often reach for Spring Batch as an excellent technical solution while completely missing the business impact. ETL jobs suck!
Many businesses totally neglect the repercussions of copying and renaming data between all the systems in their company. To me, this kind of ETL reinforces bad data practices, delays meaningful standardization, and inhibits future analytics work. Extensive data duplication leads to data quality issues, which in turn lead to added costs for master data management solutions. It's also very wasteful, especially if you are paying for lots of Oracle products. Don't be the company that names tables in hipster speak (USR_CNTCT_INF) to cut down on data waste and then ETLs everything all over the company!
If you can get the job done with Spring Batch, you can get it done with a similar read-process-write paradigm in just about any messaging framework. This encourages loose coupling and enables continuous data transfer (I avoid the term “real-time” so as not to confuse it with actual real-time systems). If you really miss the pre-built readers and writers, take a look at Apache Camel.
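To make that concrete, here is a toy consumer loop with Python's `queue.Queue` standing in for a real broker (a Kafka topic, a RabbitMQ queue). The names and the uppercase "processing" are made up, but the read, process, write steps are exactly the ones from the batch version.

```python
import queue

# Stand-in for a broker topic; in production this would be Kafka or RabbitMQ,
# but the shape of the consumer loop is the same.
inbound = queue.Queue()
outbound = []

def process(record):
    return record.upper()   # the "processor" step, unchanged from batch

for word in ["spring", "batch", "camel"]:
    inbound.put(word)

# Continuous consumer loop: read one message, process it, write downstream.
while not inbound.empty():
    msg = inbound.get()              # read
    outbound.append(process(msg))    # process, then write
    inbound.task_done()

print(outbound)
```

With a real broker the `while` loop simply never ends: messages are handled as they arrive, which is the continuous data transfer mentioned above.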
Many features are easy to replicate in a streaming/messaging system. Scheduling? Who cares, it's streaming. Retry? Make a retry queue. Failure and error handling? Dead letter queue. Partitioning? Just add more workers. That last one is actually much easier than in Spring Batch.
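The retry queue and dead letter queue are only a few lines on top of the consumer loop. This sketch is deliberately simplified, with in-memory queues, a hypothetical `handle` function, and a fixed attempt cap, but it shows the shape:

```python
import queue

MAX_ATTEMPTS = 3
main_q, retry_q, dead_letters = queue.Queue(), queue.Queue(), []

def handle(msg):
    if msg["body"] == "poison":          # a message that always fails
        raise ValueError("cannot process")
    return msg["body"]

def consume(q, processed):
    while not q.empty():
        msg = q.get()
        try:
            processed.append(handle(msg))
        except ValueError:
            msg["attempts"] += 1
            if msg["attempts"] >= MAX_ATTEMPTS:
                dead_letters.append(msg)  # give up: park it on the DLQ
            else:
                retry_q.put(msg)          # try again on a later pass
        q.task_done()

processed = []
for body in ["ok-1", "poison", "ok-2"]:
    main_q.put({"body": body, "attempts": 0})

consume(main_q, processed)
while not retry_q.empty():                # drain retries until exhausted
    batch = queue.Queue()
    while not retry_q.empty():
        batch.put(retry_q.get())
    consume(batch, processed)

print(processed, dead_letters)
```

A real broker gives you the same behavior declaratively (e.g. dead-letter routing config) instead of hand-rolled loops, but the logic is no more complicated than this.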
There's also a whole host of stream processing and analytics capabilities you probably want out of your batch job but can't get from it. Say you have two data loads that somehow need to be related. You now need a third batch job to do a join after the first two run. Plus, this all requires scheduling or polling and careful thought about efficiency.
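In a streaming system, that join is just a keyed buffer: hold each side's record until its partner arrives, then emit the pair. Real stream processors (Flink, Kafka Streams) add windowing, eviction, and durable state; this toy version only shows the idea, and the keys and values are invented.

```python
# Toy streaming join: buffer each side by key, emit when both have arrived.
left_buf, right_buf, joined = {}, {}, []

def on_left(key, value):
    if key in right_buf:
        joined.append((key, value, right_buf.pop(key)))
    else:
        left_buf[key] = value            # wait for the matching right event

def on_right(key, value):
    if key in left_buf:
        joined.append((key, left_buf.pop(key), value))
    else:
        right_buf[key] = value           # wait for the matching left event

# Interleaved events from two sources; no third job, no scheduling, no polling.
on_left("u1", "alice")
on_right("u2", "premium")
on_right("u1", "basic")
on_left("u2", "bob")
print(joined)  # [('u1', 'alice', 'basic'), ('u2', 'bob', 'premium')]
```

Joined records are available the moment the second event lands, instead of after a third scheduled job runs.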
Please don't be scared away from streaming or messaging systems. Please do not reach for Spring Batch solely because you are already making batch jobs, badly. Make the leap toward messaging.
I don't want to disparage Spring Batch; it's great at what it does. I have just seen too many batch jobs that would be significantly better as streaming architectures. There's also a huge push these days behind event streaming, but that's a topic for another post.
Edit: I had completely forgotten that there is a relatively new area of the industry around “data engineering” to support data scientists and analysts. ETL is to data engineers what bash scripts are to most coders: it only has to work once, or maybe periodically, and generally doesn't warrant much engineering effort. Data engineers move huge amounts of data around their companies and will use Python, SQL, or whatever gets the job done.
There's also a completely different scale of “batch” jobs, which is where tools like Spark, Flink, and MapReduce come in. These tools can also be used for significantly more complex processing and analysis in addition to just moving data around.