Jul 23, 2017 · 1 min read
What are the configured batchInterval and blockInterval? If there is only a single partition of data per batch, async might not be any better.
You can also verify that the stages are actually executing in parallel: in the Spark UI, check whether the stages of a job (where each job corresponds to a micro-batch) share the same start time.
If the stages have the same start time but still complete in a staggered way, their tasks might all be getting queued on the same executor. Make sure you set the data locality wait (spark.locality.wait) to 0 seconds so tasks don't wait around for a data-local slot.
Did you also try the concurrent jobs setting (spark.streaming.concurrentJobs)? That's the biggest weapon in the arsenal for reducing queueing, since it lets the jobs of successive micro-batches overlap instead of running strictly one after another.
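To make the knobs concrete, here's a sketch of how the three settings mentioned above could be passed via spark-submit. The class name, jar name, and the specific values (200ms, 2) are placeholders for illustration, not recommendations; note that spark.streaming.concurrentJobs is an undocumented setting, so use it with care.

```shell
# A smaller blockInterval yields more blocks (and thus more partitions) per batch.
# spark.locality.wait=0s stops tasks from queueing on one executor waiting for locality.
# spark.streaming.concurrentJobs > 1 lets micro-batch jobs run concurrently (undocumented).
spark-submit \
  --class com.example.StreamingApp \
  --conf spark.streaming.blockInterval=200ms \
  --conf spark.locality.wait=0s \
  --conf spark.streaming.concurrentJobs=2 \
  streaming-app.jar
```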
