Venkatesh Iyer
Jul 23, 2017 · 1 min read

What are the configured batchInterval and blockInterval? If the stream ends up as a single partition of data, running jobs asynchronously won't buy you anything.
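For context, the number of partitions per receiver per batch is roughly batchInterval / blockInterval. A sketch of how you might tune the block interval at submit time (all values here are hypothetical, and `my-streaming-app.jar` is a placeholder):

```shell
# Each receiver cuts its stream into one block per blockInterval, so a
# batch contains batchInterval / blockInterval blocks (= partitions per
# receiver). E.g. a 2s batch with 200ms blocks gives ~10 partitions.
spark-submit \
  --conf spark.streaming.blockInterval=200ms \
  my-streaming-app.jar  # batch interval is set in the app's StreamingContext
```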

You can also verify whether the stages are actually executing in parallel: in the Spark UI, check whether the stages of a job (where a job corresponds to a micro-batch) share the same start time.

If the stages have the same start time but still complete in a staggered way, their tasks might all be getting queued on the same executor. Make sure you set the data locality wait (spark.locality.wait) to 0 seconds.
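Assuming you set it at submit time, that would look something like this (`my-streaming-app.jar` is a placeholder):

```shell
# With a 0s locality wait, the scheduler doesn't hold tasks back waiting
# for a data-local executor; tasks spread immediately across free
# executors instead of queueing on the one holding the block.
spark-submit \
  --conf spark.locality.wait=0s \
  my-streaming-app.jar
```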

Did you also try the concurrent jobs setting? That's the biggest weapon in the arsenal for reducing queueing.
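That's the (undocumented) spark.streaming.concurrentJobs property; a sketch, with the value of 4 purely illustrative:

```shell
# Allow up to 4 micro-batch jobs to run at once instead of the default 1,
# so one slow batch doesn't queue everything behind it.
spark-submit \
  --conf spark.streaming.concurrentJobs=4 \
  my-streaming-app.jar  # hypothetical app jar
```

Note that running batches concurrently weakens the usual one-batch-at-a-time ordering, so it only suits workloads where batches are independent of each other.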
