Steps to help troubleshoot common performance issues in Spark/PySpark jobs, using EMR/Databricks as examples. Of course, all of these assume you have already confirmed that there is no change in the data trend or volume.
TL;DR
I have sometimes seen speedups of more than 25x when operations are parallelized. The exact gain depends on the other workloads running on the cluster, but the difference is still significant.