How does Adaptive Query Execution fix your Spark performance issues?

Kerrache Massipssa
5 min read · Dec 25, 2023

Before version 3.0, Apache Spark applications commonly ran into the following performance issues:

  • Data skew and inadequate partitioning, which distribute data unevenly across tasks.
  • Suboptimal query plans: Spark commits to a static plan without considering runtime statistics, which leads to inefficiencies.
  • No ability to adapt when data sizes vary between stages.

Adaptive Query Execution (AQE) addresses these issues, and we’ll discuss it in detail in this article.

What’s Adaptive Query Execution (AQE)?

Before Spark 3.0, a notable downside was that once the optimizer had settled on a plan, that plan stayed fixed for the rest of the query’s execution. Spark could not adapt or improve the plan at runtime, even when the data turned out to look different from what was estimated. Since Spark 3.0, AQE makes runtime optimization possible.

In short, AQE is an optimization technique in Spark SQL that uses runtime statistics to choose the most efficient query execution plan.

This feature is enabled by default starting from Apache Spark 3.2.0.
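On Spark 3.0 and 3.1, where AQE is off by default, it has to be switched on explicitly. The following is a minimal sketch of one way to do that, not the only one. The three configuration keys are Spark SQL settings; `AQE_SETTINGS`, `to_submit_args`, and `build_session` are hypothetical helper names introduced for illustration, and `build_session` assumes PySpark is installed.

```python
# Spark SQL configuration keys related to AQE.
# The helper names in this sketch are hypothetical, not part of Spark.
AQE_SETTINGS = {
    # Master switch for Adaptive Query Execution.
    "spark.sql.adaptive.enabled": "true",
    # Sub-features that only take effect when the master switch is on.
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def to_submit_args(settings):
    """Render the settings as a list of spark-submit --conf arguments."""
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


def build_session(app_name="aqe-demo"):
    """Create a SparkSession with the AQE settings applied (requires PySpark)."""
    # Imported lazily so the rest of this sketch runs without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in AQE_SETTINGS.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    # Print the flags that could be appended to a spark-submit command.
    print(" ".join(to_submit_args(AQE_SETTINGS)))
```

On an existing session, `spark.conf.get("spark.sql.adaptive.enabled")` reports whether AQE is currently on.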

How Does AQE Perform Optimizations?

Kerrache Massipssa

Hi 👋, I’m a Data Architect. Learning, writing, and sharing is my motto. I love Data & Open-Source & Cloud. My Blog: https://dataopsblog.com