Published in The Startup

Spark SQL: Adaptive Query Execution

Altering the physical execution plan at runtime.

The level of parallelism and the choice of the right join strategy have proven to be key factors in the performance of complex queries on large clusters.

Even though Spark 2.x already exposed a few parameters to tune this behaviour, having to set them manually was not practical in many production scenarios. Besides, a static configuration may not be the right one for all stages of a job, as usually stages located closer to the final output…
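To make the contrast concrete, here is a minimal sketch of the two kinds of settings involved. The configuration keys are standard Spark SQL properties, but the grouping into `STATIC_TUNING` and `ADAPTIVE_TUNING` and the helper `apply_conf` are hypothetical names introduced only for illustration, and the values shown are Spark's documented defaults rather than recommendations.

```python
# Static knobs available in Spark 2.x: one value applies to every stage of the job.
STATIC_TUNING = {
    # Number of partitions used after every shuffle (Spark default: 200).
    "spark.sql.shuffle.partitions": "200",
    # Tables smaller than this many bytes are broadcast in joins (default: 10 MB).
    "spark.sql.autoBroadcastJoinThreshold": str(10 * 1024 * 1024),
}

# Switches for Adaptive Query Execution in Spark 3.x, which lets Spark
# revise the physical plan at runtime instead of relying on fixed values.
ADAPTIVE_TUNING = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def apply_conf(spark, settings):
    """Apply each key/value pair to a session through spark.conf.set.

    `spark` is assumed to be a pyspark.sql.SparkSession (or any object
    exposing the same `conf.set(key, value)` method).
    """
    for key, value in settings.items():
        spark.conf.set(key, value)
    return spark
```

With PySpark installed, this could be used as `apply_conf(SparkSession.builder.getOrCreate(), ADAPTIVE_TUNING)`. The static values are chosen once per job, whereas the adaptive switches only enable the runtime behaviour and leave the actual decisions to Spark.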

Enrique Rebollo García