Convert your PySpark code to Snowpark code using Snowpark Migration Accelerator!

Are you a data practitioner working with Spark?

  • How fun is it to calculate the executor memory, driver memory, number of executors, and parallelism for every Spark job you run? (A typical incantation is sketched after this list.)
  • Do you enjoy running into Out of Memory errors?
  • How thrilling is it to debug your failed Spark job?
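
To make the pain concrete, here is a minimal sketch of the configuration ritual those questions refer to. The app name and every number below are illustrative assumptions, not recommendations; in practice they get re-derived for each job, cluster, and data volume.

```python
# A minimal sketch of the tuning ritual above -- all values are
# illustrative guesses, not tuned recommendations.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("nightly_aggregation")                # hypothetical job name
    .config("spark.executor.memory", "8g")         # per-executor heap
    .config("spark.driver.memory", "4g")           # driver heap
    .config("spark.executor.instances", "10")      # number of executors
    .config("spark.executor.cores", "4")           # cores per executor
    .config("spark.default.parallelism", "80")     # roughly 2x total cores
    .getOrCreate()
)
```

Get any of these wrong and the job crawls, or dies with an Out of Memory error.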

Challenges working with Spark

Debugging the multi-page stack trace from a failed job and troubleshooting why it failed are hard, especially because Spark jobs are memory-resident: when a job fails, the evidence disappears with it.

As we discussed above, the number of configurations we need to understand and set for a job to run optimally is a never-ending challenge. Overall, capacity management and resource sizing are difficult.

Managing the infrastructure, keeping up with Spark version upgrades, and handling dependencies are hard. For data engineers, this shifts focus away from business problems and toward infrastructure challenges.

While some of us enjoy tinkering with and engineering a Spark job, it gives the rest of us nightmares.

Enter Snowpark!

Snowpark is the savior we didn’t know we needed. It is a set of libraries and runtimes that lets us run Python, Java, and Scala code within Snowflake.

With Snowpark, we don’t have to deal with hundreds of configs or infrastructure setup. No capacity planning or resource sizing needed.

And the best part? We write Python code and work with DataFrames. Simplicity and ease of use at its best.
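
For a taste, here is a minimal sketch of a Snowpark session and DataFrame pipeline. The connection parameters and the ORDERS table are placeholder assumptions; substitute your own account details and data.

```python
# A minimal sketch of Snowpark's DataFrame API.
# Connection parameters and the ORDERS table are placeholders.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

connection_parameters = {
    "account": "<your_account>",
    "user": "<your_user>",
    "password": "<your_password>",
    "warehouse": "<your_warehouse>",  # Snowflake sizes the compute
    "database": "<your_database>",
    "schema": "<your_schema>",
}
session = Session.builder.configs(connection_parameters).create()

# Familiar DataFrame operations, pushed down to run inside Snowflake
orders = session.table("ORDERS")
top_customers = (
    orders.filter(col("STATUS") == "SHIPPED")
          .group_by("CUSTOMER_ID")
          .agg(sum_("AMOUNT").alias("TOTAL_SPEND"))
          .sort(col("TOTAL_SPEND").desc())
          .limit(10)
)
top_customers.show()
```

Notice there is no executor math anywhere: the warehouse in the connection parameters is the only sizing decision, and Snowflake manages the rest.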

If you are a Spark user, check out how to get started with Snowpark.

What is Snowpark Migration Accelerator?

If you are further along the journey and are looking to migrate to Snowpark, you should check out Snowpark Migration Accelerator.

It is a tool that understands your source code (Python) by parsing it and building a semantic model of your code’s behavior.

It is not a find-and-replace or regex matching tool.

For Spark, Snowpark Migration Accelerator identifies usages of the Spark API, inventories them, and finally converts them to their functional equivalents in Snowpark.
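
To illustrate the kind of mapping this produces, here is a hand-written before/after. It is my own sketch with a placeholder SALES table, not actual tool output; many DataFrame calls translate almost one-to-one.

```python
# Illustrative before/after of a Spark-to-Snowpark translation.
# Hand-written sketch, not actual Snowpark Migration Accelerator output;
# SALES is a placeholder table.

# --- Before: PySpark ---
# from pyspark.sql import SparkSession
# from pyspark.sql.functions import col
#
# spark = SparkSession.builder.getOrCreate()
# df = spark.table("SALES").filter(col("REGION") == "EMEA").groupBy("PRODUCT").count()

# --- After: Snowpark (near one-to-one API) ---
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

session = Session.builder.configs(connection_parameters).create()  # same dict as the earlier sketch
df = session.table("SALES").filter(col("REGION") == "EMEA").group_by("PRODUCT").count()
```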

IQVIA Spark to Snowpark Migration Case Study

Check out this detailed video to learn how IQVIA migrated from Spark to Snowpark, how to use Snowpark Migration Accelerator, and more.

Thanks for Reading!

If you like my work and want to support me…

  1. The BEST way to support me is by following me on Medium.
  2. For data engineering best practices, and Python tips for beginners, follow me on LinkedIn.
  3. Feel free to give claps so I know how helpful this post was for you.
