Step-by-step roadmap to becoming a Snowflake Data Engineer!

Hint: SQL & Python are all you need. Yeah, no kidding!!

MAD landscape of 2023
  • Are you looking to break into data engineering but feel intimidated by the number of languages and tools in the space?
  • The madness you see above is the state of the data landscape in 2023. It is humanly impossible to learn every tool and stack on the market.
  • So where should you start?

YOU SHOULD START WITH A STRONG FOUNDATION —

SQL and Python. Why??

  • SQL has been around since the 1970s, and it is here to stay!
  • Python is one of the most beginner-friendly languages, and it supports a vast ecosystem of tools and packages for data processing.
  • That is all you need to skyrocket your data engineering career. 🚀🚀

But what about Data Infrastructure? Distributed processing frameworks?

A decade ago, the field of big data was nascent and evolving. Organizations built their own data platforms and maintained their own infrastructure. So big data developers (as the role was called in those days) needed to master infrastructure and DevOps skills, in addition to programming, to succeed.

However, thanks to the explosion of Cloud and Software as a Service (SaaS) offerings such as Snowflake, most data engineers today do not maintain the underlying infrastructure.

Data engineering teams and roles are shifting heavily from an infrastructure focus to an analytics focus, as pointed out in "The Future of the Data Engineer" on the Meta Analytics blog.

Reference: https://medium.com/@AnalyticsAtMeta/the-future-of-the-data-engineer-part-i-32bd125465be

Although it’s not a bad idea to develop expertise in DevOps and Infra as you grow in your career, you must not get overwhelmed by it when starting your data engineering career.

SO THE FOCUS MUST BE ON THE FUNDAMENTALS —

SQL & Python.

Snowflake Data Engineering Roadmap — Absolute Beginner Level:

If you are at the beginner level, follow these steps in order.

  1. Learn how to query and transform data with SQL. Here is a SQL course by freeCodeCamp.
  2. Familiarize yourself with data warehousing, data modeling, and data lake concepts.
  3. Watch the Snowflake 101 videos.
  4. Sign up for the Snowflake Free Trial and use the sample dataset to practice writing SQL queries. Note: If you are a student, your free trial account is valid for 90 days instead of the regular 30 days. Use it to your benefit.
  5. Follow along with this hands-on tutorial: Zero to Snowflake in 90 mins.
  6. Master Python for data processing: learn to work with a sample dataset using the pandas, NumPy, and datetime packages.
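Steps 1 and 6 meet in practice: the same transformation can often be written in SQL or in pandas. The sketch below computes revenue per customer both ways so you can compare them side by side. It assumes Python's built-in sqlite3 module as a local stand-in for Snowflake (Snowflake's SQL dialect is richer, but a basic SELECT with GROUP BY and ORDER BY reads the same), and the orders data is made up for illustration.

```python
import sqlite3

import pandas as pd

# Made-up sample data: (order_id, customer, order_date, amount).
ORDERS = [
    (1, "alice", "2023-07-01", 120.0),
    (2, "bob", "2023-07-01", 80.0),
    (3, "alice", "2023-07-02", 45.5),
    (4, "carol", "2023-07-03", 200.0),
]


def revenue_by_customer_sql(rows):
    """Query and transform with SQL, using in-memory SQLite as a local stand-in."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders "
            "(order_id INTEGER, customer TEXT, order_date TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
        cur = conn.execute(
            """
            SELECT customer, COUNT(*) AS num_orders, SUM(amount) AS revenue
            FROM orders
            GROUP BY customer
            ORDER BY revenue DESC
            """
        )
        return cur.fetchall()
    finally:
        conn.close()


def revenue_by_customer_pandas(rows):
    """The same transformation expressed with a pandas DataFrame."""
    df = pd.DataFrame(rows, columns=["order_id", "customer", "order_date", "amount"])
    out = (
        df.groupby("customer")
        .agg(num_orders=("order_id", "count"), revenue=("amount", "sum"))
        .sort_values("revenue", ascending=False)
        .reset_index()
    )
    # Return plain tuples so the result is easy to compare with the SQL version.
    return list(out.itertuples(index=False, name=None))
```

Both functions should return the same three rows, with carol first at 200.0 in revenue. Writing a query both ways is a quick way to check your understanding of each tool.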

Intermediate Level:

Once you know how to write queries and have familiarized yourself with the Snowflake environment, here are the steps to level up.

  1. Sign up for the FREE Snowflake Essentials training, and follow along with the hands-on tutorials.
  2. Next, follow this Quickstart to familiarize yourself with the different ways to load data into the Snowflake environment: A Tour of Ingest.
  3. Sign up for the Virtual Hands-on Labs that interest you to gain more experience.
  4. Build an end-to-end data engineering pipeline using Snowpark Python DataFrames.
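Snowpark needs a live Snowflake account, so the sketch below only mirrors the shape of an end-to-end pipeline locally: extract raw records, transform them, and load the result into a target table. It assumes sqlite3 as a stand-in for Snowflake, and the function names (extract, transform, load, run_pipeline) and the trip data are hypothetical. In Snowpark, the transform step would roughly map to DataFrame methods such as filter, group_by and agg running inside Snowflake, and the load step to df.write.save_as_table.

```python
import csv
import io
import sqlite3

# Hypothetical raw input, standing in for a file landed in a stage.
RAW_CSV = """trip_id,city,distance_km,fare
1,Seattle,3.2,12.50
2,Seattle,,9.00
3,Austin,5.0,15.75
4,Austin,1.1,6.25
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: drop incomplete rows, cast types, and aggregate fares per city."""
    totals = {}
    for rec in records:
        if not rec["distance_km"]:  # skip rows with a missing distance
            continue
        count, fare = totals.get(rec["city"], (0, 0.0))
        totals[rec["city"]] = (count + 1, fare + float(rec["fare"]))
    return [(city, n, round(f, 2)) for city, (n, f) in sorted(totals.items())]


def load(rows, conn):
    """Load: replace the target table so that re-running the pipeline is idempotent."""
    conn.execute("DROP TABLE IF EXISTS city_fares")
    conn.execute("CREATE TABLE city_fares (city TEXT, trips INTEGER, total_fare REAL)")
    conn.executemany("INSERT INTO city_fares VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Run extract -> transform -> load and return what landed in the target table."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT * FROM city_fares ORDER BY city").fetchall()
```

Keeping each step a small function makes it easy to test on its own, and the same split carries over when you rewrite the transform with Snowpark DataFrames.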

Advanced Level:

After you learn how to build a preliminary data pipeline, focus on building enterprise-grade pipelines. Think about the challenges you will face when working with real-time data and deploying data pipelines in production.

  1. Ensuring data quality in your pipelines using Soda Core and Snowflake.
  2. Orchestrating and scheduling your data pipelines with the available scheduling tools.
  3. Debugging and troubleshooting failed tasks and data pipelines.
  4. Optimizing the performance of your queries and running cost-effective pipelines.
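To give a feel for item 1, here is a hand-rolled sketch of three common data quality checks: the batch is not empty, required columns have no missing values, and the key column has no duplicates. The check names echo SodaCL checks such as row_count, missing_count and duplicate_count, but this is plain pandas code, not Soda Core's API, and the sample batch is hypothetical. With Soda Core, you would declare similar checks in YAML and run them against Snowflake tables instead.

```python
import pandas as pd


def run_quality_checks(df, key_column, required_columns):
    """Return {check name: (passed, observed value)} for one batch of data."""
    results = {}
    # 1. The batch should not be empty.
    results["row_count > 0"] = (len(df) > 0, len(df))
    # 2. Required columns should have no missing values.
    for col in required_columns:
        missing = int(df[col].isna().sum())
        results[f"missing_count({col}) = 0"] = (missing == 0, missing)
    # 3. The key column should be unique (count rows repeating an earlier key).
    dupes = int(df[key_column].duplicated().sum())
    results[f"duplicate_count({key_column}) = 0"] = (dupes == 0, dupes)
    return results


def assert_quality(df, key_column, required_columns):
    """Raise before loading if any check fails, so bad data never reaches the target."""
    results = run_quality_checks(df, key_column, required_columns)
    failed = {name: value for name, (ok, value) in results.items() if not ok}
    if failed:
        raise ValueError(f"Data quality checks failed: {failed}")
```

Running a gate like assert_quality between the transform and load steps turns silent data problems into failed pipeline runs that you can debug.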

Snowflake Developer Resources:

During your journey, if you have any questions or run into any issues, reach out to the Snowflake community for help.

  • Find your local Snowflake Data Heroes community and sign up for their user group meetings. Peer learning from people working in the industry is a huge opportunity.
  • Ask your questions about error messages, best practices, debugging, and troubleshooting on the Snowflake forum.
  • Visit the Snowflake Developer Hub for reference architectures, quickstarts, videos, and blogs to help you build your first pipeline or data application.

Bonus: Snowflake Certifications:

  • After gaining experience building with Snowflake, take up a Snowflake certification to reinforce your knowledge and validate your expertise.

My Snowflake Story:

Although I had worked as a Data/ML engineer for years, I was not a Snowflake developer. When I joined Snowflake in July 2023, my team of engineers, product managers, and developer advocates shared all the right resources (blogs, videos, and hands-on labs) to get me up to speed. Based on those curated resources, I have built a custom roadmap for you all. I hope it is as helpful for you as it was for me.

Thanks for Reading!

If you like my work and want to support me…

  1. The BEST way to support me is by following me on Medium.
  2. For data engineering best practices and Python tips for beginners, follow me on LinkedIn.
  3. Feel free to give claps so I know how helpful this post was for you.

Vino Duraisamy
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Developer Advocate @Snowflake❄️. Previously Data & Applied Machine Learning Engineer @Apple, Nike, NetApp | Spark, Snowflake, Hive, Python, SQL, AWS, Airflow