Data Engineering: Bulk Up Your Data Engineering Skills Using Airflow, Spark & Dask

Wix Engineering
Aug 3, 2022 · 2 min read
Photo by Hunter Harritt on Unsplash

At this Wix Data Engineering meetup, we covered data science and data engineering topics and worked on deepening our knowledge of tools and big data engines such as Airflow, Spark, and Dask.

The meetup focused on helping you improve your development velocity. We shared must-know practices, common pitfalls, and optimization and tuning techniques, and introduced tools that are gaining momentum in the data science world.

Airflow — Improve DAG authoring skills: Tips & Tricks — Elad Kalif (Hebrew)

Surprised by a broken DAG? Or by a Jinja template that was never rendered? Watch Elad Kalif's talk to learn about these and other essential Airflow practices. You will learn about Apache Airflow's features, its must-know practices, and the common pitfalls of working with it.

Apache Spark Optimization Techniques and Tuning — Almog Gelber (Hebrew)

In this session, Almog Gelber covers common bottlenecks and pain points in building a Spark pipeline, along with ways to fix them and make the application more efficient.

Not Only Spark! Introducing Dask — A Pythonic Big Data Framework for Data Science — Itamar Faran (Hebrew)

While Apache Spark is the state-of-the-art technology for processing huge datasets that don't fit in memory, its infrastructure overhead is sometimes “not worth it” for data science projects. This talk introduces Dask, a lightweight, pure-Python framework for out-of-memory dataframes that is built on NumPy and pandas and integrates with the Python data science ecosystem.
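Dask's central idea is to split a large dataset into partitions, process each partition independently, and combine the partial results, so only one partition has to fit in memory at a time. The sketch below illustrates that partition-and-combine pattern in plain Python (standard library only) by computing a mean over a stream of values. It is not Dask's API: `partitions`, `partial_mean`, and `out_of_core_mean` are hypothetical helpers written for illustration, whereas a real Dask DataFrame is made of many pandas DataFrames that Dask schedules as a task graph.

```python
from typing import Iterable, Iterator, List, Tuple


def partitions(rows: Iterable[float], size: int) -> Iterator[List[float]]:
    """Hypothetical helper: yield fixed-size chunks so only one is held in memory."""
    chunk: List[float] = []
    for value in rows:
        chunk.append(value)
        if len(chunk) == size:
            yield chunk
            chunk = []  # start a fresh list; the yielded chunk is left untouched
    if chunk:
        yield chunk  # last, possibly smaller, partition


def partial_mean(chunk: List[float]) -> Tuple[float, int]:
    """Hypothetical per-partition step: reduce a chunk to a (sum, count) pair."""
    return sum(chunk), len(chunk)


def out_of_core_mean(rows: Iterable[float], size: int = 1000) -> float:
    """Hypothetical combine step: merge per-partition (sum, count) pairs into a mean."""
    total, count = 0.0, 0
    for chunk in partitions(rows, size):
        s, n = partial_mean(chunk)
        total += s
        count += n
    if count == 0:
        raise ValueError("mean of empty input")
    return total / count


# A generator stands in for data too large to load all at once.
stream = (float(i) for i in range(1_000_001))
print(out_of_core_mean(stream, size=50_000))  # → 500000.0
```

In Dask itself, a comparable computation is roughly `dd.read_csv(...)["col"].mean().compute()` (with `import dask.dataframe as dd` and `"col"` as a placeholder column name), where Dask handles the partitioning and the combine step for you.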



Wix Engineering. We develop innovative cloud-based web applications that serve Wix.com's 150M+ users worldwide.