Making Sense of Data & Helping Others Grow: Tips, Advice, and Stories from the Front Lines of Data Engineering
I Tested Masthead for Data Management
Here’s What Stood Out.
Tim Webster
Mar 12
Prioritizing Data Projects for Maximum Impact
Strategies to Align Analytics with Goals, Stakeholders, and Organizational Readiness
Clay Gambetti
Mar 10
Actions vs Transformations in Spark: Spark Series
Learn How to Optimize Large-Scale Data Pipelines by Mastering When and How Spark Executes Your Code
Lorena Gongang
Mar 10
The Data Engineering Lessons You Learn the Hard Way
12 Brutal Truths About Data Engineering They Don’t Teach You
Tim Webster
Feb 27
My Go-To Data Engineering Blogs, Publications, and Writers (and Why)
Curated Data Engineering Resources: My Picks Explained
Danilo Pinto
Feb 27
SQL vs NoSQL Databases
Everything you need to know about the two
Waqas Arshad Qadri
Feb 26
Apache Spark in Few Words: Spark Series
What you need to understand about Spark in fewer words.
Lorena Gongang
Feb 25
Troubleshooting Heavy dbt docs generate Command
No perfect remedy, but some relief
Fumiaki Kobayashi
Feb 25
Building a Super Data Engineering Team
How Attitude, Learning Agility, and Diversity Transform a Team into a High-Performing Unit.
Clay Gambetti
Feb 19
Why Is Data Quality Still a Mess in 2025?
If You Create Data, You Own Its Mess.
Tim Webster
Feb 9
How to Configure the GlueJobOperator in Apache Airflow
Data engineering often requires setting up workflows that seamlessly connect multiple tools. One common challenge is integrating Apache…
Aline Rodrigues
Feb 9
Study Notes — PySpark Joins
Handling Different Types of Joins in PySpark
Santosh Joshi
Feb 9
Getting Started with Apache Iceberg: The Next Big Thing in Data Lakehouses
What I Learned from Chapter 1 of Apache Iceberg: The Definitive Guide
Rui Carvalho
Feb 3
A Deep Dive into flatten vs explode
A short article on flatten, explode, explode outer in PySpark
Santosh Joshi
Jan 30
distinct() vs dropDuplicates() in PySpark
A Deep Dive into distinct(), dropDuplicates() and drop_duplicates()
Santosh Joshi
Jan 29
Unraveling Facebook’s Dataswarm: A Blueprint for Efficient Data Pipelines
Recreate the magic of Dataswarm with freely available tools and best practices.
Clay Gambetti
Jan 27
Why Data Engineering Is Never ‘Set and Forget’
The Job That’s Never Done
Tim Webster
Jan 24
My 8 ‘Common Sense’ Rules for Writing Better SQL
And One Annoying SQL Pet Peeve
Tim Webster
Jan 14
Commercial Data Ingestion Tools vs. Custom Data Pipelines: Which One is Right for Your Business?
Which Option Best Fits Your Data Strategy?
Lorena Gongang
Jan 13
How to Write a DataFrame to a Single CSV?
Write a Single CSV in PySpark: Using Coalesce
Santosh Joshi
Jan 13
Understanding Collect, Take, Limit, Show, Head and Display in PySpark
A Quick and Crisp Guide to Inspecting DataFrames Efficiently in PySpark
Santosh Joshi
Jan 13
Why Data Engineering is Booming Post-ChatGPT Revolution
The rise of data engineering and its vital role in AI innovation.
CyCoderX
Jan 7
Overcoming Common Spark Performance Hurdles
Tips for Optimizing Apache Spark Applications
Clay Gambetti
Jan 4
Small Steps to Big Wins in Data Engineering (My 2025 Plan)
How I’m Building Momentum in 2025
Tim Webster
Jan 3
Someone Asked Me the Most Important Thing in a Data Warehouse. Here’s My Answer
Discover what’s the backbone of a successful data warehouse.
Rui Carvalho
Jan 2
About Art of Data Engineering