Making Sense of Data & Helping Others Grow: Tips, Advice, and Stories from the Front Lines of Data Engineering
Apache Iceberg Hidden Partitioning: A Smarter Way to Organize Your Data Lake
Tired of complex data partitioning? Learn how Apache Iceberg’s hidden partitions make your life easier.
Rui Carvalho
Apr 17
Optimizing the Performance of Iceberg Tables: A Deep Dive into Compaction
What I Learned from Chapter 4 of Apache Iceberg: The Definitive Guide about Compaction
Rui Carvalho
Apr 11
‘Have No Master’ — A Goal For Aspiring Data Engineers
“I have no master and will say whatever the fuck I want”
MikeDoesEverything
Apr 11
How to Not Get Burned by Security in Data Engineering
Don’t Let Security Bite You — My Advice for Data Engineers
Tim Webster
Apr 2
Spark: PartitionBy vs ClusterBy Deep Dive – What’s the Difference?
Understand how you can leverage both in your Spark jobs.
Rui Carvalho
Apr 1
Mastering Behavior-Driven Development (BDD) in .NET: A Practical Guide
Writing tests is essential for maintaining robust and reliable software, but traditional unit testing often lacks readability and clear…
Hossein Kohzadi
Mar 24
I Tested Masthead for Data Management
Here’s What Stood Out.
Tim Webster
Mar 12
Prioritizing Data Projects for Maximum Impact
Strategies to Align Analytics with Goals, Stakeholders, and Organizational Readiness
Clay Gambetti
Mar 10
Actions vs Transformations in Spark: Spark Series
Learn How to Optimize Large-Scale Data Pipelines by Mastering When and How Spark Executes Your Code
Lorena Gongang
Mar 10
The Data Engineering Lessons You Learn the Hard Way
12 Brutal Truths About Data Engineering They Don’t Teach You
Tim Webster
Feb 27
My Go-To Data Engineering Blogs, Publications, and Writers (and Why)
Curated Data Engineering Resources: My Picks Explained
Danilo Pinto
Feb 27
SQL vs NoSQL Databases
Everything you need to know about the two
Waqas Arshad Qadri
Feb 26
Apache Spark in Few Words: Spark Series
What you need to understand about Spark in fewer words.
Lorena Gongang
Feb 25
Troubleshooting Heavy dbt docs generate Command
No perfect remedy, but some relief
Fumiaki Kobayashi
Feb 25
Building a Super Data Engineering Team
How Attitude, Learning Agility, and Diversity Transform a Team into a High-Performing Unit.
Clay Gambetti
Feb 19
Why Is Data Quality Still a Mess in 2025?
If You Create Data, You Own Its Mess.
Tim Webster
Feb 9
How to Configure the GlueJobOperator in Apache Airflow
Data engineering often requires setting up workflows that seamlessly connect multiple tools. One common challenge is integrating Apache…
Aline Rodrigues
Feb 9
Study Notes — PySpark Joins
Handling Different Types of Joins in PySpark
Santosh Joshi
Feb 9
Getting Started with Apache Iceberg: The Next Big Thing in Data Lakehouses
What I Learned from Chapter 1 of Apache Iceberg: The Definitive Guide
Rui Carvalho
Feb 3
How Spark Performs Joins: A Quick Look into Small and Large Table Joins
Optimizing join operations in Spark for different table sizes and distributions.
Santosh Joshi
Feb 1
A Deep Dive into flatten vs explode
A short article on flatten, explode, and explode_outer in PySpark
Santosh Joshi
Jan 30
distinct() vs dropDuplicates() in PySpark
A Deep Dive into distinct(), dropDuplicates() and drop_duplicates()
Santosh Joshi
Jan 29
Unraveling Facebook’s Dataswarm: A Blueprint for Efficient Data Pipelines
Recreate the magic of Dataswarm with freely available tools and best practices.
Clay Gambetti
Jan 27
Why Data Engineering Is Never ‘Set and Forget’
The Job That’s Never Done
Tim Webster
Jan 24
My 8 ‘Common Sense’ Rules for Writing Better SQL
And One Annoying SQL Pet Peeve
Tim Webster
Jan 14
About Art of Data Engineering