
Making Sense of Data & Helping Others Grow: Tips, Advice, and Stories from the Front Lines of Data Engineering

    🧊 Snowflake vs 🔥 Databricks — What I Found After Actually Using Both

    I used both Snowflake and Databricks — here’s what worked and what didn’t.
    Rooted in Data
    Jun 5
    The Cleanest Way to Generate HTML from C#

    Say goodbye to HTML spaghetti and hello to testable, designer-friendly templates using Stubble.
    Hossein Kohzadi
    Jun 3
    Agentic AI in Data Engineering: From Automation to Autonomous Intelligence

    Why AI Agents Are Set to Redefine the Future of Data Infrastructure
    RAKESH CHANDA
    May 6
    Apache Iceberg Hidden Partitioning: A Smarter Way to Organize Your Data Lake

    Tired of complex data partitioning? Learn how Apache Iceberg’s hidden partitions make your life easier.
    Rui Carvalho
    Apr 17
    Optimizing the Performance of Iceberg Tables: A Deep Dive into Compaction

    What I Learned from Chapter 4 of Apache Iceberg: The Definitive Guide about Compaction
    Rui Carvalho
    Apr 11
    ‘Have No Master’ — A Goal For Aspiring Data Engineers

    “I have no master and will say whatever the fuck I want”
    MikeDoesEverything
    Apr 11
    How to Not Get Burned by Security in Data Engineering

    Don’t Let Security Bite You — My Advice for Data Engineers
    Tim Webster
    Apr 2
    Spark: PartitionBy vs ClusterBy Deep Dive – What’s the Difference?

    Understand how you can leverage both in your Spark jobs.
    Rui Carvalho
    Apr 1
    Mastering Behavior-Driven Development (BDD) in .NET: A Practical Guide

    Writing tests is essential for maintaining robust and reliable software, but traditional unit testing often lacks readability and clear…
    Hossein Kohzadi
    Mar 24
    I Tested Masthead for Data Management

    Here’s What Stood Out.
    Tim Webster
    Mar 12
    Prioritizing Data Projects for Maximum Impact

    Strategies to Align Analytics with Goals, Stakeholders, and Organizational Readiness
    Clay Gambetti
    Mar 10
    Actions vs Transformations in Spark: Spark Series

    Learn How to Optimize Large-Scale Data Pipelines by Mastering When and How Spark Executes Your Code
    Lorena Gongang
    Mar 10
    The Data Engineering Lessons You Learn the Hard Way

    12 Brutal Truths About Data Engineering They Don’t Teach You
    Tim Webster
    Feb 27
    My Go-To Data Engineering Blogs, Publications, and Writers (and Why)

    Curated Data Engineering Resources: My Picks Explained
    Danilo Pinto
    Feb 27
    SQL vs NoSQL Databases

    Everything you need to know about the two
    Waqas Arshad Qadri
    Feb 26
    Apache Spark in Few Words: Spark Series

    What you need to understand about Spark in fewer words.
    Lorena Gongang
    Feb 25
    Troubleshooting Heavy dbt docs generate Command

    No perfect remedy, but some relief
    Fumiaki Kobayashi
    Feb 25
    Broadcast Variables in PySpark

    A Distributed Shared Variable in Spark
    Santosh Joshi
    Feb 22
    Why Is Data Quality Still a Mess in 2025?

    If You Create Data, You Own Its Mess.
    Tim Webster
    Feb 9
    How to Configure the GlueJobOperator in Apache Airflow

    Data engineering often requires setting up workflows that seamlessly connect multiple tools. One common challenge is integrating Apache…
    Aline Rodrigues
    Feb 9
    Study Notes — PySpark Joins

    Handling Different Types of Joins in PySpark
    Santosh Joshi
    Feb 9
    Getting Started with Apache Iceberg: The Next Big Thing in Data Lakehouses

    What I Learned from Chapter 1 of Apache Iceberg: The Definitive Guide
    Rui Carvalho
    Feb 3
    How Spark Performs Joins: A Quick Look into Small and Large Table Joins

    Optimizing join operations in Spark for different table sizes and distributions.
    Santosh Joshi
    Feb 1
    A Deep Dive into flatten vs explode

    A short article on flatten, explode, and explode_outer in PySpark
    Santosh Joshi
    Jan 30
    distinct() vs dropDuplicates() in PySpark

    A Deep Dive into distinct(), dropDuplicates() and drop_duplicates()
    Santosh Joshi
    Jan 29
About Art of Data Engineering