Towards Data Engineering
Navigating the Path to Data Engineering Excellence
Trending
Inside a Netflix Data Engineering Interview.
A Real-Time Media (OTT Domain) Use Case Question and How to Solve It Together.
Brahma, The Data Engineer.
Sep 18
Parquet is Good for OLAP but Not for OLTP Use Cases. But Why?
Many engineers and data scientists praise Parquet for its efficient compression and fast query performance. While it’s a highly valued…
Ritam Mukherjee
Sep 28
Apple Pay Data Engineering Interview Question.
Let Us Solve It Together and Learn Data Engineering Better.
Brahma, The Data Engineer.
Sep 19
Latest
Crack Your PySpark Interview: Detailed Solutions to Real-Time Coding Challenges
Explore Advanced Strategies for Session Tracking and Customer Review Classification in Spark-Based Applications
Pritam Deb
Oct 2
Data Quality Checklist: Don’t Let Bad Data Lead Your Decisions
Bad Data, Bad Decisions — Simple Quality Checks Every Data Engineer and Analyst Should Implement
Satyam Sahu
Oct 1
Building Complex SQL Queries with Self-Joins: A Detailed Step-by-Step Guide
Learn how to master SQL self-joins, using practical examples and techniques to build complex queries for advanced data analysis
Pritam Deb
Sep 30
Building Your First Pipeline Without Breaking the Bank
How to Create a Scalable ETL Pipeline Using Free or Low-Cost Tools
Satyam Sahu
Sep 29
Data Engineer PySpark Coding Interview Questions (Part II)
Practice These Problems Before It’s Too Late
Kamireddy Mahendra
Sep 29
Scaling Apache Spark: Understanding Cluster Utilisation with a 50-Node Setup
In this article, we will explore how resource management impacts performance in Apache Spark. We will use a 50-node Spark cluster setup to…
Ritam Mukherjee
Sep 22
A Beginner’s Guide to Apache Airflow with GCP Composer.
An effective approach to orchestrating data workflows.
Adediwura Boluro-Ajayi
Sep 27
Live Healthier, Pay Less: How Python Can Help You Achieve It
Imagine the following scenario:
Robin von Malottki
Sep 23
A Python Library Every Data Engineer Should Know
As a data engineer in a large company, ensuring data quality is a key responsibility. Even if you perform your tasks diligently and rarely…
Robin von Malottki
Sep 25
DataOps Explained: Why Every Data Engineer Needs to Know About It
Discover why DataOps is transforming data engineering.
Rui Carvalho
Sep 25