Satyam Sahu – Medium

Satyam Sahu

he/him

Pinned

Satyam Sahu in Towards Data Engineering

Mastering PySpark RDDs: The Building Blocks of Distributed Data

Learn How RDDs Provide Fault Tolerance and Distributed Computing in Spark

13h ago

[Cover image: Part 4 of the Learning Spark and PySpark series. Mastering PySpark RDD | RDD vs DataFrames | Transformations and Actions | Lineage | Partitioning]

Pinned

Satyam Sahu in Python in Plain English

Working with APIs: Building Data Pipelines Using Python Requests Library

Beginner’s Guide to Interacting with APIs and Building Efficient Data Pipelines Using Python and Requests Library

Sep 20

[Cover image: Working with APIs using Python, a Python API data pipeline tutorial]

Pinned

Satyam Sahu in Towards Data Engineering

5 Common Mistakes That Are Killing Your Pipeline

Identify and Fix These Issues Before They Ruin Your Data Infrastructure

Sep 25

Pinned

Satyam Sahu in Art of Data Engineering

Building Your First ETL Pipeline with Python and SQL

Step-by-Step ETL Pipeline Tutorial - Learn How to Extract, Transform, and Load Data Using Python and SQL for Beginners

Sep 16

Pinned

Satyam Sahu in Art of Data Engineering

Handling Large Datasets in SQL

Techniques for Querying Millions of Rows Efficiently

Sep 14

Satyam Sahu in Towards Data Engineering

Introduction to PySpark: Your First Step Into Distributed Data Processing

Master PySpark’s Fundamentals and Kickstart Your Journey in Data Engineering

Oct 27

[Cover image: Introduction to PySpark | Fundamentals | Setting up PySpark | PySpark SQL | PySpark MLlib | UDF | Streaming | Debugging and Monitoring | Tuning | Partitioning]

Satyam Sahu in Nerd For Tech

How I Improved My Data Analysis Speed with Python’s Dask Library

Dask vs. Pandas: Learn When and How to Use Dask for Faster Data Processing

Oct 27

[Cover image: Dask vs Pandas, showing key differences between Dask and pandas and when to use each]

Satyam Sahu in Towards Data Engineering

Understanding Spark Architecture: How It All Comes Together

A Deep Dive into Spark’s Master-Slave Architecture, Cluster Management, and Execution Model

Oct 24

[Cover image: Apache Spark Architecture Explained in Detail in Simple Terms | Master-Slave Architecture | RDD | DataFrames | Tasks and Stages | Lazy Evaluation | DAG | Big Data Processing | Spark 101]

Satyam Sahu in Python in Plain English

One Data Processing Tool You Should Know for Handling API Data

How Dask Transformed My Slow API Data Processing to Lightning-Fast!

Oct 24

[Cover image: How Dask is transforming big data processing tasks | Parallel Processing | API Rate Limits | Batch Processing | Task Scheduling | Lazy Evaluation]

Satyam Sahu in Learning SQL

Why Your Data Analysis Is Wrong: Fix Common SQL Mistakes

The Simple SQL Errors That Are Ruining Your Analysis — And How to Correct Them

Oct 21

I write about everything data, sharing tips, tricks, and insights. Join me in exploring and learning from the world of data!
