Deepa Vasanthkumar – Medium

Pinned

Unlocking the World of Data Engineering — Guide to Acing Interviews

In today’s data-driven world, the demand for skilled data engineers is soaring. Companies are on the lookout for professionals who can…

Jun 8

Apache Parquet The power of Columnar Storage

Apache Parquet is a columnar storage file format designed for efficient data processing in big data environments. It offers numerous…

Sep 16

Apache Spark Scenario Problem to find the optimal resource allocation

Assume you have a dataset of 500 GB that needs to be processed on a Spark cluster. The cluster has 10 nodes, each with 64 GB of memory and…

Sep 3

Difference between count(*) and count(Columname)

COUNT(*) will count the number of rows, while COUNT(expression) will count non-null values in expression and COUNT(column) will count all…

Aug 27

Spark job Submission workflow

When you submit a job in Apache Spark, the following sequence of events takes place:

Aug 20

Spark Optimizations while Joining Datasets

Understanding Apache Spark Join Strategies

Jul 31

Understanding Coalesce function in SQL and Spark

The COALESCE function is a powerful and commonly used feature in both SQL and Apache Spark. It is instrumental in handling NULL values and…

Jul 17

Spark Concepts and Questions

1. How many types of join strategies are there in Spark?

Jul 12

Exploring Architectural Patterns in Data Engineering Projects

Data engineering is a critical component of any data-driven organization, enabling the collection, transformation, and management of data…

Jul 1

Code Optimization in PySpark Leveraging Best Practices

Apache Spark is a powerful framework for distributed data processing, but to fully leverage its capabilities, it’s essential to write…

Jun 26

Deepa Vasanthkumar

Data Engineering & Cloud | Follow/Connect 👋 https://www.linkedin.com/in/deepa-vasanthkumar/
