Pinned | Deepa Vasanthkumar | Unlocking the World of Data Engineering: Guide to Acing Interviews | In today’s data-driven world, the demand for skilled data engineers is soaring. Companies are on the lookout for professionals who can… | Jun 8
Deepa Vasanthkumar | Apache Parquet The power of Columnar Storage | Apache Parquet is a columnar storage file format designed for efficient data processing in big data environments. It offers numerous… | Sep 16
Deepa Vasanthkumar | Apache Spark Scenario Problem to find the optimal resource allocation | Assume you have a dataset of 500 GB that needs to be processed on a Spark cluster. The cluster has 10 nodes, each with 64 GB of memory and… | Sep 3
Deepa Vasanthkumar | Difference between count(*) and count(Columname) | COUNT(*) will count the number of rows, while COUNT(expression) will count non-null values in expression and COUNT(column) will count all… | Aug 27
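The distinction in this teaser can be checked with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in SQL engine; the `orders` table, its `coupon` column, and the sample rows are hypothetical and do not come from the article.

```python
import sqlite3

# Hypothetical in-memory table; two of the four rows have a NULL coupon.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, coupon TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "SAVE10"), (2, None), (3, "SAVE10"), (4, None)],
)

# COUNT(*) counts every row; COUNT(coupon) skips rows where coupon is NULL.
total_rows, non_null_coupons = conn.execute(
    "SELECT COUNT(*), COUNT(coupon) FROM orders"
).fetchone()
conn.close()

print(total_rows, non_null_coupons)  # 4 2
```

The two counts differ by exactly the number of rows whose `coupon` is NULL.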
Deepa Vasanthkumar | Spark job Submission workflow | When you submit a job in Apache Spark, the following sequence of events takes place: | Aug 20
Deepa Vasanthkumar | Spark Optimizations while Joining Datasets | Understanding Apache Spark Join Strategies | Jul 31
Deepa Vasanthkumar | Understanding Coalesce function in SQL and Spark | The COALESCE function is a powerful and commonly used feature in both SQL and Apache Spark. It is instrumental in handling NULL values and… | Jul 17
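As a rough illustration of the NULL handling mentioned in this teaser, the sketch below runs COALESCE in SQLite through Python's built-in `sqlite3` module. The `contacts` table, its columns, and the `'unknown'` fallback are hypothetical, and SQLite stands in for the SQL side only; Spark is not used here.

```python
import sqlite3

# Hypothetical contact list where either phone column may be NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, mobile TEXT, landline TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [
        ("Asha", "555-0101", None),
        ("Ben", None, "555-0202"),
        ("Chen", None, None),
    ],
)

# COALESCE returns its first non-NULL argument, evaluated left to right.
rows = conn.execute(
    "SELECT name, COALESCE(mobile, landline, 'unknown') "
    "FROM contacts ORDER BY name"
).fetchall()
conn.close()

for name, phone in rows:
    print(name, phone)
```

Each row falls back to the next argument only when the earlier ones are NULL, so the literal default is used only for the contact with no phone number at all.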
Deepa Vasanthkumar | Spark Concepts and Questions | 1. How many types of join strategies are there in Spark? | Jul 12
Deepa Vasanthkumar | Exploring Architectural Patterns in Data Engineering Projects | Data engineering is a critical component of any data-driven organization, enabling the collection, transformation, and management of data… | Jul 1
Deepa Vasanthkumar | Code Optimization in PySpark Leveraging Best Practices | Apache Spark is a powerful framework for distributed data processing, but to fully leverage its capabilities, it’s essential to write… | Jun 26