Unlocking the World of Data Engineering — Guide to Acing Interviews
Deepa Vasanthkumar · 3 min read · Jun 8, 2024
In today’s data-driven world, the demand for skilled data engineers is soaring. Companies are on the lookout for professionals who can…

Spark Accumulators and Broadcast variables
Deepa Vasanthkumar · 2 min read · 3 days ago
In Apache Spark, both accumulators and broadcast variables are used to share data among nodes in a distributed processing environment, but…

Spark dataframes select vs withcolumn comparison
Deepa Vasanthkumar · 2 min read · 5 days ago
In Apache Spark, both `select` and `withColumn` are methods used to manipulate DataFrames, but they serve different purposes and have…

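pyspark dataframe transform m
Deepa Vasanthkumar · 3 min read · Apr 16, 2024
The `transform()` method in the PySpark DataFrame API applies a user-defined function to the DataFrame as a whole, not row by row: the function receives a DataFrame and returns a new DataFrame, which makes it convenient for chaining custom transformations. It takes a function…
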
Spark Logical and Physical Plan Generation
Deepa Vasanthkumar · 6 min read · Apr 9, 2024
In Spark, when you submit a SQL query or DataFrame transformation, it goes through several stages of processing before execution. Let us…

Interview Questions on Azure Data Factory and Databricks
Deepa Vasanthkumar · 13 min read · Apr 4, 2024
Azure Data Factory

Using DuckDB JupySQL and Pandas in a notebook
Deepa Vasanthkumar · 3 min read · Apr 2, 2024
DuckDB is an open-source, lightweight, and embeddable analytical database management system (DBMS) designed for efficient querying and…

Horizontal and Vertical Scalability in Cloud
Deepa Vasanthkumar · 3 min read · Feb 22, 2024
Imagine you have a task to carry heavy bags of groceries from the store to your home. Instead of making yourself stronger (vertical…

PySpark Testing Utility
Deepa Vasanthkumar · 3 min read · Feb 21, 2024
The most common dilemma for any data engineer is how to test or validate code. When we are dealing with huge volumes of data, the correctness of…