Pinned · Deepa Vasanthkumar · Unlocking the World of Data Engineering — Guide to Acing Interviews · In today’s data-driven world, the demand for skilled data engineers is soaring. Companies are on the lookout for professionals who can… · Jun 8
Deepa Vasanthkumar · Spark Accumulators and Broadcast variables · In Apache Spark, both accumulators and broadcast variables are used to share data among nodes in a distributed processing environment, but… · 3d ago
Deepa Vasanthkumar · Spark dataframes select vs withcolumn comparison · In Apache Spark, both `select` and `withColumn` are methods used to manipulate DataFrames, but they serve different purposes and have… · 5d ago
Deepa Vasanthkumar · pyspark dataframe transform m · The `transform()` method in the PySpark DataFrame API applies a user-supplied function to the entire DataFrame (not to each row) and returns the resulting DataFrame. It takes a function… · Apr 16
Deepa Vasanthkumar · Spark Logical and Physical Plan Generation · In Spark, when you submit a SQL query or DataFrame transformation, it goes through several stages of processing before execution. Let us… · Apr 9
Deepa Vasanthkumar · Interview Questions on Azure Data Factory and Databricks · Azure Data Factory · Apr 4
Deepa Vasanthkumar · Using DuckDB JupySQL and Pandas in a notebook · DuckDB is an open-source, lightweight, and embeddable analytical database management system (DBMS) designed for efficient querying and… · Apr 23
Deepa Vasanthkumar · Horizontal and Vertical Scalability in Cloud · Imagine you have a task to carry heavy bags of groceries from the store to your home. Instead of making yourself stronger (vertical… · Feb 22
Deepa Vasanthkumar · PySpark Testing Utility · The most common dilemma for any data engineer is how to test or validate the code. When we are dealing with huge data, the correctness of… · Feb 21