Unlocking the World of Data Engineering — Guide to Acing Interviews
Deepa Vasanthkumar, Jun 8 (pinned)
In today’s data-driven world, the demand for skilled data engineers is soaring. Companies are on the lookout for professionals who can…
Code Optimization in PySpark Leveraging Best Practices
Deepa Vasanthkumar, 7h ago
Apache Spark is a powerful framework for distributed data processing, but to fully leverage its capabilities, it’s essential to write…
Spark Accumulators and Broadcast variables
Deepa Vasanthkumar, Jun 12
In Apache Spark, both accumulators and broadcast variables are used to share data among nodes in a distributed processing environment, but…
Spark dataframes select vs withcolumn comparison
Deepa Vasanthkumar, Jun 10
In Apache Spark, both `select` and `withColumn` are methods used to manipulate DataFrames, but they serve different purposes and have…
pyspark dataframe transform m
Deepa Vasanthkumar, Apr 16
The `transform()` method in the PySpark DataFrame API applies a user-supplied function to the DataFrame as a whole and returns the resulting DataFrame, rather than applying a UDF row by row. It takes a function…
Spark Logical and Physical Plan Generation
Deepa Vasanthkumar, Apr 9
In Spark, when you submit a SQL query or DataFrame transformation, it goes through several stages of processing before execution. Let us…
Interview Questions on Azure Data Factory and Databricks
Deepa Vasanthkumar, Apr 4
Azure Data Factory
Using DuckDB JupySQL and Pandas in a notebook
Deepa Vasanthkumar, Apr 23
DuckDB is an open-source, lightweight, and embeddable analytical database management system (DBMS) designed for efficient querying and…