Deepa Vasanthkumar · pyspark dataframe transform m · The `transform()` method in the PySpark DataFrame API applies a user-supplied function to the DataFrame as a whole and returns a new DataFrame, which makes custom transformations easy to chain. It takes a function… · 2 min read · Apr 16, 2024
Deepa Vasanthkumar · Spark Logical and Physical Plan Generation · In Spark, when you submit a SQL query or DataFrame transformation, it goes through several stages of processing before execution. Let us… · 6 min read · Apr 9, 2024
Deepa Vasanthkumar · Interview Questions on Azure Data Factory and Databricks · Azure Data Factory · 13 min read · Apr 4, 2024
Deepa Vasanthkumar · Using DuckDB JupySQL and Pandas in a notebook · DuckDB is an open-source, lightweight, and embeddable analytical database management system (DBMS) designed for efficient querying and… · 3 min read · Apr 2, 2024
Deepa Vasanthkumar · Horizontal and Vertical Scalability in Cloud · Imagine you have a task to carry heavy bags of groceries from the store to your home. Instead of making yourself stronger (vertical… · 3 min read · Feb 22, 2024
Deepa Vasanthkumar · PySpark Testing Utility · The most common dilemma for any data engineer is how to test or validate the code. When we are dealing with huge data, the correctness of… · 3 min read · Feb 21, 2024
Deepa Vasanthkumar · UDTF in PySpark · Python user-defined table function (UDTF) · 1 min read · Jan 15, 2024
Deepa Vasanthkumar · Order of SQL Execution · FROM Clause: The query begins by identifying the tables involved in the JOIN operation specified in the FROM clause. It determines the… · 2 min read · Oct 3, 2023
Deepa Vasanthkumar · Streaming Applications in DataEngineering · Streaming applications are crucial for various use cases, including real-time analytics, monitoring, fraud detection, and more. Here are… · 8 min read · Sep 13, 2023
Deepa Vasanthkumar · Spark SQL with Parameterized Statements · With Spark 3.4 onwards, we can directly query from a PySpark DataFrame. · 1 min read · Sep 12, 2023