Pinned · Naveen Sorout in Towards Data Engineering
Calculating Date Differences and Months Between with PySpark in Databricks: A Comprehensive Guide
The datediff() function in PySpark is popularly used to get the difference between two dates as a number of days between the dates specified…
Jun 2, 2023
Pinned · Naveen Sorout in Towards Data Engineering
Mastering Timestamp to Date Conversion in PySpark: Unlocking Time-Based Insights with Databricks
The to_date() function in Apache PySpark is popularly used to convert a Timestamp to a Date. This is mainly achieved by truncating the…
May 31, 2023
Pinned · Naveen Sorout in Towards Data Engineering
Efficient Data Processing with PySpark’s Pivot and Stack Functions in Databricks
In PySpark, the pivot() function is an important one that lets you rotate or transpose data from one column into multiple columns in…
Mar 17, 2023
Naveen Sorout
Merging with Precision: Data Deduplication in Databricks’ Delta Tables
A Delta Lake table, defined as a Delta table, is both a batch table and a streaming source and sink. Streaming data ingest, batch…
Aug 9, 2023
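A sketch of the MERGE statement a dedup-on-write approach like this might use. The table names (`target`, `updates`) and join key (`id`) are hypothetical, and executing it requires a Delta-enabled Spark session, e.g. `spark.sql(dedup_merge)`:

```python
# Hypothetical table names and key; run with spark.sql(dedup_merge)
# on a Delta-enabled session. SELECT DISTINCT drops duplicates within
# the incoming batch; the MERGE condition prevents re-inserting rows
# that already exist in the target.
dedup_merge = """
MERGE INTO target AS t
USING (SELECT DISTINCT * FROM updates) AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
""".strip()
```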
Naveen Sorout
Understanding Change Data Capture (CDC) in Databricks’ Delta Tables
A Delta Lake table, defined as a Delta table, is both a batch table and a streaming source and sink. Streaming data ingest, batch…
Aug 9, 2023
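A sketch of how a CDC feed is typically applied to a Delta table with MERGE. The `changes` table, `id` key, and `op` column (with `'D'` marking deletes) are assumptions about the change-feed schema; executing the statement requires a Delta-enabled session:

```python
# Hypothetical schema: change rows carry an 'op' column where 'D' means delete.
# Run with spark.sql(cdc_merge) on a Delta-enabled session.
cdc_merge = """
MERGE INTO target AS t
USING changes AS s
ON t.id = s.id
WHEN MATCHED AND s.op = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED AND s.op != 'D' THEN INSERT *
""".strip()
```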
Naveen Sorout
PySpark and Contingency Tables: A Practical How-To Guide
Data merging and data aggregation are essential day-to-day activities on big data platforms in most big data scenarios. In this…
Aug 8, 2023
Naveen Sorout
Harnessing the Power of Spark in Airflow: The SparkSubmitOperator Explained
In big data scenarios, we schedule and run complex data pipelines. To ensure that each task of your data pipeline gets executed in…
Aug 6, 2023
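A DAG configuration sketch for the operator this teaser names. It assumes the `apache-airflow-providers-apache-spark` package is installed and a `spark_default` connection is configured; the application path, DAG id, and task id are all hypothetical:

```python
# Config sketch only: assumes Airflow with the Spark provider installed
# and a "spark_default" connection; paths and ids are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="spark_pipeline",
    start_date=datetime(2023, 8, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    submit_job = SparkSubmitOperator(
        task_id="submit_spark_job",
        application="/opt/jobs/etl_job.py",  # hypothetical PySpark application
        conn_id="spark_default",
    )
```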
Naveen Sorout
Next-Level Data Management: Saving DataFrames to MongoDB using PySpark
In most big data scenarios, a DataFrame in Apache Spark can be created in multiple ways: it can be created using different data formats…
Jun 3, 2023
Naveen SoroutFrom Text to Time: Mastering String to Timestamp Conversion in PySpark and DatabricksThe to_timestamp() function in Pyspark is popularly used to convert String to the Timestamp(i.e., Timestamp Type). The default format of…Jun 1, 20231Jun 1, 20231
Naveen SoroutDate to String Conversion in PySpark: Unleashing the Power of Data Transformation in DatabricksThe date_format() function in Apache Pyspark is popularly used to convert the DataFrame column from the Date to the String format. The…May 30, 2023May 30, 2023