PrashantShukla

Apache Spark "vacuum"
In Apache Spark, a "vacuum" operation is used to remove old versions of files or data that are no longer needed. This operation is… (1 min read · May 20, 2023)

Partition Replication Handled by Spark
In Apache Spark, partition replication refers to the process of replicating data partitions across multiple nodes in a cluster. This… (3 min read · May 20, 2023)

Where a New RDD Is Created When a Node Fails in Spark
In Apache Spark, when a node fails, the Resilient Distributed Dataset (RDD) is recovered using its lineage information. RDD lineage is a… (2 min read · May 20, 2023)

How an RDD Is Resilient
RDD (Resilient Distributed Dataset) is a fundamental data structure in Apache Spark that provides fault tolerance and resilience to… (2 min read · May 18, 2023)

Delta Lake Generated Columns
Delta Lake provides the ability to define generated columns, which are computed based on other columns in the same table and are… (1 min read · May 5, 2023)

Relationship Between Two Delta Lake Tables
Delta Lake is a storage layer that runs on top of data stored in cloud-based object stores such as Amazon S3 or Azure Data Lake Storage… (2 min read · May 4, 2023)