PrashantShukla

Apache Spark "vacuum"
In Apache Spark, a "vacuum" operation is used to remove old versions of files or data that are no longer needed. This operation is… (1 min read · May 20, 2023)

Partition Replication Handled by Spark
In Apache Spark, partition replication refers to the process of replicating data partitions across multiple nodes in a cluster. This… (3 min read · May 20, 2023)

Where a New RDD Is Created When a Node Fails in Spark
In Apache Spark, when a node fails, the Resilient Distributed Dataset (RDD) is recovered using its lineage information. RDD lineage is a… (2 min read · May 20, 2023)

How an RDD Is Resilient
RDD (Resilient Distributed Dataset) is a fundamental data structure in Apache Spark that provides fault tolerance and resilience to… (2 min read · May 18, 2023)

Delta Lake Generated Columns
Delta Lake provides the ability to define generated columns, which are computed based on other columns in the same table and are… (1 min read · May 5, 2023)

Relationship Between Two Delta Lake Tables
Delta Lake is a storage layer that runs on top of data stored in cloud-based object stores such as Amazon S3 or Azure Data Lake Storage… (2 min read · May 4, 2023)