Posts by PrashantShukla

Apache Spark "vacuum" (May 20, 2023)
In Apache Spark, a "vacuum" operation is used to remove old versions of files or data that are no longer needed. This operation is…
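The vacuum idea above — deleting old data files once they fall outside the retention window and are no longer referenced by the current table snapshot — can be sketched in plain Python. This is a simulation of the concept only, not the Delta Lake implementation; in Delta Lake itself this is the `VACUUM` SQL command (e.g. `VACUUM events RETAIN 168 HOURS`), and the file/snapshot names here are made up for illustration.

```python
import datetime as dt

def vacuum(files, current_snapshot, retention_hours, now):
    """Sketch of a vacuum pass (illustrative, not Delta Lake's code):
    keep a file if the current snapshot still references it, or if it
    was modified inside the retention window; delete everything else."""
    threshold = now - dt.timedelta(hours=retention_hours)
    kept, deleted = [], []
    for path, modified in files.items():
        if path in current_snapshot or modified >= threshold:
            kept.append(path)
        else:
            deleted.append(path)
    return kept, deleted

# Hypothetical file listing: one stale file, one current file.
now = dt.datetime(2023, 5, 20)
files = {
    "part-0001.parquet": now - dt.timedelta(hours=200),  # old, unreferenced
    "part-0002.parquet": now - dt.timedelta(hours=1),    # in current snapshot
}
kept, deleted = vacuum(files, {"part-0002.parquet"}, retention_hours=168, now=now)
print(kept, deleted)  # → ['part-0002.parquet'] ['part-0001.parquet']
```

The 168-hour default mirrors Delta Lake's 7-day retention; shortening it risks deleting files that time travel or in-flight readers still need, which is why the real command guards against low values.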
Partition Replication Handled by Spark (May 20, 2023)
In Apache Spark, partition replication refers to the process of replicating data partitions across multiple nodes in a cluster. This…
Where a New RDD Is Created When a Node Fails in Spark (May 20, 2023)
In Apache Spark, when a node fails, the Resilient Distributed Dataset (RDD) is recovered using the lineage information. RDD lineage is a…
How an RDD Is Resilient (May 18, 2023)
RDD (Resilient Distributed Dataset) is a fundamental data structure in Apache Spark that provides fault tolerance and resilience to…
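The lineage-based recovery described in the two entries above can be sketched in plain Python: instead of replicating computed data, Spark records the chain of transformations from a recoverable source, and a lost partition is rebuilt by replaying that chain. This is a minimal simulation of the idea, not Spark's RDD machinery; the function names are made up for illustration.

```python
def recompute(source_partition, lineage):
    """Rebuild a lost partition by replaying its recorded lineage
    (a list of element-wise transformations) over the durable source.
    Illustrative sketch only -- Spark's lineage also covers shuffles,
    filters, and multi-parent dependencies."""
    data = list(source_partition)
    for fn in lineage:
        data = [fn(x) for x in data]
    return data

# Hypothetical lineage: the partition was produced by map(x*2) then map(x+1).
lineage = [lambda x: x * 2, lambda x: x + 1]
print(recompute([1, 2, 3], lineage))  # → [3, 5, 7]
```

Because only the source data and the transformation graph must survive, recovery costs recomputation time rather than the storage overhead of keeping replicas of every intermediate result.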
Delta Lake Generated Columns (May 5, 2023)
Delta Lake provides the ability to define generated columns, which are computed based on other columns in the same table, and are…
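The generated-columns behavior above — a column whose value is derived from other columns at write time — can be sketched in plain Python. This is a simulation of the concept, not Delta Lake's implementation; in Delta Lake SQL the real form is a column definition such as `eventDate DATE GENERATED ALWAYS AS (CAST(eventTime AS DATE))`, and the table/column names below are made up for illustration.

```python
def insert_with_generated(rows, generated):
    """Sketch of a generated-column write path: for each inserted row,
    fill every generated column by evaluating its expression against
    the row's other columns. Illustrative only."""
    out = []
    for row in rows:
        row = dict(row)  # don't mutate the caller's row
        for col, expr in generated.items():
            row[col] = expr(row)
        out.append(row)
    return out

# Hypothetical table: eventDate is generated from the eventTime string.
generated = {"eventDate": lambda r: r["eventTime"][:10]}
rows = [{"eventTime": "2023-05-05T10:30:00"}]
print(insert_with_generated(rows, generated))
# → [{'eventTime': '2023-05-05T10:30:00', 'eventDate': '2023-05-05'}]
```

Deriving the column at write time (rather than at query time) lets the engine partition on it and skip files during reads, which is the usual motivation for generated columns.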
Relationship Between Two Delta Lake Tables (May 4, 2023)
Delta Lake is a storage layer that runs on top of data stored in cloud-based object stores such as Amazon S3 or Azure Data Lake Storage…