Rishika Idnani in Art of Data Engineering. Late arriving data: Challenges and Traditional Solutions. Feb 9
Rishika Idnani. Complex types in Spark — Arrays, Maps & Structs. In Apache Spark, there are some complex data types that allow storage of multiple values in a single column in a data frame. This article… Dec 14, 2023
Rishika Idnani in Towards Data Engineering. Improve s3 write performance with magic committer in Spark3. Comparing the traditional and magic S3 committers, and a guide to using the magic committer. Mar 17, 2023
Rishika Idnani. Solving data skewness in Spark with Salting. Data skewness refers to the non-uniform distribution of data in a dataset. Skewed data causes certain nodes/workers in a Spark cluster to… Feb 23, 2023
Rishika Idnani. Restoring objects from Glacier to S3 using Batch Operations (using Python boto3). S3 Batch Operations lets us perform actions across billions of objects and terabytes of data with a single request. In… Nov 30, 2021
Rishika Idnani. Restoring objects from Glacier to S3 using Batch Operations (using AWS Console). S3 Batch Operations lets us perform actions across billions of objects and terabytes of data with a single request. In… Oct 29, 2021