Published in Towards Data Science
Seamless Parsing of Nested JSON and Schema Evolution in DLT Without Restarting Pipelines
Based on a customer case study, an advanced tutorial on using Delta Live Tables to process JSON schema evolution without the need to…
Oct 5

Published in Level Up Coding
Efficiently Performing Data Deduplication in Streaming Workloads using Delta Live Tables
Demonstrating how simple it is to implement in Databricks Delta Live Tables, while highlighting a few common pitfalls.
Jan 25

Published in Level Up Coding
Optimizing Merge Performance in Databricks — A Case Study
Explore Databricks features (e.g. DFP, Deletion Vectors) and data engineering principles to optimize performance of merge and joins in…
Jan 8

Published in Level Up Coding
Mayday to Eureka! Cataloguing tables in AWS Glue when Crawler Just Fail
While automating it via Python, boto3 and S3 Select
Jan 26, 2023

Published in Towards Data Science
Test Driving Delta Lake 2.0 on AWS EMR — 7 Key Learnings
What I learned after using Delta Lake 2.0 on AWS EMR, along with installation steps and performance benchmarks
Oct 12, 2022

Published in Towards Data Science
Getting started with Delta Lake & Spark in AWS— The Easy Way!
A step-by-step tutorial to configure Apache Spark and Delta Lake on EC2 in AWS, along with code examples in Python
Aug 31, 2022

Published in Level Up Coding
Versioning Thy Infra: Tagging AWS resources with Git Commit Hash in CICD pipelines
By using AWS CodePipeline, CloudFormation, Lambda and Python.
Mar 7, 2022

Published in Level Up Coding
Efficiently Transforming, Compressing (in-memory) and Ingesting CSV files to AWS S3 using Python
A guide to optimizing your AWS S3 ingestion processes via in-memory processing and compression of CSV files using Python and the AWS SDK
Jan 24, 2022

Published in Level Up Coding
WatchTower — The Missing Piece in Streamlining Amazon Cloudwatch and Application Logs
How to use the WatchTower module in Python to integrate AWS CloudWatch with Python’s logging module as an alternative to boto3
Mar 2, 2021

Published in Towards Data Science
Machine Learning (kmeans clustering) in SparkML vs AWS SageMaker — My Two Cents
My experience of performing kmeans, an unsupervised machine learning algorithm, in SparkML and AWS SageMaker, and their caveats
Nov 9, 2019