Irfan Elahi – Medium

Irfan Elahi

Featured Book

Scala Programming for Big Data Analytics

An extensive guide on Scala to get started with Big Data Analytics using Apache Spark

2019

Stories

Irfan Elahi in Level Up Coding

Efficiently Performing Data Deduplication in Streaming Workloads using Delta Live Tables

Showing how simple it is to implement in Databricks Delta Live Tables, and highlighting a few common pitfalls.

Jan 25

Irfan Elahi in Level Up Coding

Optimizing Merge Performance in Databricks — A Case Study

Explore Databricks features (e.g. DFP, Deletion Vectors) and data engineering principles to optimize performance of merge and joins in…

Jan 8

Irfan Elahi in Level Up Coding

Mayday to Eureka! Cataloguing tables in AWS Glue when Crawler Just Fail

While automating it via Python, boto3 and S3 Select

Jan 26, 2023

Irfan Elahi in Towards Data Science

Test Driving Delta Lake 2.0 on AWS EMR — 7 Key Learnings

What I learned after using Delta Lake 2.0 on AWS EMR along with installation steps and performance benchmarks

Oct 12, 2022

Irfan Elahi in Towards Data Science

Getting started with Delta Lake & Spark in AWS— The Easy Way!

A step-by-step tutorial to configure Apache Spark and Delta Lake on EC2 in AWS along with code examples in Python

Aug 31, 2022

Irfan Elahi in Level Up Coding

Versioning Thy Infra: Tagging AWS resources with Git Commit Hash in CICD pipelines

By using AWS CodePipeline, CloudFormation, Lambda and Python.

Mar 7, 2022

Irfan Elahi in Level Up Coding

Efficiently Transforming, Compressing (in-memory) and Ingesting CSV files to AWS S3 using Python

A guide to optimizing your AWS S3 ingestion processes via in-memory processing and compression of CSV files using Python and the AWS SDK

Jan 24, 2022

Irfan Elahi in Level Up Coding

WatchTower — The Missing Piece in Streamlining Amazon Cloudwatch and Application Logs

How to use the WatchTower module in Python to integrate AWS CloudWatch with Python’s logging module as an alternative to boto3

Mar 2, 2021

Irfan Elahi in Towards Data Science

Machine Learning (kmeans clustering) in SparkML vs AWS SageMaker — My Two Cents

My experience of running k-means, an unsupervised machine learning algorithm, in SparkML and AWS SageMaker, and the caveats of each

Nov 9, 2019

Irfan Elahi in Towards Data Science

AWS Elastic MapReduce (EMR) — 6 Caveats You Shouldn’t Ignore

A few gotchas about AWS EMR and AWS Glue that you, as a developer/architect, should know

Oct 28, 2019

Irfan Elahi

Book Author

Specialist Solutions Architect @ Databricks | Author | Photographer
