Irfan Elahi

251 Followers

Published in Level Up Coding

·Jan 26

Mayday to Eureka! Cataloguing Tables in AWS Glue when Crawlers Just Fail

While automating it via Python, boto3 and S3 Select

If you have built a data lake on AWS, then one of the quickest ways to catalog data sitting on the storage layer (mostly S3) is via AWS Glue crawlers. …

Python

6 min read


Published in Towards Data Science

·Oct 12, 2022

Test Driving Delta Lake 2.0 on AWS EMR — 7 Key Learnings

What I learned after using Delta Lake 2.0 on AWS EMR, along with installation steps and performance benchmarks

If you have read my previous article about getting started with Delta Lake in AWS, you will have the fundamental context and rationale for why offerings like Delta Lake are gaining traction and what types of use cases they address. The article presented simple and easy steps to get…

AWS

8 min read


Published in Towards Data Science

·Aug 31, 2022

Getting started with Delta Lake & Spark in AWS— The Easy Way

A step-by-step tutorial to configure Apache Spark and Delta Lake on EC2 in AWS, along with code examples in Python

If you have worked on engineering a data lake or lakehouse solution, chances are that you have employed (or heard of) decoupled, distributed computation frameworks running against the scalable storage layer of your data lake platform. Though the list of such computation frameworks is growing, Apache Spark has continued to…

Spark

8 min read


Published in Level Up Coding

·Mar 7, 2022

Versioning Thy Infra: Tagging AWS Resources with Git Commit Hash in CICD pipelines

By using AWS CodePipeline, CloudFormation, Lambda and Python.

If your team follows leading DevOps practices, there is a high likelihood that you maintain your infrastructure as code (IaC). IaC approaches accelerate consistent and repeatable deployment of your infrastructure and its configuration. One of the use cases where IaC is particularly helpful is in CICD pipelines where…

AWS

6 min read


Published in Level Up Coding

·Jan 24, 2022

How to Efficiently Transform a CSV File and Upload it in Compressed Form to AWS S3 (Python, Boto3)

If you have been working in the data engineering space, chances are that you have been involved in processing CSV files. Even though it’s not the most efficient format for analytics, CSV still enjoys quite a significant footprint in the current data landscape. It’s a widely supported format and…

Python

6 min read


Published in Level Up Coding

·Mar 2, 2021

WatchTower — The Missing Piece in Streamlining Amazon Cloudwatch and Python Application Logs

My hypothesis is that if there is one thing that all of us data guys can agree on, it's that logs are crucial for any data application. When set up correctly, logs unlock a lot of insights about the operation of an application. If an application behaves normally, logs are the go-to…

AWS

6 min read


Published in Towards Data Science

·Nov 9, 2019

Machine Learning (kmeans clustering) in SparkML vs AWS SageMaker — My Two Cents

Machine learning, the ability to learn from data, has been one of the most successful and disruptive use cases of Big Data. In the landscape of data and analytics, one has access to a myriad of tools to undertake machine learning tasks of varying nature and complexity. However, when one is operating…

Machine Learning

8 min read


Published in Towards Data Science

·Oct 28, 2019

AWS Elastic MapReduce (EMR) — 6 Caveats You Shouldn’t Ignore

If you are in the data and analytics industry, you must have heard of the burgeoning “data lake” trend which, in simple terms, represents a storage strategy that allows organizations to store data from different sources and of different characteristics (size, format and velocity) in one place. The data lake then becomes an enabler…

AWS

7 min read


Published in Towards Data Science

·Aug 12, 2019

Launching AWS EMR backed SageMaker Notebooks via Infrastructure As Code (Boto3, CloudFormation, Python)

Scalable analytics in the cloud is the name of the game these days. All the leading cloud providers are focusing significantly on provisioning services that streamline the end-to-end lifecycle of machine learning. The trend these days is to have data ingested into a data lake (which requires its own set of considerations) and process…

AWS

10 min read


Published in Towards Data Science

·Jul 9, 2019

Using Azure Cognitive Services for Sentiment Analysis of Trump’s Tweets

An extensive tutorial on how to use Azure Cognitive Services (Text Analytics API) to perform sentiment analysis using Databricks (Python, Scala)

First Section: Extracting Tweets

Sentiments can manifest anywhere: reviews, news, real-life conversations and journalism, to name a few. The capability to identify the polarity of sentiments accurately by employing machine learning approaches unlocks a series of business use cases that can yield immense value for businesses. Instead of humans parsing through the content to infer…

Scala

15 min read

Enterprise Data Engineer @ Transurban | Author | Photographer
