Mohamed Bilal S
Delta Lake: The Game-Changer for Slowly Changing Dimensions — A Step-by-Step Guide to SCD Type 2
Introduction: Delta Lake has revolutionized the way we manage data in data lakes, and one of its most significant benefits is the…
Aug 16
Mohamed Bilal S
Leveraging Pyspark with OpenAI API: Sentiment Analysis using Prompt engineering/ ChatGPT
Incorporating generative AI capabilities on Pyspark Dataframes
Aug 27, 2023
Mohamed Bilal S
CSV Bad Record Handling and it’s Complications— Pyspark
Introduction:
Sep 13, 2020
Mohamed Bilal S
Testing and Deploying a Machine learning model using Flask API + Pycharm and Docker
Introduction
Aug 23, 2020
Mohamed Bilal S
Spark 3.0 New DataFrame functions — Part 2- CSV Pushdown Filter, max_by(), min_by() functions
Introduction: This article is a continuation of my previous article, where we discussed some of the new features that were added to…
Aug 2, 2020
Mohamed Bilal S
Spark 3.0 new DataFrame functions overview
Introduction: Spark 3.0 was released on 18 June 2020 with many promising new features. The major optimization features being…
Jul 25, 2020
Mohamed Bilal S
Spark Structured Streaming — Performing unsupported batch operations on a streaming dataframe
Introduction: Spark Structured Streaming has lately become one of Spark's most preferred streaming APIs because of its ease of use as…
Apr 5, 2020
Mohamed Bilal S
Programming using SHC — Spark HBase Connector — using Scala
Introduction:
Apr 1, 2020