Abhishek Trehan, "Merge Excel Files using Python" (Sep 18): There are instances when you would have to work with various Excel files. It can be a tedious task to combine all of them and collate all…
Abhishek Trehan, "JSON to CSV Convertor" (Aug 15): Simple script to make your life easier when you need to convert a JSON file to CSV. This script makes use of the csv and json packages. Code is…
Abhishek Trehan, "Writing Unit Test Cases for PySpark" (Aug 8): The pytest framework makes it easy to write small, readable tests, and can scale to support complex functional testing for applications and…
Abhishek Trehan, "Custom SCD2 Using PySpark" (Jul 30): A Slowly Changing Dimension (SCD) is a dimension that stores and manages both current and historical data over time in a data warehouse. It…
Abhishek Trehan, "Easy Guide on Creating UDFs in PySpark" (Jul 26): Welcome back to another article where we are going to touch upon PySpark. This is one skill that you, as a data engineer, really need to hone…
Abhishek Trehan, "Generate Unique IDs in PySpark" (Jul 26): This is one of the most common situations you will encounter as a data engineer or data scientist. You can always partition the data using…
Abhishek Trehan, "Excel with Excel using PySpark" (Jul 25): Just recently I encountered a use case where I needed to read a large batch of Excel files with various tabs and perform heavy ETL. Most of us are…
Abhishek Trehan, "Delta Load with SCD2 - Applying Adjustments to your Base d…" (Jul 18): Continuing from the previous article, where we talked about implementing an SCD load to your target, this is a follow-up article to the same…
Abhishek Trehan, "Improve performance of your Spark CDC jobs with Merge Statements" (Jul 17): When working with large volumes of data, it is very important to consider the performance of your ETL jobs. CDC, or as it stands, Change…