Published in Data Engineering | Modern SQL — Not so commonly used, yet powerful features | Most data platforms have evolved, and so has the way we write SQL, although the basic fundamentals remain the same. I thought it would be… | Oct 30, 2023
Published in Data Engineering | Get best performance for PySpark jobs using Parallelize | Oct 24, 2023
Published in Data Engineering | Troubleshoot Spark/Pyspark performance issues | Steps to help troubleshoot common performance issues in Spark/PySpark jobs, using EMR/Databricks as an example. | Oct 19, 2023
Published in Data Engineering | Rethinking Surrogate Keys: Efficient Alternatives for Modern Data Models | Surrogate keys have been the unsung heroes of data modelling. | Oct 9, 2023
Create Answer as a Service using Generative AI | “Answer as a Service” (AaaS) is a cloud-based solution that employs AI and language models to instantly provide accurate answers to user… | Aug 21, 2023
Published in Data Engineering | Generative AI — Threat/Opportunity — Data Engineering | Generative AI is a rapidly developing field with the potential to revolutionize the way data engineers work. This technology can be used… | Jul 31, 2023
Published in Data Engineering | Data Integration | Data integration is the process of combining data from multiple sources into a single, unified view. ETL and ELT are two common techniques… | Apr 11, 2023
Published in Data Engineering | Data Modelling in Columnar Data Stores? | Introduction: | Apr 6, 2023