Hugo Lu | The Rise of the Open Lakehouse made Databricks. It could bring it down. | Why open storage invites competitors into the back garden | Apr 13
Manojkumar Vadivel | 14 VS Code Extensions Every Data Engineer Should Swear By for Maximum Productivity | As a data engineer, your toolbox is everything. The right set of tools can save you time, reduce frustration, and make your workflows… | Nov 24, 2024
Vu Trinh, in Data Engineer Things | I spent 8 hours learning Parquet. Here’s what I discovered | I finally sat down and learned about it. | Aug 24, 2024
Piethein Strengholt, in TDS Archive | Master Data Management in Data Mesh | If it’s stable and it truly matters, consider using MDM | Feb 25, 2022
Piethein Strengholt | Integrating Azure Databricks and Microsoft Fabric | The article discusses the integration of Azure Databricks and Microsoft Fabric, presenting several architectural design options. | Jun 4, 2024
Howard Chi, in Wren AI | The new wave of Composable Data Systems and the Interface to LLM agents | The new interface to bridge AI agents and the data system. How can we manage and design to meet the future demands of multi-AI agents? | Sep 4, 2024
Mudra Patel | Data Engineering concepts: Part 1, Data Modeling | Source: https://medium.com/@dom.n/the-data-engineering-lifecycle-5c67bf6fb540 | Feb 12, 2024
Karen Zhang, in Data Engineer Things | 10 Things I Learned from Reading Fundamentals of Data Engineering | After two enriching years as a Data Engineer, I finally had the chance to dive into Fundamentals of Data Engineering written by the… | Aug 2, 2023
Maxime Beauchemin | The Downfall of the Data Engineer | This post follows up on The Rise of the Data Engineer, a recent post that was an attempt at defining data engineering and described how… | Aug 28, 2017
Maxime Beauchemin | Functional Data Engineering — a modern paradigm for batch data processing | Batch data processing — historically known as ETL — is extremely challenging. It’s time-consuming, brittle, and often unrewarding. Not only… | Jan 8, 2018
Namrata Mali | Inside the Data Engineer’s Interview | Cracking a data engineering interview can be challenging, especially if you don’t have prior experience or if you’re a recent graduate… | Sep 9, 2023
Ravindra Kumar, in Python in Plain English | Crafting Robust Data Pipelines with SOLID Principles: Python’s Approach to Data Engineering | In the bustling world of data engineering, the art of crafting robust, maintainable, and scalable pipelines is a feat not easily achieved… | Aug 24, 2023
Salma Bakouk, in Data Entropy Blog by Sifflet | Data Quality Monitoring is dead. Say Hello to Full Data Stack Observability | Or how to unlock the reliability of your data assets at any stage of the pipeline | Aug 4, 2022
Velotio Technologies, in Velotio Perspectives | Lessons Learnt While Building an ETL Pipeline for MongoDB & Amazon Redshift Using Apache Airflow | Recently, I was involved in building an ETL(Extract-Transform-Load) pipeline. It included extracting data from MongoDB collections… | Feb 4, 2019
Pedram Navid, in TDS Archive | Dagster, Airflow and Prefect: A Deep Dive | editor’s note: this post was updated on May 17 2024 | Jan 10, 2022
Nicholas Hurt, in Microsoft Azure | Securing access to Azure Data Lake gen2 from Azure Databricks | There are a number of ways to configure access to Azure Data Lake Storage gen2 (ADLS) from Azure Databricks (ADB). This blog attempts to… | Jan 19, 2020
Cameron Warren, in TDS Archive | 8 Essential Python Techniques for Data Engineers and Analysts (with code samples) | These are the Python code snippets I re-use the most | Jan 17, 2022
Madison Schott, in TDS Archive | Follow These Best Practices for High-Quality Data Ingestion | How to choose the right tool and integrate it into your data pipeline | May 12, 2022
Baysan, in CodeX | A Modern Data Warehousing Tool: dbt & Introduction to Analytics Engineering | Creating own experimental lab by Docker to simulate ELT. What Analytics Engineering is. Introduction to dbt. What is the difference between… | Mar 28, 2022
Punchh Technology Blog, in The Startup | Apache HUDI vs Delta Lake | The tale of the two ACID platforms on Data Lakes | Feb 18, 2020