Pinned · Published in Data Engineer Things · Deep dive into the challenges of building Kafka on top of S3. · It’s really tough · May 8 · 2 responses
Pinned · Published in Data Engineer Things · Bufstream: Stream Kafka Messages to Iceberg Tables in Minutes · 8x cheaper than Kafka + native support for data quality and seamless transformation of Kafka topics into Iceberg tables. · Mar 27
Pinned · Published in Data Engineer Things · Bauplan: Operate your lakehouse with zero infrastructure · FaaS data pipelines on S3 · Mar 20
Pinned · Published in Data Engineer Things · I spent 8 hours learning Parquet. Here’s what I discovered · I finally sat down and learned about it. · Aug 24, 2024 · 23 responses
Pinned · Published in Data Engineer Things · How does Uber build real-time infrastructure to handle petabytes of data every day? · All insights from the paper: Real-time data infrastructure at Uber · Mar 23, 2024 · 21 responses
Published in Data Engineer Things · How is Databricks’ Spark different from Open-Source Spark? · Why don’t they just use the open-sourced Apache Spark? · 5d ago · 1 response
Published in Data Engineer Things · How did Airbnb build their semantic layer? · Minerva, the Airbnb metric platform · May 1 · 2 responses
Published in Data Engineer Things · Let’s use Orchestra to build an end-to-end data pipeline in 10 minutes · Spoiler: You don’t have to manage the infrastructure. · Apr 24
Published in Data Engineer Things · Why is dbt So Popular? · The motivation behind dbt and why it’s becoming a transformation standard(?) · Apr 17 · 18 responses
Published in Data Engineer Things · Why Walmart Chose Apache Hudi for Their Lakehouse · What can we learn · Apr 10 · 5 responses