Pinned · Published in Data Engineer Things · Deep dive into the challenges of building Kafka on top of S3. · It’s really tough · May 8 · 2
Pinned · Published in Data Engineer Things · Bufstream: Stream Kafka Messages to Iceberg Tables in Minutes · 8x cheaper than Kafka + native support for data quality and seamless transformation of Kafka topics into Iceberg tables. · Mar 27
Pinned · Published in Data Engineer Things · Bauplan: Operate your lakehouse with zero infrastructure · FaaS data pipelines on S3 · Mar 20
Pinned · Published in Data Engineer Things · I spent 8 hours learning Parquet. Here’s what I discovered · I finally sat down and learned about it. · Aug 24, 2024 · 23
Pinned · Published in Data Engineer Things · How does Uber build real-time infrastructure to handle petabytes of data every day? · All insights from the paper: Real-time data infrastructure at Uber · Mar 23, 2024 · 21
Published in Data Engineer Things · How is Databricks’ Spark different from Open-Source Spark? · Why don’t they just use the open-sourced Apache Spark? · 3d ago
Published in Data Engineer Things · How did Airbnb build their semantic layer? · Minerva, the Airbnb metric platform · May 12
Published in Data Engineer Things · Let’s use Orchestra to build an end-to-end data pipeline in 10 minutes · Spoiler: You don’t have to manage the infrastructure. · Apr 24
Published in Data Engineer Things · Why is dbt So Popular? · The motivation behind dbt and why it’s becoming a transformation standard(?) · Apr 17 · 18
Published in Data Engineer Things · Why Walmart Chose Apache Hudi for Their Lakehouse · What can we learn · Apr 10 · 5