Pinned | Published in Data Engineer Things | Deep dive into the challenges of building Kafka on top of S3. | It’s really tough | May 8 | 2 responses
Pinned | Published in Data Engineer Things | Bufstream: Stream Kafka Messages to Iceberg Tables in Minutes | 8x cheaper than Kafka + native support for data quality and seamless transformation of Kafka topics into Iceberg tables. | Mar 27 | 1 response
Pinned | Published in Data Engineer Things | Bauplan: Operate your lakehouse with zero infrastructure | FaaS data pipelines on S3 | Mar 20 | 1 response
Pinned | Published in Data Engineer Things | I spent 8 hours learning Parquet. Here’s what I discovered | I finally sat down and learned about it. | Aug 24, 2024 | 23 responses
Pinned | Published in Data Engineer Things | How does Uber build real-time infrastructure to handle petabytes of data every day? | All insights from the paper: Real-time data infrastructure at Uber | Mar 23, 2024 | 21 responses
Published in Data Engineer Things | Stream Kafka Topic to the Iceberg Tables with Zero-ETL | A solution from AutoMQ: open-sourced + no need for ETL pipeline maintenance | 4d ago
Published in Data Engineer Things | How did Meta modernize their lakehouse? | The new approach enabled Meta to innovate faster. | Jun 12 | 1 response
Published in Data Engineer Things | I spent 6 hours learning how Google serves analytics applications | 10GBs/s throughput and sub-milliseconds query latency | Jun 5
Published in Data Engineer Things | The internals of BigQuery, Snowflake, Databricks, and Redshift | 4 famous cloud data warehouses in an article | May 29
Published in Data Engineer Things | I spent 5 hours understanding how Uber built their ETL pipelines. | Spoiler: They don’t use batch or stream pipelines | May 22 | 14 responses