Pinned · Published in Data Engineer Things · AutoMQ: Achieving Auto Partition Reassignment In Kafka Without Cruise Control · AutoMQ’s stateless brokers and its self-balancing feature · Nov 20 · 2
Pinned · Published in The Deep Hub · How AutoMQ Reduces Nearly 100% of Kafka Cross-Zone Data Transfer Cost · Producing data with the broker in the same availability zone with S3 WAL, self-balancing, and leveraging Kafka rack-awareness · Oct 22 · 1
Pinned · Published in The Deep Hub · How do we run Kafka 100% on the object storage? · Let’s see how AutoMQ makes this dream come true. · Aug 27 · 5
Pinned · Published in Data Engineer Things · I spent 8 hours learning Parquet. Here’s what I discovered · I finally sat down and learned about it. · Aug 24 · 19
Pinned · Published in Data Engineer Things · How does Uber build real-time infrastructure to handle petabytes of data every day? · All insights from the paper: Real-time data infrastructure at Uber · Mar 23 · 19
Published in Data Engineer Things · I spent 4 hours learning how Netflix operates Apache Iceberg at scale. · Iceberg The Backbone At Netflix Data Platform Architecture · 2d ago · 1
Published in Data Engineer Things · How does Netflix ensure the data quality for thousands of Apache Iceberg tables? · The Write-Audit-Publish pattern with Iceberg Branches · 6d ago
Published in Data Engineer Things · I spent 8 hours relearning the Delta Lake table format · The format, Read/Write process, Concurrency, Data Mutation and more · Nov 23
Published in Data Engineer Things · DataHub: The Metadata Platform Developed at LinkedIn · How did LinkedIn manage the data catalog at scale? · Nov 19 · 1
Published in Data Engineer Things · I spent 8 hours learning the ClickHouse MergeTree Table Engine · Concepts, The Write/Read Process, The Mutation and The replication · Nov 16