Pinned | Vu Trinh in The Deep Hub | How AutoMQ Reduces Nearly 100% of Kafka Cross-Zone Data Transfer Cost | Producing data with the broker in the same availability zone with S3 WAL, self-balancing, and leveraging Kafka rack-awareness | Oct 22
Pinned | Vu Trinh in The Deep Hub | How do we run Kafka 100% on the object storage? | Let’s see how AutoMQ makes this dream come true. | Aug 27
Pinned | Vu Trinh in Data Engineer Things | I spent 8 hours learning Parquet. Here’s what I discovered | I finally sat down and learned about it. | Aug 24
Pinned | Vu Trinh in Data Engineer Things | How does Uber build real-time infrastructure to handle petabytes of data every day? | All insights from the paper: Real-time data infrastructure at Uber | Mar 23
Vu Trinh in Data Engineer Things | I spent 8 hours learning the details of the Apache Spark scheduling process. | Anatomy of a Spark job and the typical scheduling process. | 3d ago
Vu Trinh in Data Engineer Things | I spent 5 hours exploring the story behind Apache Hudi. | Why did Uber create it back then? What makes Hudi different from Iceberg or Delta Lake? | 5d ago
Vu Trinh in Data Engineer Things | I spent 6 hours learning Apache Arrow: Overview | Why do we need a standard memory format for analytics workload? | 6d ago
Vu Trinh | I spent 8 hours researching WarpStream | Rewriting Kafka protocol in Go and running 100% on object storage | Oct 5
Vu Trinh in Data Engineer Things | I spent 8 hours diving deep into Snowflake (again) | Virtual Warehouse, Intermediate Storage, Cache, and Remote Storage | Sep 28
Vu Trinh in Google Cloud - Community | I spent 5 hours learning how Google lets us build a Lakehouse. | The Google Cloud BigLake | Sep 24