Parquet: Columnar Storage for Hadoop Data
At AppNexus, over 2MM log events are ingested into our data pipeline every second. Log records are sent from upstream systems in the form of protobuf messages. Raw logs are compressed in Snappy when stored on HDFS. That said, even with compression, this…