Insights Into Parquet Storage
If you work with big data, you have probably heard of Parquet and that it is optimized for storage. Here I will share some more insights into Parquet's architecture and how and why it is optimized. I will also add some tips for using Parquet effectively, so that you can take advantage of all of its features.
What is Parquet
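Parquet is an open-source file format in the Hadoop ecosystem. It is a flat columnar storage format: the values of each column are stored together rather than row by row. That layout makes it efficient both in the space it takes on disk and in how quickly it can be queried, since similar values sit next to each other and a query can read only the columns it needs.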
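To make "columnar" a bit more concrete, here is a toy sketch in plain Python. It is not Parquet's actual on-disk format, and the sample records and the `to_columnar` helper are made up for illustration. It only shows the idea of pivoting rows into columns so that a query on one field touches only that field's values.

```python
# Toy illustration only: this is NOT how Parquet lays out bytes on disk.
# The records and the helper name below are hypothetical.
rows = [
    {"id": 1, "country": "IN", "amount": 10.0},
    {"id": 2, "country": "IN", "amount": 20.0},
    {"id": 3, "country": "US", "amount": 5.5},
]


def to_columnar(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columnar(rows)

# Row layout: summing one field still walks every full record.
total_from_rows = sum(record["amount"] for record in rows)

# Columnar layout: the same query reads only the "amount" column.
total_from_columns = sum(columns["amount"])

print(columns["country"])
print(total_from_rows, total_from_columns)
```

Notice that in the columnar form the `country` values sit side by side, repeats included. Runs of similar values like this are what make column-wise encoding and compression effective.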