Are We Taking Only Half the Advantage of Columnar File Formats?

Eric Sun
Mar 16, 2020

(Originally posted on LinkedIn in 2018.)

Columnar file formats have become the primary storage choice for big data systems, but when I Googled related topics this weekend, I found that most articles cover only simple query benchmarks and storage footprint comparisons between a particular columnar format and row formats. Sorting is also a critical feature of columnar formats, but its benefits and effective practices have not been emphasized or explained in detail so far. IMHO, using columnar formats…