Are We Taking Only Half Of The Advantage Of Columnar File Format?

Eric Sun
Published in Analytics Vidhya
8 min read · Mar 16, 2020

(Originally posted on LinkedIn in 2018.)

Columnar file formats have become the primary storage choice for big data systems, but when I Googled related topics this weekend, I found that most articles cover only simple query benchmarks and storage-footprint comparisons between a particular columnar format and row formats. Sorting is also a critical feature of columnar formats, but its benefits and effective practices have not been emphasized or explained in detail so far. IMHO, using columnar formats…

Eric Sun

Advocate best practice of big data technologies. Challenge the conventional wisdom. Peel off the flashy promise in architecture and scalability.