Do Penguins do Analytics?

Julien Kervizic
Published in Hacking Analytics
2 min read · Oct 6, 2018


A few years ago, I was introduced to the practice of analytics from the Linux command line through a Wikibooks book. Command-line analytics can be particularly useful when dealing with large output datasets. I have used it for tasks as diverse as:

  • using wc -l to determine whether my data load was successful
  • using head and tail to split large files for loading into a database
  • using sed to convert between text file formats and to strip special characters that were causing issues in my data load
  • using grep, sed, cut and join to determine usage for a subset of an app's user base, based on data obtained from API calls
  • using cut, sort and uniq to identify the most common failing hosts based on log file data
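As an illustration of the first and last items, here is a minimal sketch. The log layout (timestamp, host, status separated by spaces), the file name app.log and the host names are all assumptions made up for the example, not data from a real system.

```shell
#!/bin/sh
# Hypothetical sample log: timestamp, host, status (space-separated).
workdir=$(mktemp -d)
cat > "$workdir/app.log" <<'EOF'
2018-10-01T10:00:00 web01 OK
2018-10-01T10:00:05 web02 FAIL
2018-10-01T10:00:09 web03 FAIL
2018-10-01T10:01:12 web02 FAIL
2018-10-01T10:02:40 web01 OK
2018-10-01T10:03:02 web02 FAIL
2018-10-01T10:03:30 web03 OK
EOF

# Sanity check on the load: how many lines did we actually get?
line_count=$(wc -l < "$workdir/app.log" | tr -d ' ')
echo "lines loaded: $line_count"

# Most common failing host:
#   grep    keeps only the failing lines
#   cut     extracts the host column (field 2)
#   sort    groups identical hosts together so uniq -c can count them
#   sort -rn orders the counts from highest to lowest
top_failing=$(grep ' FAIL$' "$workdir/app.log" \
  | cut -d ' ' -f 2 \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -n 1 \
  | awk '{print $2}')
echo "most common failing host: $top_failing"

rm -rf "$workdir"
```

The sort before uniq -c matters: uniq only collapses adjacent identical lines, so unsorted input would give fragmented counts.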

Most of the features offered by SQL have an equivalent on the Unix command line. These commands are not meant for full-fledged analytics queries, but they offer a quick and efficient answer to many of the small questions that come up when looking at a dataset or log files.
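To make the SQL comparison concrete, the sketch below answers a question that would be a JOIN with WHERE, GROUP BY, ORDER BY and LIMIT in SQL. The two CSV files, their columns and their contents are hypothetical and exist only for the example.

```shell
#!/bin/sh
# Use byte-wise collation so that sort and join agree on ordering.
export LC_ALL=C

workdir=$(mktemp -d)

# Hypothetical users table: id,name,plan
cat > "$workdir/users.csv" <<'EOF'
1,alice,premium
2,bob,free
3,carol,premium
EOF

# Hypothetical events table: user_id,event
cat > "$workdir/events.csv" <<'EOF'
3,login
1,login
2,login
1,purchase
3,login
3,purchase
EOF

# Rough SQL equivalent:
#   SELECT u.name, COUNT(*) FROM events e
#   JOIN users u ON e.user_id = u.id
#   WHERE u.plan = 'premium'
#   GROUP BY u.name ORDER BY COUNT(*) DESC LIMIT 1;

# join requires both inputs to be sorted on the join field.
sort -t, -k1,1 "$workdir/users.csv"  > "$workdir/users.sorted"
sort -t, -k1,1 "$workdir/events.csv" > "$workdir/events.sorted"

top_user=$(join -t, "$workdir/users.sorted" "$workdir/events.sorted" \
  | grep ',premium,' \
  | cut -d, -f2 \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -n 1 \
  | awk '{print $2, $1}')
echo "most active premium user: $top_user"

rm -rf "$workdir"
```

Each stage maps loosely to a SQL clause: join to JOIN, grep to WHERE, cut to the SELECT list, sort piped into uniq -c to GROUP BY with COUNT, sort -rn to ORDER BY, and head to LIMIT.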

There are also a number of SQL interfaces for the terminal that work directly on text files, but none has the ubiquity of the Unix command line or the same ability to handle less structured datasets.

Julien Kervizic

Living at the interstice of business, data and technology | Head of Data at iptiQ by SwissRe | previously at Facebook, Amazon | julienkervizic@gmail.com