For many data scientists, data manipulation begins and ends with Pandas or the Tidyverse. In theory, there is nothing wrong with that; it is, after all, why these tools exist in the first place. Yet they are often overkill for simple tasks.
Mastering the command line should be on every developer’s list, and on every data scientist’s in particular. Learning the ins and outs of your terminal will make you more productive. Beyond that, the command line serves as a great history lesson in computing. Take awk, a data-driven scripting language. Awk first appeared in 1977 with the help of Brian Kernighan, the K in the legendary K&R book. Today, nearly 50 years later, awk remains relevant, with new books about it still appearing. It’s safe to assume that an investment in a little command line wizardry won’t depreciate any time soon.
What We’ll Cover
- ICONV
- HEAD
- TR
- WC
- SPLIT
- SORT & UNIQ
- CUT
- PASTE
- JOIN
- GREP
- SED
- AWK