Computing Word Frequency in Shakespeare for NLP

Start counting words in 5 minutes using just the terminal

Rishi Sidhu
AI Graduate


Photo by Max Muselmann on Unsplash

To count words in Shakespeare, we will use the following Unix commands/tools:

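  • tr: replace or delete characters
  • sort: sort lines (here, one word per line)
  • uniq: collapse adjacent duplicate lines, optionally counting them
  • grep: find lines that match a pattern
  • less: page through long output one screen at a time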
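
Chained together, these tools form a short word-counting pipeline. The following is a minimal sketch rather than a definitive recipe: the `text` variable is a hypothetical stand-in for the Shakespeare file, and `head` is used where you would page with `less` in an interactive session.

```shell
# Sample input; an assumption standing in for the full Shakespeare text.
text='To be, or not to be, that is the question.
Whether tis nobler in the mind to suffer'

# tr -sc : turn every run of non-letters into one newline (one word per line)
# tr     : fold uppercase to lowercase so "To" and "to" count together
# sort   : bring identical words next to each other
# uniq -c: collapse adjacent duplicates and prefix each with its count
# sort -rn: order by count, largest first
counts=$(printf '%s\n' "$text" \
  | tr -sc 'A-Za-z' '\n' \
  | tr 'A-Z' 'a-z' \
  | sort \
  | uniq -c \
  | sort -rn)

# Show the most frequent words (swap head for less when working interactively).
printf '%s\n' "$counts" | head -n 3

# Look up the count of one specific word.
printf '%s\n' "$counts" | grep -w 'be'
```

In this sample, "to" appears three times, and "be" and "the" twice each, so those three words head the list.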

tr is a utility in Unix-like operating systems used to replace or remove specific characters in its input. Its name is an abbreviation of translate or transliterate.

Basic Syntax for all tools

tr

The utility reads a byte stream from standard input and writes the result to standard output (the console, by default). As arguments, it takes two sets of characters and replaces each occurrence of a character in the first set with the corresponding character from the second set. For example,

tr 'abcd' 'jkmn'

maps every a to j, b to k, c to m, and d to n. A character set may be abbreviated with character ranges. For example, the following converts uppercase letters to lowercase:

tr 'A-Z' 'a-z'
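
Because tr only reads standard input, the easiest way to try these commands is to pipe text into them. Here is a small sketch with made-up sample strings; the ranges are quoted so the shell passes them to tr unchanged instead of treating the brackets as a filename pattern.

```shell
# Map a->j, b->k, c->m, d->n; the space is not in the first set, so it is left alone.
mapped=$(printf 'abcd cab\n' | tr 'abcd' 'jkmn')
echo "$mapped"     # prints: jkmn mjk

# Use ranges to fold uppercase letters to lowercase.
lowered=$(printf 'Hamlet, PRINCE of Denmark\n' | tr 'A-Z' 'a-z')
echo "$lowered"    # prints: hamlet, prince of denmark
```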

