Computing Word Frequency in Shakespeare for NLP
Start counting words in five minutes using only the terminal
To count words in Shakespeare, we will use the following Unix tools:
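- tr: translate or delete characters
- sort: sort lines (here, one word per line)
- uniq: collapse adjacent duplicate lines, optionally counting them
- grep: find lines that contain specific words
- less: page through long output one screen at a time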
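As a rough preview of how these tools might fit together, here is a minimal sketch of a word-frequency pipeline. The sample sentence and the variable names `text` and `counts` are placeholders chosen for illustration; in practice the input would be a file of Shakespeare's text, and the exact splitting rules are an assumption rather than the only reasonable choice.

```shell
# Hypothetical sample text standing in for a Shakespeare file
text='To be, or not to be, that is the question'

# 1. tr 'A-Z' 'a-z'    lowercases the text
# 2. tr -cs 'a-z' '\n' turns every run of non-letters into a single newline,
#                      leaving one word per line
# 3. sort              groups identical words on adjacent lines
# 4. uniq -c           collapses each group and prefixes it with its count
# 5. sort -rn          orders the result by count, highest first
counts=$(printf '%s\n' "$text" |
  tr 'A-Z' 'a-z' |
  tr -cs 'a-z' '\n' |
  sort |
  uniq -c |
  sort -rn)

echo "$counts"
```

With a real file, the same pipeline could end in `| less` to page through the result, or `| grep -w 'love'` to look up a single word.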
tr is a utility in Unix-like operating systems that replaces or removes specific characters in its input. Its name is short for translate or transliterate.
Basic Syntax for all tools
tr
The utility reads a byte stream from standard input and writes the result to standard output (by default, the console). It takes two sets of characters as arguments and replaces each occurrence of a character in the first set with the corresponding character in the second set. For example,
tr 'abcd' 'jkmn'
maps all characters a to j, b to k, c to m, and d to n. The character set may be abbreviated by using character ranges.
tr 'A-Z' 'a-z'
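Because tr only reads standard input, the two forms above can be tried by piping a string into them. The sketch below uses made-up sample strings and the illustrative variable names `mapped` and `lowered`; the sets are quoted so that the shell passes them to tr unchanged instead of treating brackets or ranges as filename patterns.

```shell
# Map individual characters: a->j, b->k, c->m, d->n
mapped=$(printf 'a bad cab\n' | tr 'abcd' 'jkmn')
echo "$mapped"

# Use character ranges to convert uppercase letters to lowercase
lowered=$(printf 'To Be, Or Not To Be\n' | tr 'A-Z' 'a-z')
echo "$lowered"
```

Characters that do not appear in the first set, such as spaces and commas, pass through untouched.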