Data Munching Tips & Tricks

Find out the number of columns in a file

awk -F'<DELIMITER>' '{ print NF }' <filename>

Best when:

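  • you want to ensure all the rows have the same number of columns.
  • Note: if you leave out -F, awk splits fields on runs of whitespace (spaces and tabs), not on TAB alone. Pass -F'\t' explicitly for tab-separated files.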
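
Printing NF for every row gets noisy on a big file, so it usually helps to summarize the counts. Here is a minimal sketch, assuming a comma-delimited file; the file name sample.csv and its contents are made up for illustration.

```shell
# Build a small sample file (hypothetical data) with one malformed row.
tmpdir=$(mktemp -d)
printf 'a,b,c\nd,e,f\ng,h\n' > "$tmpdir/sample.csv"

# Print the field count of every row, then tally how often each count occurs.
# A single line of output means every row has the same number of columns.
counts=$(awk -F',' '{ print NF }' "$tmpdir/sample.csv" | sort -n | uniq -c)
echo "$counts"

# List the line numbers of rows whose field count differs from the first row.
bad_rows=$(awk -F',' 'NR == 1 { n = NF } NF != n { print NR }' "$tmpdir/sample.csv")
echo "rows with an unexpected column count: $bad_rows"
```

The second awk call is handy once the tally shows more than one distinct count, because it tells you where to look.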

Delete the first line from a file [Source]

sed '1d' <original file> > <new file>

Best when:

  • you want to delete the first line from a very large file. An editor just won’t cut it.
  • you want to delete many lines from the beginning of the file. Just replace '1d' with '1,<NUMBER OF LINES TO DELETE>d'. A bare '<N>d' deletes only line N, not the first N lines.
  • Additional tip: delete the last line with '$d'.
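
The variations above can be sketched side by side. The file data.txt and its contents are invented for the example; note that a range address such as 1,3 is what removes the first three lines, and that tail -n +2 is another common way to drop a header.

```shell
# Sample file (hypothetical) with a header, three rows and a footer.
tmpdir=$(mktemp -d)
printf 'header\nrow1\nrow2\nrow3\nfooter\n' > "$tmpdir/data.txt"

# Delete only the first line.
no_header=$(sed '1d' "$tmpdir/data.txt")

# Delete the first 3 lines using the range address 1,3.
skip_three=$(sed '1,3d' "$tmpdir/data.txt")

# Delete only the last line; $ is sed's address for the last line.
no_footer=$(sed '$d' "$tmpdir/data.txt")

# tail -n +K starts output at line K, so +2 also drops the first line.
tail_version=$(tail -n +2 "$tmpdir/data.txt")

echo "$skip_three"
```

Use single quotes around '$d' so the shell does not try to expand $d as a variable.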

Remove unprintable characters from a file [Source]

tr -cd '\11\12\15\40-\176' < <file-with-binary-chars> > <clean-file>

Best when:

  • you keep getting “This is a binary file” when you try to open the file in an editor. You know it’s not! It’s text with some weird characters!
  • you keep getting weird numbers when you check the number of columns in the file. Stray control characters do mess up the field count.
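
To see what the tr filter actually does, here is a small sketch on made-up input (the file names dirty.txt and clean.txt are placeholders). tr reads standard input and writes standard output, so the files are connected with < and >. One caveat: the kept set is plain ASCII, so bytes of accented or other non-ASCII characters are dropped as well.

```shell
tmpdir=$(mktemp -d)
# Sample input (hypothetical): text with a NUL byte, a bell (\007)
# and a DEL (\177) mixed in.
printf 'col1\tcol2\000\007\ncol3\177\tcol4\n' > "$tmpdir/dirty.txt"

# -c complements the set and -d deletes, so this removes everything except
# tab (\11), newline (\12), carriage return (\15) and printable ASCII (\40-\176).
tr -cd '\11\12\15\40-\176' < "$tmpdir/dirty.txt" > "$tmpdir/clean.txt"

# Compare sizes: the difference is the number of bytes removed.
before=$(wc -c < "$tmpdir/dirty.txt" | tr -d ' ')
after=$(wc -c < "$tmpdir/clean.txt" | tr -d ' ')
echo "removed $((before - after)) bytes"
```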

Viewing top 10 lines from a file

head -n10 <filename>

Best when:

  • you need the top n lines, maybe to check that everything is all right.
  • there are other ways as well, but hey... everyone has their favorite, right?
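
For the curious, a couple of those other ways, sketched on a throwaway file (numbers.txt is just a placeholder name). All three stop reading after line 10, which matters on very large files.

```shell
# Sample file (hypothetical): the numbers 1 to 25, one per line.
tmpdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 25; i++) print i }' > "$tmpdir/numbers.txt"

# head: the usual choice.
with_head=$(head -n 10 "$tmpdir/numbers.txt")

# sed: print lines as they come and quit after line 10.
with_sed=$(sed '10q' "$tmpdir/numbers.txt")

# awk: print each line, exiting once line 10 has been passed.
with_awk=$(awk 'NR > 10 { exit } { print }' "$tmpdir/numbers.txt")

echo "$with_head"
```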

Changing the delimiter in a file

sed "s/<CURRENT_DELIMITER>/<NEW_DELIMITER>/g" oldfile > newfile
sed "s/^A/\t/g" oldfile > newfile

Best when:

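  • you need to change the delimiter to a better one!
  • Note: in the second command, ^A is meant as the Ctrl-A control character (type Ctrl-V then Ctrl-A in most interactive shells), not a caret followed by the letter A. The \t in the replacement is understood by GNU sed but not by every sed implementation.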
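
Here is a more portable sketch of the Ctrl-A to TAB conversion, assuming a single-character delimiter. The sample data is made up; oldfile and newfile follow the names used above, and newfile_sed is an extra name introduced only to compare the two approaches.

```shell
tmpdir=$(mktemp -d)
# Sample file (hypothetical) whose fields are separated by Ctrl-A (octal \001).
printf 'id\001name\001city\n1\001Ann\001Oslo\n' > "$tmpdir/oldfile"

# For a single-character delimiter, tr is the simplest portable option.
tr '\001' '\t' < "$tmpdir/oldfile" > "$tmpdir/newfile"

# The sed equivalent: build the control characters with printf so the command
# does not depend on sed understanding \t or on typing a literal Ctrl-A.
ctrl_a=$(printf '\001')
tab=$(printf '\t')
sed "s/${ctrl_a}/${tab}/g" "$tmpdir/oldfile" > "$tmpdir/newfile_sed"

cat "$tmpdir/newfile"
```

Keep in mind that neither version knows about quoting, so a delimiter that also appears inside quoted fields (common in CSV) will be replaced there too.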