Useful terminal tips and tricks for the Machine Learning practitioner

enrique a.
6 min read · Aug 30, 2018


A regular day in the life of a Machine Learning practitioner usually involves a lot of visualizing, manipulating, and preprocessing data and log files. Many of these tasks can be done easily in the terminal, without external software. Here are some of the tricks and tips I know that can help increase productivity and simplify common tasks.

I’m going to start with my favorite one, which is actually the one that convinced me to write this story. If you take away only one thing from this article, it will probably be this.

Create a quick plot directly from the terminal

Suppose you are training a deep neural network. It takes ages to produce good results and usually prints tens of thousands of lines about its progress (average loss, epoch number, etc.). Now it is in the middle of training, it has been running for a day, and you are not really sure whether the last parameter tweak you tried is having the expected effect.

Of course you could manually copy the lines with the average loss, paste them into Excel, and create a graph, but that takes a lot of time and is cumbersome, especially if you want to monitor the progress every few minutes.

The answer: use gnuplot. It doesn’t usually come preinstalled on Linux or macOS, but it is super easy to get.
Install on Ubuntu:

sudo apt-get install gnuplot-x11

Install on macOS:

brew reinstall gnuplot --with-qt

How to use: in its simplest form it expects a single numeric value per line and automatically calculates the x and y ranges for you.

cat file_with_my_values.txt | gnuplot -p -e 'plot "/dev/stdin"'

Gnuplot will generate a graph like the following (Linux on the right, macOS on the left). It takes almost no time; on Linux it literally finishes in less than two seconds. Check the last section for a real-life example.

Save all the text printed on the terminal to a log file

As I mentioned, training a deep neural network is usually quite a long process, and depending on the system it may print tons of lines about the training progress. It would be very useful to easily save all this information so we can analyze it later (or while it runs).

The answer: use the script command. It usually comes with Linux and macOS, so there is no extra step here.

How to use: type script name_of_file to start recording; type exit to stop and close the file.

bash-3.2$ script my_output.txt
Script started, output file is my_output.txt
bash-3.2$ echo "hello"
hello
bash-3.2$ ls -l | wc -l
28
bash-3.2$ exit
exit
Script done, output file is my_output.txt
bash-3.2$
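If you only want to capture one command rather than a whole interactive session, script can run it for you directly. A small sketch, assuming the Linux (util-linux) version of script; on macOS the file name comes first instead (script my_output.txt command). The echo command here just stands in for a real training run.

```shell
# Record a single command and exit when it finishes (Linux syntax).
# On macOS the equivalent is: script my_output.txt echo "hello from training"
script -c "echo hello from training" my_output.txt

# The output is now in the log file as well:
grep "hello from training" my_output.txt
```

This way the recording ends by itself when the command does, with no need to remember to type exit.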

Check the difference of two files in the terminal

Suppose you executed the same system with a small change in the parameters. You are not really sure whether it makes any difference in the result, and the log files are convoluted and difficult to inspect visually.

The answer: use the diff command to quickly check what (if anything) changed between the two logs.

How to use: Just execute diff file1 file2.

$ diff /tmp/f1 /tmp/f2
3c3
< tres
---
> cien
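Two variants I find handy. The unified format (-u) shows a few lines of context around each change, and with bash process substitution you can compare only the columns you care about without creating temporary files. The log file names and the field number below are just illustrative.

```shell
# Unified format: context lines around every change.
diff -u /tmp/f1 /tmp/f2

# Compare only the 10th whitespace-separated field of two logs
# (requires bash; "10" is just an example column):
diff <(awk '{print $10}' run001.log) <(awk '{print $10}' run002.log)
```
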

Quick tips and tricks

Reuse the last argument of the previous command.

The answer: use the !$ shortcut, which the shell expands to the last argument of the previous command.

$ vi /long/and/boring/path/to/type/file_i_want_to_check.txt
$ rm !$
# equals `rm /long/and/boring/path/to/type/file_i_want_to_check.txt`

Or

$ vi /long/and/boring/path/to/type/file_i_want_to_check.txt
$ cp !$ this_is_important_i_should_backup.txt

I want to split each line of a log by whitespace and print only one column.

The answer: use the cut command. Add -d to specify the delimiter, and -f for the number of the column.

bash-3.2$ cat /tmp/osom.txt
uno dos tres cuatro cinco
seis siete ocho nueve diez
bash-3.2$ cut -d" " -f 2 /tmp/osom.txt
dos
siete
bash-3.2$
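One caveat: cut treats every single occurrence of the delimiter as a separator, so runs of spaces produce empty fields. When a log aligns its columns with variable amounts of whitespace, awk (which by default splits on any run of whitespace) is the safer choice:

```shell
# Two spaces between words: cut sees an empty second field,
# while awk still finds "dos".
printf 'uno  dos   tres\n' | cut -d" " -f 2
printf 'uno  dos   tres\n' | awk '{print $2}'
```
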

I want to find out how big this log file has grown.

The answer: use wc to count. The output shows the number of lines first, then words, and finally characters. Use wc -l if you only care about the number of lines.

bash-3.2$ wc /tmp/osom.txt
2 10 53 /tmp/osom.txt
bash-3.2$ wc -l /tmp/osom.txt
2 /tmp/osom.txt
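A related trick: when you only care about how many lines match a pattern, grep -c counts them directly, which is handy for counting logged training steps. The file name and the "step " pattern below are just an example of what such a log might contain.

```shell
# How many training-step lines has the run produced so far?
grep -c "step " run077.log
```
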

I ran out of space on my device. What is occupying it?

The answer: there are of course tons of GUI alternatives for examining your storage, but you can also do some quick analysis with the du command. Combine it with sort for more fun. The options actually differ between Linux and macOS, so check man du for more info.

bash-3.2$ du -s * | sort -nr
29104 demo.gif
17960 sample_img
8256 yolov2.weights
7464 darkflow
4440 build
648 run003_2017_3.log
560 preview.png
456 test
144 run002_2_moji806.log
8 setup.py
8 labelsMoji.txt
8 flow
0 ckpt
0 bin

Check the last lines of a file.

The answer: you should definitely know this already, but in case you don’t: tail -N prints the last N lines of a file. tail -f is also very handy for checking the latest additions to a constantly growing file.

bash-3.2$ tail -5 darknet_log.txt
Region 94 Avg IOU: 0.532822, Class: 0.998738, Obj: 0.072012, No Obj: 0.000499, .5R: 0.500000, .75R: 0.000000, count: 2
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000153, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: 0.584577, Class: 0.983266, Obj: 0.080435, No Obj: 0.000998, .5R: 0.750000, .75R: 0.250000, count: 4
Region 94 Avg IOU: 0.662080, Class: 0.996381, Obj: 0.054007, No Obj: 0.000489, .5R: 0.857143, .75R: 0.214286, count: 14
Region 106 Avg IOU: 0.499937, Class: 0.970562, Obj: 0.015308, No Obj: 0.000243, .5R: 0.428571, .75R: 0.142857, count: 7
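The portable spelling is tail -n N. Combined with -f you can follow a growing log and filter it at the same time; the file names and the "step " pattern here are only illustrative.

```shell
# Last 5 lines, portable form:
tail -n 5 darknet_log.txt

# Follow new lines as they are appended, showing only the
# ones that contain "step ":
tail -f run077.log | grep "step "
```
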

I need to find text in a file.

The answer: use grep, which prints every line of a file that matches a pattern.
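A few grep flags I use constantly (the file name and patterns below are illustrative): -i ignores case, -c only counts the matching lines, and -v inverts the match.

```shell
grep -i "loss" training.log        # all lines containing "loss", any case
grep -c "loss" training.log        # just the number of matching lines
grep -v "Checkpoint" training.log  # every line NOT containing "Checkpoint"
```
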

An example putting it all together

Just today I was working on a problem where I used several of the tricks detailed in this guide.

I am using an algorithm called YOLO (You Only Look Once) for object detection, and I am training the system with my own dataset to detect custom objects.

You can check here https://medium.com/@monocasero/object-detection-with-yolo-implementations-and-how-to-use-them-5da928356035 for a brief explanation of the algorithm and some of the available implementations.

In a terminal, I start script and then execute the training command, in this case Darkflow:

bash-3.2$ script run077.log
Script started, output file is run077.log
bash-3.2$ python3 flow --model cfg/yolo.cfg (etc)

This will print lots of lines about the partial results of the training to this terminal (and, more importantly, to run077.log). Meanwhile, from another terminal I can instantly plot the trend of the loss over the last steps.

$ grep "step " run077.log  | grep -v "Checkpoint" | tail -300 | cut -d" " -f 10  | gnuplot -p -e 'plot "/dev/stdin"'

Step by step:

  1. In the output of Darkflow, the lines containing “step “ are the ones printing the average loss.
  2. Except the ones with the word “Checkpoint”, hence the grep -v .
  3. I only want to plot the last 300 steps ( tail -300 ).
  4. When the line is separated by spaces, the 10th column is the running average, hence the cut -d" " -f 10 .
  5. Finally, I plot the result to easily visualize the trend.

Conclusion

Given enough time and resources we could use any major programming language to write a script for each of these small needs, but we can also take advantage of the inherent power and versatility that our terminal (usually Linux or macOS) offers out of the box (or with easily installed tools).

I actually use everything included here in my daily work; the list is indeed based on my personal notes. I hope it is useful to somebody else.


enrique a.

Writing about Machine Learning, software development, and Python. Living in Japan, working as a machine learning leader at a Japanese company.