17 Must Know Commands for Any ML Engineer

Adam Gabriel Dobrakowski
Published in MIM Solutions Blog, 4 min read, Apr 19, 2023

Author: Adam G. Dobrakowski
Editing: Zuzanna Kwiatkowska

When I started working on Machine Learning projects a couple of years ago, I decided to build a cheatsheet of the commands I need most in day-to-day work.

Most of them are used so rarely that they are hard to remember. Yet after a short time, I realised that I was looking up the same information on Stack Overflow over and over again.

That’s why in this short article, I would like to share with you a part of my cheatsheet. I hope that it is going to be useful in your work.

Tech Stack

The technologies I use most often in my projects are:

  1. Jupyter Notebook: for quick data analysis and experiments,
  2. Visual Studio Code: as an IDE for writing Python code,
  3. A remote Git repository,
  4. Linux.

To analyse data quickly and efficiently, I use Python libraries such as Pandas and Matplotlib.

Let’s start with imports!

import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import HTML, display

Pandas

1. Show all rows in a table

# turn on
pd.set_option('display.max_rows', None)

# turn off
pd.reset_option('display.max_rows')

or alternatively

with pd.option_context("display.max_rows", 1000):
    display(df)

In my opinion, the second option is better, because the setting is restored automatically and we don’t have to remember to switch it off every time. This is based on my own experience: I often forgot about it and crashed my Jupyter notebook when trying to display a large table.
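To see why the context manager is the safer choice, here is a minimal sketch on a made-up 100-row frame (the frame and the variable names are illustrative only). The option changes inside the with block and goes back to its previous value afterwards:

```python
import pandas as pd

# Illustrative frame; any DataFrame would do.
df = pd.DataFrame({"value": range(100)})

before = pd.get_option("display.max_rows")

with pd.option_context("display.max_rows", 1000):
    inside = pd.get_option("display.max_rows")
    full_text = repr(df)  # in a notebook you would call display(df) here

after = pd.get_option("display.max_rows")

# The setting is only changed inside the with block.
print(before, inside, after)
```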

2. One-liners to make DataFrame processing easier

I particularly like 3 of them.

Each of them returns a new DataFrame, so assign the result back if you want to keep the change.

To rename a single column, for example from “A” to “B”, use:

df = df.rename(columns={'A': 'B'})

To add a column and automatically fill it with ones:

df = df.assign(one=1)

To delete a column called “campaign_id”:

df = df.drop(columns=['campaign_id'])
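All three methods return a new DataFrame instead of modifying the original, so they chain nicely. A small sketch on a made-up frame (column values are illustrative):

```python
import pandas as pd

# Made-up frame with the column names used above.
df = pd.DataFrame({"A": [1, 2], "campaign_id": [10, 20]})

result = (
    df.rename(columns={"A": "B"})      # A -> B
      .assign(one=1)                   # new column filled with ones
      .drop(columns=["campaign_id"])   # remove campaign_id
)

print(list(result.columns))  # ['B', 'one']
print(list(df.columns))      # ['A', 'campaign_id'] (original is untouched)
```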

3. Merging the DataFrames

If you want to do it row-wise, you can use:

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df = pd.concat([df1, df2])

When merging column-wise, you just have to add an additional argument:

pd.concat([df1, df2], axis=1)
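A short sketch with two made-up frames, showing what the axis argument of pd.concat changes. The ignore_index=True argument is my addition here; it simply renumbers the rows of the stacked result:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

# Row-wise: frames are stacked on top of each other.
rows = pd.concat([df1, df2], ignore_index=True)

# Column-wise: frames are placed side by side, aligned on the index.
cols = pd.concat([df1, df2], axis=1)

print(rows.shape)  # (4, 2)
print(cols.shape)  # (2, 4)
```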

4. Converting DataFrame with 2 columns to dictionary

df.set_index('Column1')['Column2'].to_dict()

You can also do the reverse operation and quickly create a DataFrame from a dict:

pd.DataFrame.from_dict(my_dict, orient='index')
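Here is a round trip on a tiny made-up frame. The columns argument and the rename_axis/reset_index step are my additions, used only to get back the original two-column layout:

```python
import pandas as pd

df = pd.DataFrame({"Column1": ["x", "y"], "Column2": [1, 2]})

# DataFrame -> dict: Column1 becomes the keys, Column2 the values.
my_dict = df.set_index("Column1")["Column2"].to_dict()
print(my_dict)  # {'x': 1, 'y': 2}

# dict -> DataFrame: keys become the index, values a single column.
restored = pd.DataFrame.from_dict(my_dict, orient="index", columns=["Column2"])

# Turn the index back into a regular column.
restored = restored.rename_axis("Column1").reset_index()
print(restored)
```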

5. Creating additional column with percentage statistics

Imagine you have a table of ad campaigns. For each campaign, we know how many clicks it got on each day. Now, for each day, we want to know what share of that day’s clicks each campaign contributed. It sounds difficult, but we can actually do it in a single line by dividing each row by its day’s total:

df['clicks_perc'] = df['clicks'] / df.groupby('day')['clicks'].transform('sum')
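A worked sketch on made-up numbers, assuming one row per campaign per day. The shares are computed against the total of each day, so they add up to 1 within a day:

```python
import pandas as pd

# Made-up data: two campaigns observed on two days.
df = pd.DataFrame({
    "campaign_id": [1, 2, 1, 2],
    "day": ["Mon", "Mon", "Tue", "Tue"],
    "clicks": [30, 70, 10, 40],
})

# transform('sum') broadcasts each day's total back to every row of that day.
day_total = df.groupby("day")["clicks"].transform("sum")
df["clicks_perc"] = df["clicks"] / day_total

print(df)
```

On Monday the two campaigns get 0.3 and 0.7, on Tuesday 0.2 and 0.8.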

6. Plotting 2 variables in a single graph using Pandas

df[['income', 'cost']].plot()
plt.show()

7. Creating a plot with a separate line for each category

Imagine you have a table of ad clicks, measured every hour for all of your websites. How would you create a plot in which you can see the number of ad clicks over time for every website separately? My solution would be:

# one row per hour, one column per page
plot_df = df[['clicks', 'page', 'hour']].set_index(['page', 'hour']).unstack('page')
# drop the 'clicks' level so that the legend shows only the page names
plot_df.columns = [c for (_, c) in plot_df.columns]
plot_df.plot()
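The same kind of reshaping can also be sketched with DataFrame.pivot. The data below is made up; the result has one row per hour and one column per page, which is the layout that plot() turns into one line per website:

```python
import pandas as pd

# Made-up hourly click counts for two websites.
df = pd.DataFrame({
    "page": ["a.com", "a.com", "b.com", "b.com"],
    "hour": [0, 1, 0, 1],
    "clicks": [5, 8, 2, 3],
})

# Rows: hours (x axis). Columns: pages (one line each).
plot_df = df.pivot(index="hour", columns="page", values="clicks")
print(plot_df)

# In a notebook, plot_df.plot() would now draw one line per page.
```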

Plots

8. Quickly beautify plots in Matplotlib

plt.rcParams["figure.figsize"] = (20,10)
plt.rcParams["font.size"] = 22
plt.style.use('bmh')

# reset
plt.rcParams.update(plt.rcParamsDefault)
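If you only want the settings for a single figure, Matplotlib has a context manager that works like option_context in Pandas. A minimal sketch (the Agg backend line is only there so the snippet also runs outside a notebook; the plotted values are made up):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt

default_size = plt.rcParams["font.size"]

# Settings apply only to figures created inside the with block.
with plt.rc_context({"figure.figsize": (20, 10), "font.size": 22}):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    size_inside = plt.rcParams["font.size"]
    width, height = fig.get_size_inches()

plt.close(fig)
size_after = plt.rcParams["font.size"]

print(size_inside, size_after)
```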

9. Add vertical and horizontal reference lines to your plot

For vertical:

plt.axvline(x=0, color='grey', linestyle='-')

And horizontal:

plt.axhline(y=0.0, color='k', linestyle='-')

Jupyter Notebook

10. Using Python code from .py files in the notebook

Imagine you have a project directory with two sub-directories: ipython with Jupyter Notebooks and lib with your Python code in .py files. To import from lib inside a notebook, first move the working directory up to the project root:

import os

# climb out of the ipython directory until we reach the project root
while 'ipython' in os.getcwd():
    os.chdir("../")
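An alternative sketch that leaves the working directory untouched: walk up from the current directory until a folder containing lib is found, then put that folder on sys.path. The helper names find_project_root and add_project_root_to_path are hypothetical, not part of any library:

```python
import sys
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "lib") -> Path:
    """Return `start` or its closest ancestor that contains a `marker` directory."""
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"no directory containing '{marker}' above {start}")


def add_project_root_to_path(start: Optional[Path] = None) -> Path:
    """Make the project root importable without calling os.chdir."""
    root = find_project_root(start or Path.cwd())
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root
```

Calling add_project_root_to_path() at the top of a notebook would then make modules under lib importable, while relative paths used elsewhere in the notebook keep pointing at the notebook's own directory.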

11. Making the code cells wider

By default, the code cells in Jupyter don’t span the full width of your browser. If you have a wide monitor, this can be frustrating, especially when you want to analyse tables with a lot of columns. You can change it using:

display(HTML("<style>.container { width:100% !important; }</style>"))

12. Beautify HTML titles in the notebook

The same display(HTML(...)) call can render a formatted title from a code cell (the title text below is a placeholder):

display(HTML("<h1>My analysis title</h1>"))

Terminal and Git

13. Pretty-printing JSON in your terminal

echo '{"a":[2,3]}' | json_pp
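When you are already inside a notebook, a similar result can be obtained with Python's standard json module. This is a sketch of an equivalent, not what json_pp does internally; the indent and sort_keys values are my choice:

```python
import json

raw = '{"a":[2,3]}'

# Parse the string and dump it again with indentation.
pretty = json.dumps(json.loads(raw), indent=4, sort_keys=True)
print(pretty)
```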

14. Find the system processes that use the most memory

ps aux --sort=-%mem | head

15. Running Jupyter Notebooks from the terminal

# -o saves the cell outputs back into the notebook file
runipy -o my_notebook.ipynb

16. Choosing a file when you have a merge conflict in Git

In my opinion, it’s particularly useful when you have a conflict between Jupyter Notebooks.

# keep the incoming version of the file
git checkout --theirs path/to/file

# or keep your own version
git checkout --ours path/to/file

17. Reverting your commit

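Imagine you want to revert several consecutive commits that are not at the tip of your branch, for example the three commits HEAD~4, HEAD~3 and HEAD~2. You can then pass a range to git revert:

git revert HEAD~5..HEAD~2

The range excludes its starting point (HEAD~5) and includes its end point (HEAD~2). Simply using HEAD~2 would only revert a single commit.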

Conclusions

I hope that some of those commands were new to you and that you’re going to use them!

Do you also have your own cheatsheet of commands and functions? If so, share your best ones with us on LinkedIn!


Founder and CEO of COGITA | AI Solutions for Business 📊 | Creator of ML & AI Courses | ML & AI Expert