Don’t use Anaconda: How to set up a decent machine learning environment
Setting up a decent, robust machine learning environment with Miniconda on Windows / Linux.
Anaconda is bloated. It comes with an installation size of over 2 gigabytes and also installs a bunch of software that we normally won’t use, such as the Python IDE: Spyder. (I mean, it’s 2020 already, who doesn’t use VS Code?)
After tinkering with my machine learning environment for over a month, I’ve come up with the following instructions and techniques to help you set up a decent, modern Python environment for studying machine learning without cluttering up your local dev environment, so you can get a move on with your research project, graduation thesis, etc.
🍭 Editor’s note: This is part of my personal machine learning Wiki, where I demonstrate my entire learning process on adversarial examples, which is the research direction for my graduation thesis. I personally think that this particular section of my Wiki is useful, so I organized it into a separate article, which you are reading now. Find out more at: Adversarial Attacks Targeted on Neural Networks, on Spencer’s Wiki.
Before we begin, do keep in mind that it’s best to start your journey into machine learning in a *NIX environment, like Linux. Let’s move on.
Installing Anaconda (Miniconda)
Wait, what? Didn’t we just say we won’t use Anaconda? Well, yes, we won’t be using Anaconda exactly. Instead, we’ll be installing Miniconda, the unbloated version of Anaconda. The relationship between Anaconda, Miniconda and Conda is best explained here: The Definitive Guide to Conda Environments (Towards Data Science). In short, Conda is a tool for managing Python dependencies and creating virtual environments. Both Anaconda and Miniconda include Conda, but Anaconda is much larger than Miniconda and includes components we don’t need.
🍚 Note: You won’t need to install Python beforehand, as Miniconda will install and manage the dedicated version of Python that you need. Installing yet another standalone Python besides the system’s preinstalled one may lead to problems.
Downloading the installer
💻 Windows: On Windows, we have the useful CLI installer (or package manager, if you will): scoop. It’s recommended that you use scoop to install CLI software. See here for my introduction to scoop, the Windows package manager:「一行代码」搞定软件安装卸载,用 Scoop 管理你的 Windows 软件 (“One line of code” to install and uninstall software: manage your Windows software with Scoop).
First, install scoop and add the extras bucket:
# Install scoop
iwr -useb get.scoop.sh | iex
# Add the extras bucket
scoop bucket add extras
Then install Miniconda with the following command:
scoop install miniconda3
And we’re done! It’s just that easy.
You can, of course, download the Miniconda installer for Windows directly from its official website. The result is basically the same, but with scoop you won’t have to deal with environment variables and other inconveniences.
📟 Linux: Miniconda isn’t available through the system package manager (e.g., APT, Ubuntu’s Advanced Package Tool. See here: Dev on Windows with WSL - CLI - APT), so we’ll use the official installer script instead.
First, go to Miniconda’s homepage: Docs » Miniconda, and fetch the link for the latest version of Miniconda released with Python 3 on Linux:
With the installer link copied to your clipboard, we can simply run the following command to download the installer:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
The command will download the installer script via wget. Then we can run the script with bash:
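Assuming the installer was saved to the current directory under the file name used in the wget command above, the invocation might look like the following sketch. The INSTALLER variable and the existence check are additions for illustration only.

```shell
# File name taken from the wget URL above; adjust it if you saved the installer elsewhere
INSTALLER="Miniconda3-latest-Linux-x86_64.sh"

if [ -f "$INSTALLER" ]; then
    # Starts the interactive installer (license prompt, install location, ...)
    bash "$INSTALLER"
else
    echo "Installer not found: $INSTALLER (run the wget command first)"
fi
```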
You will be prompted to view the Miniconda License and start the installation process by entering “yes” in the terminal.
🍚 Note: This installation process may require a system-installed version of Python, which we won’t use. But if the installer complains about not being able to find a working version of Python, we can install one by running sudo apt install python3.
Dealing with post-installation issues
💻 Windows: Since we’ll be using PowerShell, we first need to initialize conda in PowerShell’s user configuration:
conda init powershell
This command creates a PowerShell configuration file inside your PowerShell user configuration folder, usually at ~\Documents\WindowsPowerShell\profile.ps1, and, in my case, puts the following code in it:
#region conda initialize
# !! Contents within this block are managed by 'conda init' !!
(& "C:\Users\Spencer\scoop\apps\miniconda3\current\Scripts\conda.exe" "shell.powershell" "hook") | Out-String | Invoke-Expression
#endregion
Close and reopen PowerShell to see Miniconda take effect:
📟 Linux: Even if everything is set up successfully with the default configuration, there’s a good chance that you’ll end up without a conda executable in your path, because the installer assumes we use bash by default and modifies .bashrc, while many of us use zsh or fish instead.
You can find Miniconda’s bin directory, and the conda tool itself, here: ~/miniconda3/bin. We'll need to initialize Miniconda manually so that our shell's configuration file gets updated. (That will be ~/.zshrc for zsh and ~/.config/fish/config.fish for fish.)
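To see which situation we are in before touching any configuration, a quick check like the one below can help. It is a sketch for POSIX-style shells such as bash or zsh (fish uses a different syntax), it assumes the default install location ~/miniconda3, and conda_status is a hypothetical helper name.

```shell
# conda_status: report whether conda is reachable (hypothetical helper, not part of conda)
conda_status() {
    if command -v conda >/dev/null 2>&1; then
        echo "conda is on PATH: $(command -v conda)"
    elif [ -x "$HOME/miniconda3/bin/conda" ]; then
        # Assumes the installer's default location
        echo "conda is installed but not on PATH: $HOME/miniconda3/bin/conda"
    else
        echo "conda not found; check where the installer put it"
    fi
}

conda_status
```

If the second message shows up, initializing conda through its full path is the remaining step.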
Run the following command, invoking the conda executable by its full path:
~/miniconda3/bin/conda init {THE_SHELL_YOU_USE}
In my case, conda init fish actually added the following content into my shell's config:
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
eval /home/spencer/miniconda3/bin/conda "shell.fish" "hook" $argv | source
# <<< conda initialize <<<
Close then reopen the terminal to see Miniconda take effect:
📢 Both OS: By default, Conda activates the “base” environment in every new shell, but I personally don’t want an environment activated whenever I open up a terminal. We can disable this behavior with the following command, and then activate an environment manually whenever we need one:
conda config --set auto_activate_base false
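To confirm the setting took effect, the value can be read back. This is a small sketch, assuming conda is already on PATH in the current shell; the SETTING variable and the guard are only there so the snippet fails gracefully when it is not.

```shell
# Name of the configuration key changed by the command above
SETTING="auto_activate_base"

if command -v conda >/dev/null 2>&1; then
    # Prints the current value of the setting
    conda config --show "$SETTING"
else
    echo "conda is not on PATH in this shell"
fi
```

With auto-activation turned off, conda activate base still enters the base environment on demand.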
Using Conda to manage our project
After installing Conda, we will use it, with the help of a few commands, to:
- Create a new virtual environment to host our simple machine learning project
- Install our friendly neighborhood machine learning frameworks, TensorFlow and Keras, inside our virtual environment
- Install our extremely useful scientific notebook for writing and developing machine learning code: Jupyter Notebook
- …
Let’s get started.
Creating a new virtual environment
Before everything, let’s create a folder to contain all our code files.
# Making a directory called adversarial-attacks
mkdir adversarial-attacks
# Navigating into the directory
cd adversarial-attacks
Next up, we’ll create a virtual environment to help manage our code and project. If we are going to deploy our environment on different machines and platforms, it’s considered best practice to create an environment.yml file that defines the environment's name, dependencies, channels and more. This way, we won't have to deal with incompatible dependencies on different operating systems.
Create a file named environment.yml at the root of our project folder, and inside, we'll need to define:
- Our environment’s name: name
- Which channels Conda will install our dependencies from: channels
- What dependencies Conda will actually install: dependencies
At the end of the day, our environment.yml will be something like this:
name: adversarial-attacks
channels:
  - defaults
dependencies:
  - python
  - tensorflow
  - numpy
  - matplotlib
  - pylint
  - autopep8
  - notebook
We can see that I have defined our environment’s name to be adversarial-attacks, and added the dependencies that are essential to our project. After that, we can create our environment and install all our dependencies based on this file using the following command:
conda env create --file environment.yml
Then, after successfully creating our virtual environment, we can activate it with:
conda activate adversarial-attacks # or your environment name
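A quick way to check that the activation worked is to look at which environment and interpreter the shell now sees. This is a sketch for bash or zsh; it relies on the CONDA_DEFAULT_ENV variable that conda sets on activation, and ACTIVE_ENV is just a local name used here.

```shell
# conda sets CONDA_DEFAULT_ENV when an environment is activated; fall back to "none"
ACTIVE_ENV="${CONDA_DEFAULT_ENV:-none}"
echo "Active environment: $ACTIVE_ENV"

# Check that TensorFlow can be imported by the python found on PATH
if python -c "import tensorflow" >/dev/null 2>&1; then
    python -c "import tensorflow as tf; print('TensorFlow', tf.__version__)"
else
    echo "TensorFlow is not importable from the current python"
fi
```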
And if you want to add dependencies to your environment, just add them directly to the environment.yml file, and update your environment with:
conda env update --file environment.yml
Deactivate it with:
conda deactivate
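The environment.yml shown earlier in this section lists no versions, so two machines may resolve to different package versions. One way to tighten this is to pin versions in the file. The sketch below writes a pinned variant from the shell; the file name environment.pinned.yml and every version number in it are placeholders chosen for illustration, not a tested combination.

```shell
# Write a pinned variant of the environment file (all versions are placeholder assumptions)
cat > environment.pinned.yml <<'EOF'
name: adversarial-attacks
channels:
  - defaults
dependencies:
  - python=3.7
  - tensorflow=2.1
  - numpy
  - matplotlib
  - pylint
  - autopep8
  - notebook
EOF

# Then create the environment from it:
# conda env create --file environment.pinned.yml
```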
Running Jupyter Notebook
Run Jupyter Notebook from the command line:
# Launching the default browser at the same time, or ...
jupyter notebook
# Launching the notebook server only (When running inside WSL)
jupyter notebook --no-browser
🍚 Note: When running inside WSL, the command jupyter notebook actually tries to invoke the default browser inside Windows but fails tragically. We recommend adding the --no-browser flag and copying the URL manually.
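If the same project is used both inside and outside WSL, the choice of flag can be scripted. The sketch below relies on a common heuristic: under WSL, the kernel version string in /proc/version usually mentions "microsoft". Treat that check and the NOTEBOOK_FLAGS variable as assumptions rather than an official detection method.

```shell
# Heuristic WSL detection: WSL kernels usually mention "microsoft" in /proc/version
if grep -qi microsoft /proc/version 2>/dev/null; then
    NOTEBOOK_FLAGS="--no-browser"
else
    NOTEBOOK_FLAGS=""
fi

echo "Launch with: jupyter notebook $NOTEBOOK_FLAGS"
# Uncomment to actually start the server:
# jupyter notebook $NOTEBOOK_FLAGS
```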
Using VS Code as our workbench
VS Code is an amazing code editor that we can use as our main Python development environment. Personally, I use VS Code for almost every project I have, whether it’s Rust, Go, Node.js or something else. What’s more, if you are using WSL, you can hook your Windows-side VS Code onto your Ubuntu WSL environment using an extension called Remote - WSL. See here for more details: 🇺🇸Developing in WSL | 🇨🇳Visual Studio Code - Dev on Windows with WSL.
Then, install the Anaconda Extension Pack, which includes a copy of the necessary Python extension, and language support for YAML.
Now, you’ll be able to code, lint, debug and run Python files. Also, you can now run Jupyter Notebook directly inside VS Code.
That’s all. This tutorial basically covers all you’ll need when setting up a Miniconda development environment, and with environment.yml, we will be able to migrate our environment across different operating systems and platforms with ease. Thank you for reading.