Setting up the environments for ML Zoomcamp 2024

Till Meineke
4 min read · Sep 25, 2024


Last updated 25.09.2024

What we need for the course:

In this section, I’ll describe how I prepared my local and remote environments for the course.

Setting up a Conda environment on a MacBook Pro 13" (M1, 2020, 16 GB, Apple Silicon arm64) running macOS Sonoma 14.7

I use a MacBook Pro 13" with Apple Silicon (arm64) and VSCode as my main editor. I use Conda as my environment manager and created a dedicated environment for the course. I also installed some additional software that I find useful, but that is described in a separate post (link will be updated). I know macOS 15 was just released, but I prefer to wait a bit before upgrading my system.

  • First, install 🍺 Homebrew as a package manager for macOS. In Terminal.app, run:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  • Install iterm2, miniforge, visual-studio-code and zoom as your basic requirements:
brew install iterm2 miniforge visual-studio-code zoom

Caveat: installing Docker Desktop and the Docker command-line tools with brew can lead to a conflict. You need to install Docker Desktop first and then the command-line tools (Issue).

  • To install Docker Desktop with brew, run the following command:
brew install --cask docker
  • Then install the command-line tools:
brew install docker docker-compose

How to shrink packages for dockerization

iTerm2

I use iTerm2 as my terminal emulator and installed Oh My Zsh as a plugin manager for zsh. I also installed some plugins and themes that I find useful, but I’ll describe this in a separate post (link will be updated).

Miniforge

Miniforge is a minimal installer for Conda that uses conda-forge as its default channel. Like Miniconda, it is a small bootstrap installer that includes only conda, Python, the packages they depend on, and a small number of other useful packages such as pip and zlib. I created a dedicated environment.yml for the course.

To create the `ML_Zoomcamp2024` conda environment from this file, run the following command in the folder that contains it:

conda env create -f environment.yml

Activate the environment with the following command:

conda activate ML_Zoomcamp2024

This installs the most important data science libraries needed for the course:

  • Python=3.11
  • NumPy, Pandas and Scikit-Learn (latest available versions)
  • Matplotlib and Seaborn
  • Jupyter notebooks

I also added a few more libraries, some of which are only needed later in the course. I still need to test the installation of other libraries required later in the course.

TensorFlow and PyTorch are installed with Metal GPU support for Apple Silicon (arm64) Macs. You can test the installation with this notebook.

For me the output looks like this, so everything is installed correctly:

Python 3.11.10 | packaged by conda-forge | (main, Sep 22 2024, 14:11:13) [Clang 17.0.6 ]
Python Platform: macOS-14.7-arm64-arm-64bit

Pandas 2.2.2
NumPy 1.26.4
Scikit-Learn 1.5.1
SciPy 1.14.1

Tensor Flow Version: 2.14.0
Keras Version: 2.14.0
GPU is available

PyTorch version: 2.4.1
Is MPS (Metal Performance Shader) built? True
Is MPS available? True
Using device: mps
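
The linked notebook is not reproduced here, but a check along these lines gives a similar report. This is a minimal sketch, not the notebook's actual code: the helper names `installed_version` and `pick_torch_device` are my own, and packages that fail to import are reported as missing instead of stopping the script.

```python
import importlib
import platform
import sys


def installed_version(module_name):
    """Return the module's __version__ string, or None if the import fails."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Any import failure counts as "not usable" for this check.
        return None
    return getattr(module, "__version__", "unknown")


def pick_torch_device():
    """Return 'mps', 'cuda' or 'cpu'; None if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    # Older PyTorch builds have no MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_built() and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


def main():
    print(f"Python {sys.version}")
    print(f"Python Platform: {platform.platform()}")
    for name in ("pandas", "numpy", "sklearn", "scipy", "tensorflow", "torch"):
        print(f"{name}: {installed_version(name) or 'not installed'}")

    if installed_version("tensorflow"):
        import tensorflow as tf

        gpus = tf.config.list_physical_devices("GPU")
        print("GPU is available" if gpus else "GPU is NOT available")

    print(f"Using device: {pick_torch_device() or 'PyTorch not installed'}")


if __name__ == "__main__":
    main()
```

On an Apple Silicon Mac with Metal support working, the last line should report `mps`; on the x86_64 EC2 instance without a GPU it should report `cpu`.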

Managing Multiple Python Versions With pyenv

PyEnv: Managing Multiple Python Versions With Ease

Python venv: How To Create, Activate, Deactivate, And Delete

python env 101

pipenv vs conda

when and how to use Conda, Pipenv, Virtualenv, Pip, and Poetry

Using Pipenv to Manage Python Packages and Versions

Managing Application Dependencies

To get an overview of the different package managers for Python, I created a table with the most important commands and differences. It is not complete (brew, venv and virtualenv are missing), but it should give you a good overview of the most important package managers and their commands. You can see the table here.

Visual Studio Code

I use VSCode as my main editor and installed some extensions that I find useful, but I’ll describe this in a separate post.

Zoom

Well, this is called ML_Zoomcamp, so I installed Zoom, although it is not needed for the course itself. As suggested in the course, there are Slack channels and a Telegram group for communication, so I installed Slack and Telegram as well.

brew install slack telegram

Ubuntu 24.04 on AWS EC2

I also created an Ubuntu 24.04 x86_64 instance on AWS EC2 with a conda environment for the course and set up port forwarding to access the Jupyter server from VSCode (Remote-SSH).
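
For reference, the same kind of port forwarding can also be opened with plain `ssh` instead of the VSCode extension. The sketch below only builds the command; the host alias `ml-zoomcamp`, the function name `build_tunnel_command` and port 8888 (Jupyter's usual default) are assumptions for illustration, not values from my setup.

```python
import shlex


def build_tunnel_command(host, local_port=8888, remote_port=8888):
    """Build an ssh command that forwards local_port to remote_port on host.

    -N opens the connection without running a remote command.
    -L local:localhost:remote forwards the local port to the remote machine.
    """
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}", host]


# "ml-zoomcamp" is a hypothetical Host alias from .ssh/config.
command = build_tunnel_command("ml-zoomcamp")
print(shlex.join(command))  # ssh -N -L 8888:localhost:8888 ml-zoomcamp
```

Passing `command` to `subprocess.run` would open the tunnel; while it is running, a Jupyter server listening on the remote port is reachable locally at `http://localhost:8888`.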

The video suggested installing Ubuntu 22.04, but I prefer to use the latest LTS version.

Hopefully I will not run into problems with running code locally on Apple Silicon (arm64) and remotely on x86_64. But as pointed out in the video, I can create an arm64 instance on AWS as well.

I’m wondering if there is a more convenient way to connect to the Jupyter server without manually copying the IP address into .ssh/config. If you have any simple suggestions, please let me know.
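
One option I can imagine, sketched here under assumptions and not tested against my own setup: keep a fixed `Host` alias in .ssh/config and let a small script rewrite only its `HostName` line whenever the instance gets a new public IP. The alias `ml-zoomcamp`, the function names and the example addresses are hypothetical, the parser only handles the simple `Host alias` / `HostName value` layout, and how the new IP is obtained is left out.

```python
from pathlib import Path


def set_hostname(config_text, host_alias, new_address):
    """Return config_text with the HostName of the `Host host_alias` block replaced.

    Only the first HostName line inside the matching block is changed;
    every other line is kept exactly as it is.
    """
    updated = []
    in_block = False
    for line in config_text.splitlines():
        words = line.split()
        keyword = words[0].lower() if words else ""
        if keyword == "host":
            # A new Host block starts; check whether it is the one we want.
            in_block = host_alias in words[1:]
        elif keyword == "hostname" and in_block:
            indent = line[: len(line) - len(line.lstrip())]
            line = f"{indent}HostName {new_address}"
            in_block = False  # Leave all later lines untouched.
        updated.append(line)
    return "\n".join(updated) + "\n"


def update_ssh_config(host_alias, new_address, path=None):
    """Rewrite the ssh config file in place (default: ~/.ssh/config)."""
    path = Path(path) if path else Path.home() / ".ssh" / "config"
    path.write_text(set_hostname(path.read_text(), host_alias, new_address))


# Hypothetical config entry with documentation-only IP addresses.
example = "Host ml-zoomcamp\n    HostName 203.0.113.10\n    User ubuntu\n"
print(set_hostname(example, "ml-zoomcamp", "203.0.113.25"))
```

Because only the `HostName` line changes, the alias used by VSCode Remote-SSH stays the same and nothing else in the config has to be edited by hand.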

I am trying to create a dotfiles repository with all my configurations and settings, but I'm not sure I will find the time to collect all the necessary configuration files. Can you recommend tools like stow, or do you have any better suggestions? How do you manage your configuration files efficiently to sync them between different machines?

Other Cloud Environments

Lastly, I have accounts for Kaggle and Google Colab for running notebooks in the cloud.

With all these environments I should be well prepared for the course. For me, the most exciting part was creating my first EC2 instance on AWS (it worked without a credit card here in Germany).

