Docker Images for Data Scientists: A Painkiller for Environment Management Challenges

Yu Yang, Ph.D., PE, PMP
3 min read · Jun 2, 2024
Conda Crashed (image generated by DALL·E 3)

Conda was my first choice when I entered the field of data science and machine learning. Data science and machine learning tasks commonly require installing multiple packages, such as NumPy and pandas. However, since the beginning of last year, my Conda environment has kept crashing on my laptop, and I have had to clean up the Conda packages and reinstall Conda a few times. Eventually, I started running Jupyter Notebook in a Docker container instead of using Conda.

There are numerous benefits to running Jupyter Notebook or other tasks in a Docker container. Here I share four of them, based on my personal experience with Jupyter Notebook and Python packages.

  1. Isolation of the Environment

Because a Docker container encapsulates its environment, the Jupyter Notebook runs in a consistent environment regardless of where it is deployed. This isolation prevents conflicts among dependencies. For instance, if two Python packages conflict, you can easily pinpoint the problem and remove the troublesome package from the image.
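One way to pinpoint a troublesome package is to compare the package versions of a working environment with those of a broken one. The snippet below is a minimal sketch of that idea, not a method described in this article: the helper names `snapshot_packages` and `diff_snapshots` are hypothetical, and the version numbers in the usage part are purely illustrative.

```python
from importlib.metadata import distributions


def snapshot_packages():
    """Return {package name: version} for the current Python environment."""
    snapshot = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            snapshot[name.lower()] = dist.version
    return snapshot


def diff_snapshots(old, new):
    """Return {name: (old_version, new_version)} for packages that differ.

    A version of None means the package is absent from that snapshot.
    """
    changed = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changed[name] = (old.get(name), new.get(name))
    return changed


if __name__ == "__main__":
    # Illustrative version numbers only (an assumption, not from the article).
    working = {"numpy": "1.26.4", "pandas": "2.2.2"}
    broken = {"numpy": "2.0.0", "pandas": "2.2.2", "scipy": "1.13.0"}
    for name, (before, after) in diff_snapshots(working, broken).items():
        print(f"{name}: {before} -> {after}")
```

Running `snapshot_packages()` inside two containers, for example one built from an older image and one from a newer image, and passing both results to `diff_snapshots` would list only the packages whose versions differ, which narrows down where a conflict may have been introduced.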

Docker images contain all the necessary dependencies and configurations. Thus, users can start with a fully functional…

Yu Yang, Ph.D., PE, PMP

Data scientist, water engineer, real estate investor. Google Scholar citations: 3,160. X (Twitter): @Yang_ML_Estate