Use Docker to Simplify Data Science Development Environments

How to simplify data science development environments on Windows by using Docker on Windows Subsystem for Linux 2 (WSL 2).

Sung Kim
Analytics Vidhya

Problems with Python Environments

Python and its extensive ecosystem of packages cover nearly every use case you are likely to meet in a data science and machine learning development workflow.

Like many data scientists, you will most likely be using the Anaconda distribution of Python, which bundles Python with the most commonly used packages for data science and machine learning. To manage its extensive repository of packages (over 7,500 data science and machine learning packages at last count), it comes with the Conda package manager, which automates the process of installing, updating, and removing packages.

When installing a new Python package, Conda first resolves the package's dependencies, checks whether they are already installed on the system, and installs any that are missing. Satisfying all dependencies may require installing new packages, upgrading existing packages, or downgrading existing packages; only then does Conda install the requested package(s). By default, all of this happens globally, with everything installed onto the machine in a single, operating system-dependent location.
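To make the problem concrete, here is a minimal sketch in Python of what can happen when every package shares one global location. It is a toy model, not Conda's actual solver: the package names (pkg_a, pkg_b, pkg_c, libcore), their versions, and the install helper are all hypothetical, and each package is assumed to pin one exact version of its dependency.

```python
# Toy model of a single, global package environment.
# NOT Conda's real solver; all package names and versions are hypothetical.

# Hypothetical index: package -> {dependency: exact version it requires}
INDEX = {
    "pkg_a": {"libcore": (1, 0)},
    "pkg_b": {"libcore": (2, 0)},
    "pkg_c": {"libcore": (1, 0)},
}


def install(env, package):
    """Install `package` into the shared `env` dict; return the actions taken."""
    actions = []
    for dep, wanted in INDEX[package].items():
        current = env.get(dep)
        if current is None:
            actions.append(f"install {dep}")
        elif current < wanted:
            actions.append(f"upgrade {dep}")
        elif current > wanted:
            actions.append(f"downgrade {dep}")
        env[dep] = wanted  # the one shared copy is overwritten
    actions.append(f"install {package}")
    env[package] = (1, 0)  # arbitrary version for the requested package
    return actions


global_env = {}  # one location shared by every project on the machine
first = install(global_env, "pkg_a")
second = install(global_env, "pkg_b")
third = install(global_env, "pkg_c")

for actions in (first, second, third):
    print(actions)
```

In this toy run, installing pkg_b upgrades the shared libcore and installing pkg_c downgrades it again, so pkg_b is left with a version it did not ask for. A real solver checks the whole environment for consistency and would report a conflict instead, but the underlying tension is the same: one shared location has to satisfy every package at once.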

Sung Kim
Analytics Vidhya

A business analyst at heart who dabbles in AI engineering, machine learning, data science, and data engineering. Threads: @sung.kim.mw