Getting started with Anaconda & Docker

In the following post, I’d like to document how to set up Anaconda and Docker so that data science environments can be reproduced quickly across different platforms and collaborators.

Big picture

Anaconda, with its sandboxed environments for scientific Python packages, and Docker, with its containerization technology, make a great combination for scalable, reproducible and portable data science deployments.

You can use Anaconda with Docker to build, containerize and share your data science applications with your team. Because everyone works from the same image, a collaborative workflow built on Anaconda and Docker makes the transition from development to deployment much smoother: the environment you develop in is the environment you deploy.

Finally, containerizing the data science application lets you use container orchestration systems such as Kubernetes to scale the application in production.

Basic workflow

  1. Choose one of the Anaconda images based on your project requirements, for example continuumio/anaconda3 (the full Anaconda distribution) or the much smaller continuumio/miniconda3.
  2. Pull the chosen image:

$ docker pull continuumio/anaconda3

  3. Run the image and start an interactive shell (-i keeps STDIN open, -t allocates a terminal):

$ docker run -i -t continuumio/anaconda3 /bin/bash

  4. Once the Docker container is running, we can start an interactive Python shell, install additional conda packages or run Python applications.
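
Step 4 can also be done without an interactive shell. The following is a minimal sketch, assuming Bash, of a hypothetical helper named conda_run (it is not part of Docker or Anaconda) that runs a single command in a throwaway continuumio/anaconda3 container, with the current host directory mounted at /work so that scripts and data on the host are visible inside. The ANACONDA_IMAGE and DRY_RUN variables are conventions invented for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Docker or Anaconda): run one command in a
# throwaway Anaconda container with the current host directory mounted at /work.
# ANACONDA_IMAGE and DRY_RUN are conventions made up for this sketch.

ANACONDA_IMAGE="${ANACONDA_IMAGE:-continuumio/anaconda3}"

conda_run() {
  # --rm removes the container on exit; -i keeps STDIN open for piped input.
  local -a cmd=(docker run --rm -i)

  # Request a pseudo-TTY only when attached to a terminal: docker run -t
  # refuses to start when its input is not a TTY (e.g. in cron or CI).
  if [ -t 0 ] && [ -t 1 ]; then
    cmd+=(-t)
  fi

  # Share the current directory and make it the working directory.
  cmd+=(-v "$PWD:/work" -w /work "$ANACONDA_IMAGE" "$@")

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"   # print the command instead of executing it
  else
    "${cmd[@]}"
  fi
}
```

For example, conda_run python my_analysis.py runs a script from the current directory, and conda_run conda list numpy shows which NumPy version the image ships. Because --rm discards the container on exit, anything installed with conda install inside it is lost; for a lasting environment, build your own image on top of continuumio/anaconda3.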

Example: Jupyter Notebook

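  1. Start a Jupyter Notebook server with Anaconda from a Docker image (-p 8888:8888 publishes the container's port 8888 on the host; --allow-root is needed because the container runs as root, which Jupyter otherwise refuses):

$ docker run -i -t -p 8888:8888 continuumio/anaconda3 /bin/bash -c "/opt/conda/bin/conda install jupyter -y --quiet && mkdir -p /opt/notebooks && /opt/conda/bin/jupyter notebook --notebook-dir=/opt/notebooks --ip='*' --port=8888 --no-browser --allow-root"

  2. Open http://localhost:8888 in a browser to view Jupyter Notebook. Recent Jupyter versions require an access token, so use the URL containing the token that the server prints to the terminal.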
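
In the command above the notebooks live only in the container's filesystem, so they are lost once the container is removed. Below is a sketch, assuming Bash and two hypothetical helpers named start_jupyter and jupyter_url, that runs the same server in the background (-d), keeps notebooks in ./notebooks on the host through a bind mount, and extracts the access URL from the server log. It assumes the log contains a line such as http://127.0.0.1:8888/?token=..., which recent Jupyter versions print; the exact log format differs between versions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the Jupyter example (names are illustrative).

# Start the notebook server in the background. Notebooks are kept in
# ./notebooks on the host, bind-mounted at /opt/notebooks in the container.
# Set DRY_RUN=1 to print the docker command instead of running it.
start_jupyter() {
  local name="${1:-anaconda-jupyter}"   # container name (assumed default)
  local server="/opt/conda/bin/jupyter notebook --notebook-dir=/opt/notebooks --ip='*' --port=8888 --no-browser --allow-root"
  local -a cmd=(docker run -d --name "$name" -p 8888:8888
                -v "$PWD/notebooks:/opt/notebooks" continuumio/anaconda3
                /bin/bash -c "/opt/conda/bin/conda install jupyter -y --quiet && $server")

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
    return 0
  fi
  # Create the host directory ourselves so it belongs to the current user
  # rather than being created by Docker.
  mkdir -p notebooks
  "${cmd[@]}"
}

# Read a Jupyter log on stdin and print the first localhost URL that
# carries an access token.
jupyter_url() {
  grep -o 'http://127\.0\.0\.1:[0-9]*/[^[:space:]]*token=[^[:space:]]*' | head -n 1
}
```

After start_jupyter, wait until conda has finished, then docker logs anaconda-jupyter 2>&1 | jupyter_url prints the URL to open; the 2>&1 is there because Jupyter writes its log to stderr. docker rm -f anaconda-jupyter stops and removes the container, while the notebooks stay in ./notebooks.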

Recap: Docker commands