Setting up your data science environment using Docker and Python 3

Download a docker image that has most of the tools that you will need preinstalled.
My personal favourite is dataquestio/python3-starter

It contains the essential data science libraries, such as scikit-learn, nltk and pandas, as well as Jupyter Notebook, which you will soon find is an invaluable tool for working with your data.

docker pull dataquestio/python3-starter
Using default tag: latest
latest: Pulling from dataquestio/python3-starter
9943fffae777: Pull complete …

Make a folder to store your notebooks outside of the Docker container. You will mount this folder into the container, so it will be accessible both from your host and from inside the container.

I made mine in /Users/Max/dev/python/notebooks; feel free to make yours wherever you like.

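Create and run the container from the image you just downloaded. The --link options assume that containers named mysql_local and mongo_local are already running; leave them out if you do not need database access.

docker run --name data-science -d -p 8888:8888 -v /Users/Max/dev/python/notebooks:/home/ds/notebooks --link mysql_local:mysql --link mongo_local:mongo dataquestio/python3-starter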
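
If you would rather script this step, the sketch below assembles the same command as an argument list and annotates each flag. The function name build_run_command and its parameters are made up for this illustration; it only prints the command, and you would hand the list to subprocess.run to actually start the container.

```python
import shlex


def build_run_command(name, notebooks_dir, links=(), image="dataquestio/python3-starter"):
    """Return the `docker run` argument list for the notebook container."""
    cmd = [
        "docker", "run",
        "--name", name,      # container name, used later by `docker exec` and `docker cp`
        "-d",                # detached: keep the container running in the background
        "-p", "8888:8888",   # publish container port 8888 on host port 8888
        "-v", f"{notebooks_dir}:/home/ds/notebooks",  # mount the host folder into the container
    ]
    for container, alias in links:
        # legacy link: the other container becomes reachable under `alias`
        cmd += ["--link", f"{container}:{alias}"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    command = build_run_command(
        "data-science",
        "/Users/Max/dev/python/notebooks",
        links=[("mysql_local", "mysql"), ("mongo_local", "mongo")],
    )
    # Print the command instead of running it.
    print(shlex.join(command))
```

Calling build_run_command without links gives you the variant for a setup that has no database containers.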

If your data is in CSV format, you can drop it in the notebooks folder you created above, or copy it to a location inside the container with docker cp:

docker cp /Users/Max/dev/data/analytics.csv data-science:/home/ds/notebooks/data

Then you can simply load it into a pandas DataFrame:

import pandas as pd
import numpy as np
df = pd.read_csv("download_data.csv")

# to see the first 10 lines of the data you just loaded
df.head(10)

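
A relative file name passed to pd.read_csv is resolved against the notebook's working directory, so a "file not found" error usually means the CSV sits in a different folder than you expect. The sketch below lists the CSV files under the mounted folder so you can copy the right path. The helper name find_csv_files is made up for this example, and /home/ds/notebooks is assumed to be the mount point from the docker run command.

```python
from pathlib import Path


def find_csv_files(folder):
    """Return the CSV files under `folder`, as paths relative to it, sorted by name."""
    root = Path(folder)
    if not root.is_dir():
        # Nothing is mounted at this path, so there is nothing to list.
        return []
    return sorted(str(p.relative_to(root)) for p in root.rglob("*.csv"))


if __name__ == "__main__":
    # Assumed mount point inside the container.
    for name in find_csv_files("/home/ds/notebooks"):
        print(name)
```

Run it in a notebook cell inside the container; an empty result suggests the folder was not mounted or the file was copied somewhere else.
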
#### When you need MySQL support

Install the MySQL connector to be able to access your MySQL data in pandas.

First, check the Ubuntu version in the Docker container, so that you can pick the matching connector package.

Connect to the container

docker exec -i -t data-science /bin/bash

and run

cat /etc/lsb-release
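
The release number is the part that has to match the package file name (the ubuntu14.04 in the connector package). If you prefer to check it from a notebook cell, here is a minimal sketch that parses the KEY=value lines of that file. The function name parse_lsb_release is made up for this example, and the sample text is illustrative only, not output captured from the image.

```python
from pathlib import Path


def parse_lsb_release(text):
    """Parse KEY=value lines (as found in /etc/lsb-release) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Values such as the description may be wrapped in double quotes.
        info[key.strip()] = value.strip().strip('"')
    return info


if __name__ == "__main__":
    path = Path("/etc/lsb-release")
    if path.is_file():
        text = path.read_text()
    else:
        # Illustrative fallback; the real file in your container may differ.
        text = "DISTRIB_ID=Ubuntu\nDISTRIB_RELEASE=14.04\n"
    info = parse_lsb_release(text)
    print(info.get("DISTRIB_ID"), info.get("DISTRIB_RELEASE"))
```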


Get the connector .deb package that matches that Ubuntu version.


Then switch to root with

sudo -s

and install the package with

dpkg -i mysql-connector-python_2.1.3-1ubuntu14.04_all.deb
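
Once the connector is installed, reading a table into pandas could look roughly like the sketch below. It assumes the container was started with --link mysql_local:mysql, so the database host is reachable under the alias mysql, and that MySQL listens on its default port 3306. The helper names, credentials, database and table are all placeholders invented for this example.

```python
def mysql_settings(user, password, database, host="mysql", port=3306):
    """Collect the keyword arguments for mysql.connector.connect()."""
    return {
        "host": host,  # the link alias from the `docker run` command (assumption)
        "port": port,  # MySQL's default port
        "user": user,
        "password": password,
        "database": database,
    }


def read_query(query, settings):
    """Run `query` and return the result as a pandas DataFrame."""
    # Imported here so this file still loads where the connector is not installed.
    import mysql.connector
    import pandas as pd

    conn = mysql.connector.connect(**settings)
    try:
        return pd.read_sql(query, conn)
    finally:
        conn.close()


# Example call with placeholder credentials, database and table:
# df = read_query("SELECT * FROM events LIMIT 10",
#                 mysql_settings("ds", "secret", "analytics"))
```

If import mysql.connector fails inside read_query, the package was most likely installed for a different Python interpreter than the one the notebook uses.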

Hope this helps. If you have improvements or suggestions, please leave a comment below :)
