Docker image for Machine Learning and Data Science

The last few months at Report Bee have been really exciting. I was part of a team that worked on interesting problems in Computer Vision. We used OpenCV, TensorFlow and its object detection library. Setting up the work environment was a challenge, starting with installing the OpenCV libraries alongside Python 3. We chose Python 3 over Python 2 because we had built other applications using Python 3 and Django 2 and wanted to stick with Python 3 for the rest of our projects too. Another major time sink in the environment setup was installing the TensorFlow Object Detection libraries along with their dependencies.

We used object detection in one of our sub-problems, and training the model in certain cases took more than 2 days. We tried out some pre-trained models (RCNN, Faster RCNN, MobileNet…) and compared how model performance and accuracy changed, but training on our dev systems took a lot of time.

To reduce the training time and run model training in parallel, we set up a high-spec server machine. Once again we had to go through the pain of installing all the necessary tools to make it work, but this time we decided to automate the environment setup.

Luckily, we had recently explored Ansible, Docker and other tools in the DevOps ecosystem. We used Docker to build an image that helps our teams bring up their environment in under 10 minutes. If you are unfamiliar with Docker, check this out.

The Docker image has the tools and packages that we use internally. It should save a lot of time for people who want to get started right away, especially for bootcamps and experimentation.

The image is built on top of python 3.6 (docker image) and has these packages built into it.

The image exposes ports 6006 and 8888, both of which are published in the run command below.

How to set up?

  1. Install Docker Engine CE (https://www.docker.com/community-edition): download the package specific to your OS
  2. Pull the image from Docker Hub: docker pull reportbee/datascience:latest
  3. Run the image using this command: docker run --rm -d -p 6006:6006 -p 8888:8888 reportbee/datascience:latest
  4. Open your browser, go to localhost:8888 and use the password reportbee to log in to the Jupyter notebook
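
If the browser does not load the notebook, it helps to first check whether the published ports are reachable at all. Here is a minimal sketch in Python that tries a TCP connection to each port from the docker run command above. The helper name port_is_open is our own and not part of the image. We are also assuming that 6006 is meant for TensorBoard (it is TensorBoard's default port), so it will likely show as not reachable until something is actually listening on it inside the container.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8888 and 6006 are the two ports published by the docker run command.
    # Assumption: the container runs on this machine, hence "localhost".
    for port in (8888, 6006):
        status = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"localhost:{port} is {status}")
```

A port that is reported as not reachable usually means the container is not running or the -p mapping was left out of the run command.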

If you face any issues with the current image, or you want any other package installed, please raise it here. You can also choose to build your own; we have put the Dockerfile on GitHub.

Note on Object Detection using TensorFlow

If you are trying to set up object detection, the Jupyter notebook environment has the models folder cloned from the repo (https://github.com/tensorflow/models/tree/master/research/object_detection). You can navigate to the object detection folder (models/research/object_detection) and directly execute the object_detection_tutorial.ipynb notebook. We are planning to write a few Python scripts as part of this image to do object detection for custom objects. You can get updates from the GitHub repo.