Shabaz Patel
Datmo
4 min read · Feb 1, 2019

This post was originally published on the Datmo blog on July 29, 2018.

Setting up your Data Science and AI dev environment in 5 minutes

Whether you’re a novice data science enthusiast setting up TensorFlow for the first time, or a seasoned AI engineer working with terabytes of data, getting your libraries, packages, and frameworks installed is always a struggle.

While containerization tools like Docker have truly revolutionized reproducibility in software, they haven’t quite caught on yet in the data science and AI communities, and for good reason! With machine learning frameworks and algorithms constantly evolving, it can be tough to find time to learn yet another developer tool, especially one that isn’t directly linked to the model building process.

In this blog post, I’m going to show you how you can use one simple Python package to set up your environment for any of the popular data science and AI frameworks in just a few steps. Datmo leverages Docker under the hood and streamlines the process to help you get running quickly and easily, without the steep learning curve.

Using datmo to set up a new TensorFlow project in under a minute

0. Prerequisites

* Install and launch Docker

* (If using GPU) Install CUDA 9.0

* (If using GPU) Install nvidia-docker (Step 3)
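
Before moving on, it can help to confirm that Docker is actually installed and running. The snippet below is a minimal sketch of such a check, not part of datmo: the helper name `docker_is_running` is made up for illustration, and it assumes that `docker info` exits with a non-zero status when the Docker daemon cannot be reached.

```python
import shutil
import subprocess


def docker_is_running() -> bool:
    """Return True if the docker CLI is on PATH and the daemon responds.

    Hypothetical helper for illustration; it is not provided by datmo.
    """
    if shutil.which("docker") is None:
        return False  # Docker is not installed, or not on PATH
    try:
        # Assumption: `docker info` exits non-zero when the daemon is unreachable
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Docker ready:", docker_is_running())
```

If this prints `Docker ready: False`, launch Docker first and run the check again.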

1. Install datmo

Just like any Python package, you can install datmo from your terminal with the following:

$ pip install datmo
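
To double-check that the install succeeded, you can ask Python which version of the package it sees. This is a small sketch using only the standard library; it assumes Python 3.8 or newer on the host (for `importlib.metadata`), and the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a pip package, or None if absent.

    Hypothetical helper for illustration; requires Python 3.8+.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("datmo")
    print("datmo version:", version if version else "not installed")
```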

2. Initialize a datmo project

In your terminal, cd to the folder you want to start building models in. Then, enter the following command:

$ datmo init

You’ll then be asked for a name and description for your project — feel free to name it whatever you’d like!

3. Start environment setup

After you enter a name and description, datmo will ask if you’d like to set up your environment. Type `y` and press Enter.

4. Select System Drivers (CPU or GPU)

The CLI will then ask which system drivers you’d like for your environment. If you don’t plan on using a GPU, choose cpu.

(1) gpu
(2) cpu

Please select one of the above environment type (e.g. 1 or gpu):

5. Select an environment

Next, you’ll choose one of the pre-packaged environments. Simply respond to the prompt with the number or ID of the environment you want to use.

(1) data-analytics : has libraries such as xgboost, lightgbm, sklearn etc.
(2) mxnet : has libraries for mxnet(v1.1.0) along with sklearn, opencv etc.
(3) caffe2 : has libraries for caffe2(v0.8.0) along with sklearn, opencv etc.
(4) keras-tensorflow : has libraries for keras(v2.1.6) and tensorflow(v1.9.0) along with sklearn, opencv etc.
(5) kaggle : has the environment provided by kaggle
(6) pytorch : has libraries for pytorch(v0.4.0) along with sklearn, opencv etc.
(7) python-base : has base python image with no libraries installed
(8) r-base : has base R image with no libraries installed. Use this environment for rstudio workspace
Please select one of the above environments (e.g. 1 or data-analytics):
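
The prompt accepts either the number or the ID of an environment. As a rough illustration of that idea (this is not datmo's actual implementation, and `resolve_choice` is a made-up name), a prompt like this could map an answer to an environment as follows:

```python
# Hypothetical illustration only: this is NOT datmo's source code.
# The IDs below are copied from the menu shown above.
ENVIRONMENTS = [
    "data-analytics", "mxnet", "caffe2", "keras-tensorflow",
    "kaggle", "pytorch", "python-base", "r-base",
]


def resolve_choice(answer, options):
    """Map a 1-based number or an exact ID to an option.

    Raises ValueError when the answer matches neither.
    """
    answer = answer.strip()
    if answer.isdigit():
        index = int(answer) - 1  # the menu is numbered from 1
        if 0 <= index < len(options):
            return options[index]
    elif answer in options:
        return answer
    raise ValueError("unrecognized choice: %r" % answer)


if __name__ == "__main__":
    print(resolve_choice("4", ENVIRONMENTS))
    print(resolve_choice("pytorch", ENVIRONMENTS))
```

Under this sketch, answering `4` and answering `keras-tensorflow` select the same environment.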

6. Select a language version (if applicable)

Many of the environments above come in several variants, one for each supported language version.

For example, after selecting the keras-tensorflow environment, I’d be faced with the following prompt asking whether I want to use Python 2.7 or Python 3.5.

(1) py27
(2) py35

Please select one of the above environment language (e.g. py27):

7. Launch your workspace

Now that you’ve selected your environment, it’s time to launch your workspace. Choose the workspace you’d like to use, and enter its command in your terminal.

Jupyter Notebook:

$ datmo notebook

JupyterLab:

$ datmo jupyterlab

RStudio (available in the r-base environment):

$ datmo rstudio

Terminal:

$ datmo terminal

Figure: Opening a Jupyter Notebook and importing TensorFlow

You’re set! The first time you launch a workspace for a new environment, it will take a bit of time because it needs to fetch all of the resources, but subsequent runs will be significantly faster.

Once your workspace launches, you can start importing the packages and frameworks included in the environment you chose. For example, if you selected the keras-tensorflow environment, then `import tensorflow` will work out of the box in your Jupyter Notebook!
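
A quick way to see which libraries are available is to probe for them from a notebook cell. The sketch below assumes a Python 3 variant of the environment (such as py35) and uses the usual import names `sklearn` for scikit-learn and `cv2` for OpenCV; the helper name `importable` is made up for illustration.

```python
import importlib.util


def importable(module_names):
    """Return {module name: True/False} for whether each module can be found."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


if __name__ == "__main__":
    # Assumed import names for the keras-tensorflow environment:
    # scikit-learn is imported as `sklearn` and OpenCV as `cv2`.
    for name, ok in importable(["tensorflow", "keras", "sklearn", "cv2"]).items():
        print(name, "OK" if ok else "missing")
```

Anything reported as missing is not part of the environment you selected.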

If you’re using TensorFlow, you can try this example from our docs for running your first TensorFlow graph.

If you’d like to help contribute, report issues, or request features, you can find us on GitHub here!
