Django, Jupyter Notebooks and AWS RDS

Johannes Reichard
Jun 19, 2016

I needed to analyse some of the data in our database. I wanted something more powerful than plain MySQL, plus the ability to show the data in graphs. Jupyter Notebooks to the rescue.

What are Jupyter Notebooks

Jupyter notebooks provide an interactive shell in the browser that can display results and images inline. Notebooks have many great features: you can generate and display graphs directly in the notebook, use powerful libraries like matplotlib, pandas and scikit-learn, and export the notebook with its results to HTML or PDF.

Install Jupyter Notebooks and django-extensions

The easiest way to integrate the notebooks with Django is django-extensions. It provides a command called ‘shell_plus’ that hands you an interactive shell with your project settings loaded. With IPython installed, shell_plus can also start the Jupyter Notebook server. Here are the needed packages:

# requirements.txt
django-extensions
ipython[notebook]
# needed for calculating and nice graphs
matplotlib
pandas
seaborn

Configure Jupyter

I wanted our notebooks to live in the same repo as the source of our backend because the two are tied quite closely together. Here are the settings for Django:

# django settings - jupyter notebooks
NOTEBOOK_ARGUMENTS = [
    '--ip=0.0.0.0',  # reach notebooks from outside
    '--port=8888',  # std port
    '--no-browser',  # don't start browser on start
    # directory in the vagrant box to the notebooks
    '--notebook-dir', '/home/vagrant/project_repo/notebooks'
]

Access data

Now we can analyse our data with the full magic of the Django ORM. But we don’t want to analyse the local test data, so how do we access real data?

  1. Load a dump of the live data - takes too long
  2. Access the live DB - always a bad idea
  3. Create a new DB from the last backup and access it as nearly live data

We chose option 3 because it is the easiest option for us. We use a MySQL DB in AWS RDS, so creating a new DB from the last snapshot basically means installing and configuring aws-cli and restoring a snapshot to a new RDS instance:

# create new rds-instance
aws rds restore-db-instance-from-db-snapshot --db-instance-identifier analytics-db --db-snapshot-identifier latest-snapshot
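Once the restored instance is up, Django still has to be pointed at it. The original setup is not shown here, so the following is only a minimal sketch of how a separate settings module could build its database entry. The helper name `analytics_database` and the `ANALYTICS_DB_*` environment variables are hypothetical; only the `DATABASES` setting and the `django.db.backends.mysql` engine name come from Django itself.

```python
import os


def analytics_database(env=os.environ):
    """Build a Django DATABASES entry for the restored RDS instance.

    The ANALYTICS_DB_* variable names are assumptions for this sketch;
    use whatever fits your own deployment.
    """
    return {
        "ENGINE": "django.db.backends.mysql",
        "HOST": env["ANALYTICS_DB_HOST"],  # endpoint of the restored instance
        "PORT": env.get("ANALYTICS_DB_PORT", "3306"),  # MySQL default port
        "NAME": env["ANALYTICS_DB_NAME"],
        "USER": env["ANALYTICS_DB_USER"],
        "PASSWORD": env["ANALYTICS_DB_PASSWORD"],
    }


# In a dedicated settings module (for example an analytics-only one)
# this could be used as:
#   DATABASES = {"default": analytics_database()}
```

Keeping the credentials in environment variables means the analytics settings can live in the repo without exposing the connection details.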

Start Jupyter notebooks

There is only one more obstacle to overcome: the notebook directory and the project root are not the same. Depending on the directory from which you start the notebook server, you can either find your notebooks or import and use your Django project, but not both. To fix this, set PYTHONPATH to the project root when starting the notebook server; this ties everything together. Start the notebooks with this command:

PYTHONPATH=/home/vagrant/project_repo/src \
DJANGO_SETTINGS_MODULE=project.settings.local_analytics \
python manage.py shell_plus --notebook
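To illustrate what the two environment variables achieve, here is a rough Python equivalent: PYTHONPATH entries end up on `sys.path`, and Django reads `DJANGO_SETTINGS_MODULE` from the environment. This is a sketch of the mechanism, not a replacement for the command above; the helper name `prepare_django_env` is made up for this example.

```python
import os
import sys


def prepare_django_env(project_root, settings_module, path=sys.path, env=os.environ):
    """Make the project importable and tell Django which settings to load."""
    if project_root not in path:
        # Same effect as listing the directory in PYTHONPATH
        path.insert(0, project_root)
    # Only set the variable if it is not already defined
    env.setdefault("DJANGO_SETTINGS_MODULE", settings_module)
    return path, env


# Demonstration with throwaway containers instead of the real sys.path / os.environ
demo_path, demo_env = prepare_django_env(
    "/home/vagrant/project_repo/src",
    "project.settings.local_analytics",
    path=[],
    env={},
)
print(demo_path)
print(demo_env)
```

In a plain Python session, calling the helper with its defaults and then running `import django; django.setup()` would make the models importable in a similar way.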

To make this easier, I added a function to my .zshrc that I can call from the repo directories. It logs into the Vagrant box (our local test system) and runs the management command that starts shell_plus with the notebook option:

# for own usage: replace project with project path
function start-notebook {
    vagrant ssh -c "source ~/.virtualenvs/project/bin/activate; \
        cd /home/vagrant/project/src; \
        PYTHONPATH=/home/vagrant/project/src \
        DJANGO_SETTINGS_MODULE=project.settings.local_jr \
        python manage.py shell_plus --notebook"
}

To access your Jupyter notebooks, open localhost:8888/ in the browser of your choice.

Summary

That’s basically it. Now we can:

  • Start a new DB Instance with (nearly) live data
  • Connect to the DB
  • Start a Jupyter Notebook
  • Analyse our data with the help of our beloved Django models
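As a small illustration of that last step, a notebook cell might aggregate rows coming out of the ORM. The sketch below uses plain dictionaries in the shape that `QuerySet.values()` returns, so it runs without a database. The model `Job` and its `created` field in the comment are hypothetical and stand in for whatever your project defines.

```python
from collections import Counter
from datetime import date


def count_per_month(rows, field="created"):
    """Count rows per (year, month); rows look like queryset.values(field)."""
    counts = Counter((row[field].year, row[field].month) for row in rows)
    return dict(sorted(counts.items()))


# Hypothetical usage in a notebook cell, assuming a model `Job`
# with a date field `created`:
#   rows = Job.objects.values("created")
#   count_per_month(rows)
sample = [
    {"created": date(2016, 5, 30)},
    {"created": date(2016, 6, 1)},
    {"created": date(2016, 6, 19)},
]
print(count_per_month(sample))  # {(2016, 5): 1, (2016, 6): 2}
```

In a real notebook the same rows could be handed to pandas and plotted inline with matplotlib or seaborn.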

Johannes Reichard

CTO and Backend Developer @truffls - in need of a place to store my findings / trying to give something back