The Full Stack Data Scientist Part 1: Productionise Your Models with Django APIs
How to build an API endpoint for a model in Django
We all know that in data science, building and tuning models is only a small part of a much bigger story. What happens after you’ve built the perfect model? How do you make it accessible to other parts of the business, or to the web? To be a ‘full stack’ data scientist, you should be able to surface your results through APIs: create your own endpoint that takes rows of unseen data and returns your model’s predictions. This makes your insights much easier for other applications to use.
Using Django, the popular open-source web framework, we can take a model and turn it into an API. I’ll show how to do this for the classic Kaggle challenge Titanic: Machine Learning from Disaster. If you would like the ready-made codebase, check out the GitHub repo.
Step 1: Create the Environment
Start a new local repository, open a terminal and navigate to the new folder. Download the requirements file from my repository and put it in the directory. We all have our own way of handling Python environments, but I like to use miniconda, so I run the following:
conda create -n titanic-django
conda activate titanic-django
conda install pip
pip install -r requirements.txt
Step 2: Start a New Django Project
With the titanic-django environment active, we can now use django-admin to initialise the project’s file structure. django-admin lets you perform a variety of Django-related tasks from the command line, most often initialising projects or managing the local database. Run
django-admin startproject titanicapi
This will create some boilerplate files and directories, making your file structure look like this:
Navigate to the project root (the outer titanicapi directory, which contains manage.py) and run
python manage.py migrate
This is just another part of the setup process. All it does is update the local database (db.sqlite3) so that it contains the core tables needed by the default apps that startproject registered in settings.py. On a successful migration, you will see output like this:
Step 3: Create an API App
In the project root directory, run
django-admin startapp api
This will create another subdirectory called api, with its own set of initial files. Your new file structure should look something like this:
Whenever we add a new app to our Django project, we have to add it to the list of installed apps in the main project package. So head to titanicapi/settings.py and add ‘api’ to the INSTALLED_APPS list.
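For reference, here is roughly what that list looks like once the app is registered. The first six entries are the defaults startproject generates; if your views use Django REST framework, "rest_framework" would also need to be listed, which is an assumption about the original code rather than something shown here.

```python
# titanicapi/settings.py (excerpt)
INSTALLED_APPS = [
    # Default apps generated by `django-admin startproject`
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    # Our new app, created with `django-admin startapp api`
    "api",
]
```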
We need to add a couple more files to the api app subdirectory to support the API’s functionality. First create functions.py:
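A minimal sketch of what functions.py could contain, assuming the model was saved with Python’s pickle module and exposes a scikit-learn-style predict() that accepts a list of feature rows. The names load_model and classify_passengers, and passing the feature columns in explicitly, are illustrative choices rather than the original code.

```python
# api/functions.py (sketch)
import pickle
from typing import Any, Dict, List, Sequence


def load_model(path: str) -> Any:
    """Unpickle a trained classifier, e.g. titanic_model.pk, from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


def classify_passengers(
    model: Any, passengers: List[Dict[str, Any]], features: Sequence[str]
) -> List[int]:
    """Return a 0/1 survival prediction for each passenger record.

    `passengers` is a list of dicts (one per row, e.g. parsed from JSON) and
    `features` names the columns the model expects, in training order.
    """
    # Build a 2-D list of feature values in a fixed column order.
    rows = [[passenger[name] for name in features] for passenger in passengers]
    # Cast to plain ints so the result is JSON-serialisable (sklearn returns numpy ints).
    return [int(label) for label in model.predict(rows)]
```

Keep in mind that pickle.load can execute arbitrary code, so only load model files you trust.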
This file contains the type of code data scientists are familiar with: a function that loads a pickled model and a function that reads passenger data and classifies each passenger using that model. Still in the api app directory, create urls.py:
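A sketch of api/urls.py, assuming the project-level urls.py includes this app under an api/ prefix so that the route below resolves to /api/classification/. The route name "classification" is a hypothetical addition. Because it uses a relative import, this file only works inside the api package.

```python
# api/urls.py (sketch)
from django.urls import path

from .views import get_classification

urlpatterns = [
    # Relative to the "api/" prefix assumed in the project's urls.py,
    # so the full path served here is /api/classification/.
    path("classification/", get_classification.as_view(), name="classification"),
]
```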
This tells Django to route requests for /api/classification/ to the view returned by get_classification.as_view(). We haven’t defined get_classification yet; it is the view that will consume the request’s data and return the predictions. Edit the views.py file inside the api directory to this:
The post method of the get_classification class tells our API how to handle a POST request. It takes the data from the request and runs it through the classifier saved in titanic_model.pk. Then we wrap the output in a Response and return it. You will need a copy of titanic_model.pk saved in the api subdirectory; I recommend using the one from my repository.
It’s nearly ready; there’s one final bit of admin to get through. The main project package isn’t aware of the app’s URLs until you include them in its own urls.py file. Edit titanicapi/urls.py as follows:
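A sketch of the project-level urls.py using Django’s include(), assuming the api app’s urls.py defines its classification/ route relative to an api/ prefix. The admin route is the one startproject generates. Like any URLconf, this only runs inside a configured Django project.

```python
# titanicapi/urls.py (sketch)
from django.contrib import admin
from django.urls import include, path

urlpatterns = [
    path("admin/", admin.site.urls),  # generated by startproject
    # Hand every URL starting with api/ to the api app's own urls.py.
    path("api/", include("api.urls")),
]
```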
Your API is now ready to run!
Step 4: Make Requests
You’ve got all the functionality in place; now it’s time to run the server on your local machine and query the API with some test data. To start the local development server, run this from your project root directory:
python manage.py runserver
You should see something like this:
Leave this running and load up another Python environment (e.g. Jupyter) with the requests and pandas packages installed. Also, make sure you can easily read in the Titanic test dataset. Here’s a little test script I wrote to check the API’s functionality:
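A sketch of a comparable test script, not the original one. To stay dependency-free it swaps requests and pandas for the standard library’s urllib.request and csv. It assumes runserver’s default address (http://127.0.0.1:8000), an endpoint that accepts a JSON list of records and replies with {"predictions": [...]}, and a local copy of Kaggle’s test.csv; the helper names are hypothetical.

```python
import csv
import json
import urllib.request
from typing import Any, Dict, List

# Assumption: runserver's default address plus the /api/classification/ route.
API_URL = "http://127.0.0.1:8000/api/classification/"


def parse_value(raw: str) -> Any:
    """Loosely mimic pandas' type inference: blanks -> None, numbers -> float."""
    if raw == "":
        return None
    try:
        return float(raw)
    except ValueError:
        return raw


def read_passengers(csv_path: str) -> List[Dict[str, Any]]:
    """Read the test CSV into a list of dicts, one per passenger."""
    with open(csv_path, newline="") as f:
        return [{key: parse_value(value) for key, value in row.items()} for row in csv.DictReader(f)]


def request_predictions(passengers: List[Dict[str, Any]], url: str = API_URL) -> List[int]:
    """POST the records as JSON and return the predictions list from the reply."""
    body = json.dumps(passengers).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        # Assumption: the endpoint replies with {"predictions": [...]}.
        return json.loads(response.read())["predictions"]


def add_survived_column(
    passengers: List[Dict[str, Any]], predictions: List[int]
) -> List[Dict[str, Any]]:
    """Return copies of the records with a 'Survived' field holding each prediction."""
    if len(passengers) != len(predictions):
        raise ValueError("expected one prediction per passenger")
    return [{**p, "Survived": int(s)} for p, s in zip(passengers, predictions)]
```

With the server running, usage would look like `passengers = read_passengers("test.csv")` followed by `results = add_survived_column(passengers, request_predictions(passengers))`. With pandas, the same round trip is roughly `df.to_json(orient="records")` for the payload and `df["Survived"] = predictions` for the result.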
You should then see the extra column ‘Survived’ appended to the test data, containing the model’s predictions for each passenger.
To give a better idea of the architecture we just designed and how the requests are handled, take a look at the diagram below.
When we send a POST request to the server with the test data as its payload, Django first checks that the request’s URL is registered in a urls.py file. That’s why we had to include the api app’s URLs in the project’s urls.py; otherwise the service wouldn’t know about any api/classification URL. The request is then forwarded to the view the URL maps to, here get_classification, which carries out the API’s functionality: it loads the model, makes predictions and returns them in the response. Because you define the views and URLs yourself, you have great flexibility to make your APIs do whatever you want. One example extension would be to run functions that check the request data is clean and correctly formatted before it reaches the model.
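As a sketch of that extension, a validator could inspect the parsed request body before it reaches the model. The schema below (field names and accepted JSON types) is hypothetical and would need to match whatever the model was trained on; a view could return a 400 response listing the errors whenever the returned list is non-empty.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema: required fields and the JSON types each may take.
REQUIRED_FIELDS: Dict[str, Tuple[type, ...]] = {
    "Pclass": (int,),
    "Sex": (str,),
    "Age": (int, float, type(None)),  # Age is often missing in the Titanic data
    "Fare": (int, float),
}


def validate_passengers(
    payload: Any, schema: Dict[str, Tuple[type, ...]] = REQUIRED_FIELDS
) -> List[str]:
    """Return a list of problems with the request data; an empty list means it is clean."""
    if not isinstance(payload, list):
        return ["payload must be a JSON list of passenger records"]
    errors: List[str] = []
    for i, record in enumerate(payload):
        if not isinstance(record, dict):
            errors.append(f"record {i} is not a JSON object")
            continue
        for field, allowed in schema.items():
            if field not in record:
                errors.append(f"record {i} is missing '{field}'")
            # bool is a subclass of int in Python, so reject it explicitly.
            elif isinstance(record[field], bool) or not isinstance(record[field], allowed):
                errors.append(
                    f"record {i}: '{field}' has unexpected type {type(record[field]).__name__}"
                )
    return errors
```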
That’s it for this post. I hope this gives you the confidence to get started building your own APIs. Check out the series’ introductory post, where you can vote for the next topic!
Applied Data Science is a London-based consultancy that implements end-to-end data science solutions for businesses, delivering measurable value. If you’re looking to do more with your data, please get in touch via our website.