5 steps to build a Data Web App MVP with Python and Streamlit

Pierre-Louis Danieau
The Streamlit Teacher
6 min read · Oct 24, 2022

🤯 Introduction

Have you ever felt the frustration of building a great Machine Learning model in a Jupyter Notebook, only to see nothing happen next?

Wouldn’t it be great to deploy it on the web and share it with everyone, so that it can be tested against real life? 🤔

Photo by Milad Fakurian on Unsplash

When a Data Scientist, or anyone else interested in data, completes a personal project, they often want to share their work with the whole community, especially when they want to build an MVP (Minimum Viable Product) to test whether an idea is viable.

However, deploying a data website to production is not always easy, because it requires skills in Data Engineering (Cloud, Docker…) and Web Development (HTML, CSS, JavaScript…) that not everyone has.

Here I present a simple solution for deploying your Data project as a web application, using only Python and the open-source framework Streamlit (which I will introduce).

To illustrate this with a concrete case, this tutorial builds an application that searches for trains in real time, using the API provided by the SNCF (the French national railway company).

The final GitHub repository of the web app is available here.

Final version of the web app for this tutorial

Let’s start the tutorial!

Part 1: Finding an API

The first thing you need to build a Data application is, obviously, data. You could simply download an Excel file and load it in your program, but when the data changes regularly, it is more efficient to call an API. Here, an API is simply an interface exposed by a server that a program can query to request data.

Countless APIs exist, and companies such as Reddit, Facebook or Twitter provide access to some of their data through one.

In this tutorial, we are going to use the SNCF API that lists all the trains available with the TGV Max subscription (a 79 euros/month subscription for young people that lets them book an unlimited number of eligible low-traffic trains for free).

Here is an example of the data returned by the API, in JSON format:

Data format returned by the SNCF API

For each train, we find in particular the following fields:

  • date
  • departure_time
  • arrival_time
  • origin
  • destination

We will use these 5 fields to display the trains that match the user’s inputs.
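As a minimal sketch of how these 5 fields can be pulled out of the JSON, the function below turns a response into one dictionary per train. It assumes the usual Opendatasoft layout (a `records` list whose items hold a `fields` object) and that the raw keys are the French names in `FIELD_MAP`; both are assumptions to check against a real response, since the article lists the fields under English names. `extract_trains` is a hypothetical helper name.

```python
# Hypothetical mapping: the article's field names (keys) to the raw keys
# assumed to appear in each record's "fields" object (values).
FIELD_MAP = {
    "date": "date",
    "departure_time": "heure_depart",
    "arrival_time": "heure_arrivee",
    "origin": "origine",
    "destination": "destination",
}


def extract_trains(payload: dict) -> list:
    """Keep only the 5 fields of interest for each record of an API response."""
    trains = []
    for record in payload.get("records", []):
        fields = record.get("fields", {})
        # A missing key becomes None instead of raising a KeyError.
        trains.append({name: fields.get(raw) for name, raw in FIELD_MAP.items()})
    return trains
```

Using `.get` everywhere means an unexpected response shape yields empty or `None` values rather than a crash, which is convenient while exploring the API.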

Part 2: Creating your programming environment

Open a terminal where you want to place your project and run the following commands to create a folder, create and activate a virtual environment, and install the libraries we are going to use.

1. Create your project directory and move into it

mkdir project_data
cd project_data

2. Create your virtual environment

python3 -m venv env

3. Activate your virtual environment (on Windows, run env\Scripts\activate instead)

source env/bin/activate

4. Upgrade pip

pip install --upgrade pip

5. Create a requirements.txt file listing the 3 dependencies we will use.

6. Install the dependencies

pip install -r requirements.txt

All set!

Our programming environment is ready, so we can start coding!
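To double-check that step 6 worked, you can run a small script inside the activated environment. This is a sketch under an assumption: the article names Streamlit and pandas, and `requests` is only our guess for the third dependency, so adjust `DEPENDENCIES` to match your requirements.txt.

```python
from importlib import metadata
from typing import Dict, List, Optional

# Assumed dependency list; "requests" is a guess for the HTTP client.
DEPENDENCIES = ["streamlit", "pandas", "requests"]


def installed_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(DEPENDENCIES).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```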

Part 3: How to query the data from the API

We access the API through the URL provided on the SNCF open data website. Here is the URL we will use:

https://ressources.data.sncf.com/api/records/1.0/search/?dataset=tgvmax&q=&rows=10000&sort=-date&facet=date&facet=origine&facet=destination&facet=od_happy_card

Notice that the URL above contains the following facet parameters:

  • date
  • origine
  • destination

These facets let us pass the date, the departure city and the arrival city as refine.* filters, so that we only request the trains we are interested in. For example, here is the URL for all the trains from Paris to Lyon on June 15, 2022:

https://ressources.data.sncf.com/api/records/1.0/search/?dataset=tgvmax&q=&rows=10000&sort=-date&facet=date&facet=origine&facet=destination&facet=od_happy_card&refine.date=2022%2F06%2F15&refine.origine=PARIS+(intramuros)&refine.destination=LYON+(gares+intramuros)
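Rather than pasting the city names into the URL by hand, the query can be built from the user’s choices. Here is a minimal sketch using only the standard library; the base parameters are copied from the URL above, while the name `build_search_url` is ours.

```python
from datetime import date
from urllib.parse import urlencode

# Endpoint and fixed parameters copied from the example URL.
BASE_URL = "https://ressources.data.sncf.com/api/records/1.0/search/"
BASE_PARAMS = [
    ("dataset", "tgvmax"),
    ("q", ""),
    ("rows", "10000"),
    ("sort", "-date"),
    ("facet", "date"),
    ("facet", "origine"),
    ("facet", "destination"),
    ("facet", "od_happy_card"),
]


def build_search_url(origin: str, destination: str, day: date) -> str:
    """Append the refine.* filters for one route and one day to the base query."""
    params = BASE_PARAMS + [
        # The example URL encodes the date as YYYY/MM/DD.
        ("refine.date", day.strftime("%Y/%m/%d")),
        ("refine.origine", origin),
        ("refine.destination", destination),
    ]
    # urlencode percent-encodes "/" and parentheses and turns spaces into "+".
    return f"{BASE_URL}?{urlencode(params)}"
```

For example, `build_search_url("PARIS (intramuros)", "LYON (gares intramuros)", date(2022, 6, 15))` produces the same query as the example URL, except that the parentheses are percent-encoded as %28 and %29, which decode back to the same characters.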

This requires knowing all the possible departure and destination cities in advance. That’s why I collected them beforehand and saved them in 2 CSV files:

  • Departure cities: here
  • Destination cities: here

Download these 2 files and save them in your project directory.
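Once the files are in place, the city lists can be loaded with pandas. This is a minimal sketch assuming each file has a header row and the city names in its first column; check the actual files, and note that `load_cities` is a hypothetical helper name.

```python
import pandas as pd


def load_cities(source) -> list:
    """Return sorted, de-duplicated city names from a CSV path or file object."""
    # Assumption: the file has a header row and the names are in its first column.
    names = pd.read_csv(source).iloc[:, 0].dropna().astype(str).str.strip()
    return sorted(names.unique())
```

In app.py, this would be called once per file, e.g. `load_cities("departure_city.csv")` and `load_cities("arrival_city.csv")`, using the file names from the repository layout in Part 5.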

Part 4: Coding the application with Streamlit

First, create an app.py file, which will be the main file of our application.

Let’s start by importing the necessary dependencies:

Now we can create our 2 main functions:

  • The param function: it collects the user’s inputs, i.e. the departure location, the arrival location and the desired departure date. These inputs are then used to query the API for the matching trains.
  • The request function: it queries the API with the values returned by the param function, so that only the trains the user is interested in are retrieved.

Param function:

In this function, we use the st object, which refers to the Streamlit framework. It lets us add widgets, i.e. elements the user can interact with, such as single- or multiple-choice selectors or a calendar to pick a date, and place them in a sidebar. More information is available in the documentation.

param function in the app.py file
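The original function is shown as an image, so here is a hedged sketch of what a param function can look like; it is not the author’s code. The widget labels and the `container` parameter are our own choices: passing the container (for example `st.sidebar`) instead of importing Streamlit inside the function lets the same code render in the sidebar or the main page.

```python
from datetime import date


def param(container, departure_cities, arrival_cities, today=None):
    """Collect the departure city, arrival city and travel date from widgets.

    `container` is anything exposing Streamlit's widget methods, e.g. `st.sidebar`.
    """
    today = today or date.today()
    origin = container.selectbox("Departure city", departure_cities)
    destination = container.selectbox("Arrival city", arrival_cities)
    # min_value stops the user from picking a date in the past.
    travel_date = container.date_input("Departure date", value=today, min_value=today)
    return origin, destination, travel_date
```

In app.py, you would call it as `origin, destination, travel_date = param(st.sidebar, departure_cities, arrival_cities)`, with the two city lists loaded from the CSV files.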

Request function:

In this function, we call the SNCF API with the parameters the user has chosen: the departure city, the destination city and the departure date. The request function returns a pandas DataFrame containing all the trains returned by the API, i.e. those eligible for the TGV Max subscription.

request function in the app.py file
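Here is a minimal sketch of what the request function can look like; it is not the author’s code. It reproduces the query of the example URL with the `requests` library (our assumption for the HTTP client), reads the `records` list of the response, and keeps the 5 fields as DataFrame columns. The French keys in `COLUMNS` are assumptions to check against a real response, and the optional `get` parameter is our addition so that the function can be tested without a network connection.

```python
from datetime import date

import pandas as pd

API_URL = "https://ressources.data.sncf.com/api/records/1.0/search/"

# Hypothetical mapping from column names to the raw keys assumed in each
# record's "fields" object; check it against a real API response.
COLUMNS = {
    "date": "date",
    "departure_time": "heure_depart",
    "arrival_time": "heure_arrivee",
    "origin": "origine",
    "destination": "destination",
}


def request(origin: str, destination: str, travel_date: date, get=None) -> pd.DataFrame:
    """Query the TGV Max dataset and return the matching trains, sorted by departure."""
    if get is None:
        import requests  # the HTTP client assumed to be in requirements.txt

        get = requests.get
    # Same parameters as the example URL in Part 3, plus the user's choices.
    params = [
        ("dataset", "tgvmax"),
        ("q", ""),
        ("rows", "10000"),
        ("sort", "-date"),
        ("facet", "date"),
        ("facet", "origine"),
        ("facet", "destination"),
        ("facet", "od_happy_card"),
        ("refine.date", travel_date.strftime("%Y/%m/%d")),
        ("refine.origine", origin),
        ("refine.destination", destination),
    ]
    response = get(API_URL, params=params, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    records = response.json().get("records", [])
    rows = [{col: rec.get("fields", {}).get(raw) for col, raw in COLUMNS.items()} for rec in records]
    # Passing columns keeps the 5 columns even when no train matches.
    trains = pd.DataFrame(rows, columns=list(COLUMNS))
    return trains.sort_values("departure_time", ignore_index=True)
```

In the app, you would simply call `request(origin, destination, travel_date)` with the values returned by the param function.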

Main Program

Let’s put it all together!

First, download this image, which is used in the application’s layout, and save it in your project folder:

Once these two functions are in app.py, all that remains is to call them from a main function.

In the main program, we use some of Streamlit’s built-in functions, such as set_page_config, columns, write and markdown, to build the layout of the application.

main function in the app.py file
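The main function itself is only shown as an image, so below is a hedged sketch of the layout part, built from `st.columns` and `st.markdown` plus `st.image` and `st.dataframe`. The names `show_results` and `results_title` are hypothetical, and train.jpg is the image file listed in the repository in Part 5. Streamlit is imported inside `show_results` only so that the text helper can be used without it.

```python
import pandas as pd


def results_title(n_trains: int, origin: str, destination: str) -> str:
    """Build the Markdown headline shown above the results table."""
    if n_trains == 0:
        return f"No TGV Max train found from **{origin}** to **{destination}**."
    plural = "s" if n_trains > 1 else ""
    return f"**{n_trains}** TGV Max train{plural} found from **{origin}** to **{destination}**."


def show_results(trains: pd.DataFrame, origin: str, destination: str) -> None:
    """Render a two-column layout: the train image, then the results table."""
    import streamlit as st

    left, right = st.columns([1, 3])  # the right column is 3 times wider
    with left:
        st.image("train.jpg")
    with right:
        st.markdown(results_title(len(trains), origin, destination))
        st.dataframe(trains)
```

A main function would then call `st.set_page_config(...)` first, before any other Streamlit command, then param and request, and finally `show_results(trains, origin, destination)`.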

That’s it, the program is finished! Just run app.py with Streamlit:

streamlit run app.py

A local web page will then open in your browser, like this:

Note the ‘localhost’ in the URL

Part 5: Deploying the site

Now that the application is running locally, we need to deploy it so that it can be accessed from a public url.

For this, we will use Streamlit’s hosting service, which takes care of putting the site into production.

For more information, refer to the Streamlit documentation: here

Here are the steps to follow:

  1. Create a GitHub repository containing:
  • requirements.txt
  • app.py
  • departure_city.csv
  • arrival_city.csv
  • train.jpg
  • README.md (optional)
Structure of the GitHub repository

2. Create a Streamlit account with your GitHub profile, so that Streamlit can access your application’s repository.

3. Click “New app” and fill in your repository name, your branch name and your main file name (app.py). Then click “Deploy”.

4. After a few minutes, Streamlit deploys your application, which is then accessible via a public URL. Congratulations! 🎉
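If the deployment fails, a common cause is a file missing from the repository. As a small convenience that is not part of the original workflow, the following sketch checks a local clone for the files listed in step 1 before you click “Deploy”; `missing_files` is a hypothetical helper name.

```python
from pathlib import Path

# Files listed in step 1 (README.md is optional, so it is not checked).
REQUIRED_FILES = [
    "requirements.txt",
    "app.py",
    "departure_city.csv",
    "arrival_city.csv",
    "train.jpg",
]


def missing_files(repo_dir: str = ".") -> list:
    """Return the required files that are absent from repo_dir."""
    root = Path(repo_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    print("Ready to deploy." if not missing else f"Missing: {', '.join(missing)}")
```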

Conclusion

We can now develop Data applications in just a few hours, using only the Python language and the Streamlit framework!

In addition, Streamlit lets you style the application with images and colors, and split it into several pages. This is a great way to showcase your Data projects or to build an MVP very quickly.

🎯 Some useful links, if you want to go further:

  • I have created a 5-hour Streamlit course in which I present all the steps to build a Data / ML web application. → Link to the Udemy course

Pierre-Louis Danieau

PS: If you liked this tutorial, please don’t forget to upvote it. Many thanks! 👍
