Influenza Estimator — Backend

Building the back-end for the Influenza estimator web tool using the Python Flask framework.

Tej Sukhatme
3 min read · Jun 30, 2020

Finally, the model is ready and we have come to the final step: building the actual web tool. This is going to be a really simple back-end without any complicated logic, so it makes the most sense to pick the Python Flask framework.

Our back-end should do the following things:

  • Query the Wikipedia pageview data.
  • Store the data in a database.
  • Train the machine learning model on the older data as soon as the server is started.
  • Run the machine learning model on the real-time data.
  • Render the findings on the screen when the website is loaded.
  • Return the findings when requests are made to the server using a REST API.

Wikipedia Queries

To simplify this process, we will be using PageViewAPI, a Python API which does the querying for us.

I set up a simple wrapper class which did all the querying for me. The basic statement that queries the data is a single call to PageViewAPI's per_article endpoint, something like this (the article name and date range here are just placeholders):
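```python
import pageviewapi

# Daily pageviews for one article over a date range
# (dates are YYYYMMDD strings; article and dates are placeholders).
result = pageviewapi.per_article('de.wikipedia', 'Influenza',
                                 '20200615', '20200621',
                                 access='all-access',
                                 agent='all-agents',
                                 granularity='daily')
```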

I had to make sure I handled the ZeroOrDataNotLoadedException and the ThrottlingException aptly (here, I set the number of pageviews for that article to zero whenever one of these exceptions is encountered). Although the actual endpoint returns the data as JSON, PageViewAPI gives us a nice little dictionary; all you have to do is get the data associated with the ‘views’ key.
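Sketched out, the wrapper might look roughly like this (the method name get_pageviews and the summing over the date range are my assumptions, not necessarily how the project's class is written):

```python
import pageviewapi
from pageviewapi.client import ThrottlingException, ZeroOrDataNotLoadedException


class WikiGateway:
    """Rough sketch of the wrapper class around PageViewAPI."""

    def __init__(self, project):
        self.project = project  # e.g. 'de.wikipedia'

    def get_pageviews(self, article, start, end):
        try:
            result = pageviewapi.per_article(self.project, article, start, end,
                                             access='all-access',
                                             agent='all-agents',
                                             granularity='daily')
        except (ZeroOrDataNotLoadedException, ThrottlingException):
            # As described above: treat missing or throttled data as zero views.
            return 0
        # Each item in the response carries its count under the 'views' key.
        return sum(item['views'] for item in result['items'])
```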

One thing to note here is the project, which differs from country to country on the basis of the language spoken there. Austria and Germany speak German, so the project is de.wikipedia. Belgium and the Netherlands speak Dutch, so it is nl.wikipedia. Lastly, the project in Italy is it.wikipedia.
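In code, that boils down to a small mapping (the variable name is mine):

```python
# Which Wikipedia project to query for each country, based on its language.
WIKI_PROJECTS = {
    'austria': 'de.wikipedia',
    'germany': 'de.wikipedia',
    'belgium': 'nl.wikipedia',
    'netherlands': 'nl.wikipedia',
    'italy': 'it.wikipedia',
}
```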

Storing the information

For this, I used simple Comma-Separated Values (CSV) files, so that I could directly use Pandas to load the data and run the algorithms.

I made six different CSV files: one for each country to store all the older data, and one for storing the real-time estimates.
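With everything in CSV form, loading the data is a one-liner per file with Pandas (the file names here are illustrative):

```python
import pandas as pd

# One historical-data CSV per country, plus one for the live estimates.
germany_old = pd.read_csv('data/germany.csv')
live_estimates = pd.read_csv('data/live_estimates.csv')
```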

I also wrote a DataGateway class which does all of the fetching of the data for us. The functions in this class are:

  • get_incidence()
  • query()
  • get_live_data()
  • get_old_data()

get_incidence() will simply take all the different constraints like country, week, and year as parameters and return the correct estimates after calling the query() function.

The query() function will direct program control to either get_live_data() or get_old_data(), depending on which data is queried.

get_live_data() basically calls the WikiGateway class we defined earlier to send requests and retrieve the previous week's data. Following this, the model is applied to the data and the results are returned.

get_old_data() checks for the data in the database and returns it. If the estimate for that particular week has been calculated before, it does not run the model again; otherwise, it applies the trained model to the data for that week and returns the results.
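Putting the four functions together, the skeleton of the class looks something like this (a sketch: the constructor arguments and the get_last_week helper are my assumptions):

```python
import datetime


class DataGateway:
    """Rough sketch of the class that routes all data fetching."""

    def __init__(self, wiki_gateway, model_gateway):
        self.wiki_gateway = wiki_gateway
        self.model_gateway = model_gateway

    def get_incidence(self, country, year, week):
        # Entry point: take the constraints and delegate to query().
        return self.query(country, year, week)

    def query(self, country, year, week):
        # Live data for the current week, stored data for anything older.
        this_year, this_week, _ = datetime.date.today().isocalendar()
        if (year, week) == (this_year, this_week):
            return self.get_live_data(country)
        return self.get_old_data(country, year, week)

    def get_live_data(self, country):
        # Fetch last week's pageviews through the WikiGateway and run
        # the trained model on them (get_last_week is a hypothetical helper).
        _, week, _ = datetime.date.today().isocalendar()
        features = self.wiki_gateway.get_last_week(country)
        return self.model_gateway.predict(features, week)

    def get_old_data(self, country, year, week):
        # Return the cached estimate if this week was computed before,
        # otherwise apply the trained model to that week's data.
        ...
```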

This in a nutshell is what the DataGateway class does.

Machine learning model

We are training and applying the machine learning model in the Flask back-end itself.

For this purpose I made a class called ModelGateway with the following functions:

  • predict()
  • clean_vector()
  • process_vector()
  • hot_encode_weeks()

The role of this class is to encompass all the functionality we will ever need in relation to the machine learning model. First, when we start the server, we create an instance of the ModelGateway class, which trains the Random Forest Regression model on the data.

Every time we query a value, the ModelGateway will clean the vector, process it, and finally apply the machine learning model to it to produce the result. This result is then returned by the predict() method.
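A rough shape of the class (a sketch: the label column name and the details of the cleaning and encoding steps are assumptions):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


class ModelGateway:
    """Sketch: train once at server start-up, then serve predictions."""

    def __init__(self, training_csv, label_column='incidence'):
        # Assumes the CSV already holds the processed feature columns.
        data = pd.read_csv(training_csv)
        self.model = RandomForestRegressor()
        self.model.fit(data.drop(columns=[label_column]), data[label_column])

    def clean_vector(self, vector):
        # Replace missing pageview counts with zeros.
        return [0 if v is None else v for v in vector]

    def process_vector(self, vector, week):
        # Append the one-hot encoded week number to the feature vector.
        return list(vector) + self.hot_encode_weeks(week)

    def hot_encode_weeks(self, week):
        # 52 indicator columns, one per ISO week.
        return [1 if w == week else 0 for w in range(1, 53)]

    def predict(self, vector, week):
        features = self.process_vector(self.clean_vector(vector), week)
        return self.model.predict([features])[0]
```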

This is how the Flask server has been set up.
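In outline (route names and the response shape are assumptions of this sketch), it ties the pieces above together like this:

```python
from flask import Flask, jsonify, render_template

# WikiGateway, DataGateway and ModelGateway as sketched above.

app = Flask(__name__)

# Train the model once, as soon as the server starts.
model_gateway = ModelGateway('data/germany.csv')
data_gateway = DataGateway(WikiGateway('de.wikipedia'), model_gateway)


@app.route('/')
def index():
    # Render the findings when the website is loaded.
    return render_template('index.html')


@app.route('/api/incidence/<country>/<int:year>/<int:week>')
def get_incidence(country, year, week):
    # REST endpoint: return the estimate for a given country and week.
    return jsonify(estimate=float(data_gateway.get_incidence(country, year, week)))


if __name__ == '__main__':
    app.run()
```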
