The Full Stack Data Scientist Part 4: Building front-ends in Streamlit

A guide to making your data and machine learning models available to the general public

Daniel Sharp
Applied Data Science
Dec 4, 2019


In the previous posts of ‘The Full Stack Data Scientist’ we’ve emphasised the importance of productionising machine learning models and making them available to the business, whether the audience is technical or non-technical. In this blog post we’ll take a look at Streamlit, a fantastic open source tool that lets you publish, in a few lines of code, an interface for exploring data and/or interacting with models.

For this post, I built a simple dashboard which allows the user to explore criminal activity in their local area, using data from the Police API. For more info on how to build your own APIs for your models, check out our previous post on using Django here.

Feel free to explore the dashboard here and the codebase here. My objective in this post was not to make the most comprehensive dashboard, but to try Streamlit and see how far I could get in a few hours.

Interacting with the Crime Visualisation Dashboard

Streamlit has been built so that very simple commands can take you a long way. Its documentation does a great job of explaining the different widgets and options, so I won’t go into great depth here. Some examples follow.

Dropdown to select option

This code places a dropdown in the main section of the page. To place it in the sidebar instead, simply add the word sidebar between ‘st.’ and ‘selectbox’, i.e. st.sidebar.selectbox.

Adding plots and maps

Streamlit integrates well with plotting and mapping tools such as Altair, DeckGL and Vega Lite, which have user-friendly APIs for building plots directly from pandas DataFrames:

  • Vega Lite Bar Plot

Which results in:

Phone-snatching trends make an appearance here in the ‘Theft from the person’ category, which is the most common type of crime in the Old Street area.

  • Multiselect and DeckGL map

Which results in:

I did find some issues with its integration with DeckGL, such as the lack of a tooltip and the inability to re-center the map every time the postcode was changed. However, it seems the dev team is working on integrating Streamlit with PyDeck, which should provide better compatibility.

I should also talk about Streamlit’s cache decorator, which avoids reloading data unnecessarily every time an option is changed. It looks like this:

Decorating a function with ‘@st.cache’ makes Streamlit store its result and skip re-running it when it is called again with arguments it has already seen, so data is not reloaded unnecessarily. In this case, the fetch_data function only re-runs when a new postcode is entered or a different date is selected. This makes the app much faster, and the filters respond quickly once the data has been loaded.

Streamlit is a great open source tool to quickly make your models and/or data accessible to a more general audience. Check out their website for additional documentation and projects similar to this.

Applied Data Science Partners is a London based consultancy that implements end-to-end data science solutions for businesses, delivering measurable value. If you’re looking to do more with your data, please get in touch via our website.
