My Experience Deploying an App With Streamlit Sharing
Build a multi-functional app with good memory management and avoid memory leaks
This is not a tutorial on how to use streamlit; you can find great tutorials in the streamlit docs and in medium articles. Instead, this is a summary of my trial and error deploying an app and debugging the most frequent bugs with streamlit sharing.
For those who don’t know streamlit, it’s an open-source app framework that lets data scientists deploy their data apps in a few lines of code. Using streamlit is very easy. Here, we focus on building the app, because deploying the app is another story: it’s, as they call it, Ridiculously Easy.
Now, let’s dive into the main topic: how do you debug the most frequent bugs you face while building and deploying an app with streamlit?
The tips I will share might be a piece of cake for people coming from a computer science background. However, for data scientists who mostly work in jupyter notebooks, and who are the potential streamlit users, this article might be useful because it highlights the importance of object-oriented programming in building a memory-friendly, clean pipeline.
My Web App:
The app predicts diseases in cassava plants: Web App
The gif above shows the functionality of my App: you upload or select a cassava plant image, and a deep learning model predicts the disease in the plant image with a detailed report.
The idea behind the app: I participated in the Cassava leaf disease prediction competition on kaggle, where we had to train models to predict diseases in cassava plants. I had the model and the weights ready, so I decided to create my first streamlit App.
If you’d like to learn more about the training process of the model, please check my article:
Demo App option
For inspiration, I checked many apps in the streamlit app gallery and realized that, from a user’s perspective, it’s very important to provide a demo option to test the app.
Many apps are great, but they require the user to upload a file before the app can run, and most users will simply skip that step. If you provide a demo option, however, there is a solid chance the user will test your app and discover all the features it offers.
An example of the demo version of my app. I provide the user with 3 built-in images to select:
Adding a demo to your app can increase its memory consumption, which is why it’s important to learn how to manage your RAM efficiently and avoid memory leaks.
Memory management: Make the most out of 800MB of RAM
With streamlit sharing, Apps get up to 1 CPU, 800 MB of RAM, and 800 MB of dedicated storage. This is generous compute power compared to Heroku’s free tier with only 512 MB of RAM. Memory is the variable we want to optimize, especially for multi-functional apps: we want our app to have as many features as possible with the minimum memory consumption.
You can see in figure 1 that my App has many elements that might consume a lot of memory:
- Many dependencies: OpenCV, matplotlib and albumentations for image preprocessing, Pytorch and torchvision for modeling…
- Images: Image data occupies more space in RAM than text or tabular data.
- Deep learning model: Loading the training weights (size 100MB).
- Many features: (1) select/upload an image (2) display the image (3) print the prediction result (4) display the grad-cam image (5) print the class predictions table. Every executed feature leads to objects stored in memory.
The ideal memory usage should be something like the figure below.
As your app boots up, it consumes a base amount of RAM while installing the dependencies listed in your requirements.txt file that are required for your app to run.
This rapid increase (the exponential phase) is followed by a plateau phase, where your app runs correctly, hovering around a stable point without increasing its memory consumption.
The headroom between your RAM limit (800MB) and the plateau phase is what matters: the lower your app plateaus, the more users it can handle at the same time.
More information about debugging memory leaks.
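One lightweight way to watch memory from inside a python script is the standard-library resource module, sketched below. This is my own illustration, not something streamlit provides; note that Linux reports peak memory in kilobytes while macOS reports bytes:

```python
import resource

def peak_memory_mb():
    # ru_maxrss is the peak resident set size of this process,
    # reported in kilobytes on Linux (bytes on macOS)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

before = peak_memory_mb()
# Allocate and touch ~50 MB to simulate loading a model into memory
buffer = bytearray(b"\x01" * (50 * 1024 * 1024))
after = peak_memory_mb()
print(f"peak memory grew by roughly {after - before:.0f} MB")
del buffer
```

Logging a value like this at the start and end of your script makes it easy to see whether each run pushes the plateau higher, which is the signature of a leak.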
TIP N°1: Don’t install unnecessary libraries
If you are a jupyter notebook user, you probably start your work by importing dozens of libraries. You often don’t use most of them, but with 15GB of RAM in a colab notebook, keeping them won’t hurt. When you deploy an app with streamlit sharing, however, you only have 800MB of RAM.
Here are 2 examples of how I reduced the number of libraries:
- Avoid ‘repetitive’ libraries: I was using both the Pillow and OpenCV libraries for image preprocessing. You can do the job perfectly well with just one of them, so I converted all the Pillow code and stuck to OpenCV.
- Avoid ‘intermediary’ libraries: I was using the Timm library to import my pytorch model. Timm itself imports the model from the Torchvision library, so I could drop Timm entirely and import directly from Torchvision.
TIP N°2: Avoid memory leak
The mistake I was making, and that most jupyter notebook users would make, is storing the same data in many variables throughout the notebook, which translates into unnecessary memory consumption. Ideally, you should wrap all your variables inside python functions or classes to make sure each individual variable is read and stored in memory only once.
That covers a single run; our app, however, is meant to be used many times by multiple users. Every time a user uses the app, your python script runs in the backend and its python objects are stored in memory.
I realized this after sending my app to some users to test it: the app crashed before the 4th user could open it.
As I mentioned earlier, my app uses a heavy deep learning model whose weights alone consume 100MB of memory, so storing the model in memory repeatedly made my app crash after 3 sessions.
The solution to this is very easy: wrap all your variables inside a function or a class, and after the code has run, delete every single variable (images, data-frames, models, weights…).
You can check the code in my app.py script: all the variables are wrapped in the deploy() function, and after the function runs, every variable is deleted with the del keyword.
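The pattern looks roughly like the sketch below, with byte buffers standing in for the real model weights and images (the sizes are arbitrary placeholders, not my app’s actual numbers):

```python
import gc

def deploy():
    # Heavy objects live only inside the function's local scope,
    # never as module-level globals that survive between runs
    weights = bytearray(100 * 1024 * 1024)  # stand-in for ~100 MB of model weights
    image = bytearray(5 * 1024 * 1024)      # stand-in for a decoded image
    prediction = len(weights) + len(image)  # stand-in for running the model

    # Explicitly drop the heavy references before returning,
    # then let the garbage collector reclaim anything left over
    del weights, image
    gc.collect()
    return prediction

print(deploy())
```

Because nothing heavy escapes the function, each user’s run starts from the same memory baseline instead of stacking a fresh copy of the model on top of the previous one.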
If you don’t free your memory when you’re done using it, the result is a memory leak. Sometimes deleting the used variables is not enough, especially if a variable was copied from a class or a function, and this can hurt your App in the long term.
To make sure all the used data has been freed from memory, use python’s garbage collection module. Running a garbage collection pass cleans up a huge number of unused objects in memory.
# Import the garbage collection module
import gc

# Enable garbage collection (it is on by default, but this makes it explicit)
gc.enable()

# Clean up the memory from unused objects
gc.collect()
To learn more about garbage collection, I recommend reading this article: Python Garbage Collection: What It Is and How It Works.
In this article, I shared some basic tips to improve the memory management of Web Apps. This is my first app, so any advice or tips on how to improve its performance is more than welcome. You can find the code in my GitHub repository; feel free to contribute to the project with pull requests.