Create a Beautiful App Using Streamlit

Published in LatinXinAI · 6 min read · Sep 17, 2022

If you want to take your ML projects to another level, where different users can run your ML algorithm in a web application that generates predictions in real time, keep reading. This article shows you how.

Streamlit is a Python library geared toward data work. It lets you build web applications in a simple way and deploy them to a server so that more people can use them.

ML Project

Note: I will only summarize the most relevant steps of my project.

About the Problem

Build a predictive model to estimate the price of used cars in the UK market.

Manufacturer: Vehicle manufacturer (brand).
Model: Vehicle model.
Year: Year of manufacture.
Mileage: Number of miles traveled by the vehicle.
EngineSize: Engine size of the car.

Initial Libraries

Data Preprocessing

After the EDA and data cleaning, it is time to transform the categorical variables. In this case they are nominal variables, so we apply a transformation called One Hot Encoding (OHE).

We carry out the OHE transformation in a way that makes it easy to process new data at prediction time.
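In practice, pandas.get_dummies or scikit-learn's OneHotEncoder would typically handle this step. As a minimal sketch of what the encoding produces, assuming a hypothetical transmission column and an illustrative naming scheme (transmission_<category>), NumPy alone is enough:

```python
import numpy as np

# Hypothetical sample of a nominal variable (values are illustrative).
transmission = np.array(["Manual", "Automatic", "Semi-Auto", "Manual"])

# np.unique returns the distinct categories in sorted order.
categories = np.unique(transmission)

# Compare every row against every category: one column per category,
# with a 1 where the row belongs to that category and 0 elsewhere.
encoded = (transmission[:, None] == categories[None, :]).astype(int)

# Illustrative column names, e.g. "transmission_Manual".
columns = [f"transmission_{c}" for c in categories]

print(columns)
print(encoded)
```

Keeping the list of generated column names is what later allows a single new car to be encoded with exactly the same layout.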

Split Data

We split the data into training and validation sets and convert them to NumPy arrays in order to speed up the model training process.
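A split like this is usually done with scikit-learn's train_test_split. The sketch below does the same thing with NumPy only, so every step is visible; the data, the 20% validation ratio, and the seed are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical data: 10 rows, 3 already-encoded features, and a price target.
X = np.arange(30, dtype=float).reshape(10, 3)
y = np.linspace(5000.0, 14000.0, num=10)

# Shuffle the row indices with a fixed seed so the split is reproducible.
rng = np.random.default_rng(seed=42)
indices = rng.permutation(len(X))

# Hold out 20% of the rows for validation (the ratio is an assumption).
n_val = int(0.2 * len(X))
val_idx, train_idx = indices[:n_val], indices[n_val:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]

print(X_train.shape, X_val.shape)
```

If the data lives in a pandas DataFrame, calling to_numpy() on it gives the array to split.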

Model Creation

Explanation of Parameters:

max_depth: Maximum depth of a tree. Increasing this value will make the model more complex and more likely to overfit.
learning_rate (alias eta): Step size shrinkage used in each update to prevent overfitting. After each boosting step we can directly get the weights of the new features, and the learning rate shrinks those weights to make the boosting process more conservative.
subsample: Subsample ratio of the training instances.
colsample_bytree: Subsample ratio of columns when constructing each tree. Subsampling occurs once for every tree constructed.
colsample_bynode: Subsample ratio of columns for each node (split). Subsampling occurs once every time a new split is evaluated. Columns are subsampled from the set of columns chosen for the current level.
gamma: Minimum loss reduction required to make a further partition on a leaf node of the tree. The larger gamma is, the more conservative the algorithm will be. Range: [0, ∞). Gamma values around 20 are extremely high and should be used only when you are using high depth.
n_estimators: Number of trees.
reg_alpha: L1 regularization term on weights. Increasing this value will make the model more conservative.
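As a rough sketch of how these parameters are passed to XGBoost's scikit-learn interface: the values below are illustrative placeholders, not the ones used in the project, and build_model is a hypothetical helper name.

```python
# Illustrative values only; they are NOT the values used in the project.
params = {
    "max_depth": 6,
    "learning_rate": 0.1,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "colsample_bynode": 0.8,
    "gamma": 0.0,
    "n_estimators": 500,
    "reg_alpha": 0.1,
}


def build_model(params):
    """Create an untrained XGBoost regressor from a parameter dict."""
    # Imported here so the parameter dict can be inspected without xgboost.
    from xgboost import XGBRegressor

    return XGBRegressor(**params)
```

With xgboost installed, model = build_model(params) followed by model.fit(X_train, y_train, eval_set=[(X_val, y_val)]) would train the regressor and report the validation metric as trees are added.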

We create a function that extracts the most relevant features according to the model, so that we can plot them later.
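One possible shape for such a function is sketched below. It assumes the scores come from the fitted model's feature_importances_ attribute; the function name, feature names, and scores are hypothetical.

```python
import numpy as np


def top_features(feature_names, importances, k=10):
    """Return the k (name, importance) pairs with the highest importance."""
    importances = np.asarray(importances, dtype=float)
    # argsort is ascending, so reverse it to get the largest values first.
    order = np.argsort(importances)[::-1][:k]
    return [(feature_names[i], float(importances[i])) for i in order]


# Hypothetical names and scores, only to show the shape of the result.
names = ["year", "mileage", "engineSize", "transmission_Manual"]
scores = [0.20, 0.10, 0.30, 0.40]
print(top_features(names, scores, k=3))
```

The returned pairs can be fed directly to a horizontal bar chart, with the names on one axis and the scores on the other.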

Plot Feature Importance

If the vehicle has a manual transmission, its price is generally lower than that of a vehicle with an automatic or semi-automatic transmission.
Engine size is a variable that carries a lot of weight, since a larger engine generally means greater technical capability.
The year of manufacture influences the price: for vehicles of the same model, a more recent year makes the car more expensive.
Other variables that complement the predictions well are the vehicle brand and the mileage. For example, it is well known that Mercedes-Benz is a high-end manufacturer, which means that the price per car is much higher. Likewise, a vehicle with higher mileage generally has more wear, which lowers the value of the car.

We create a JSON file with the names of the input variables in order to facilitate the preprocessing of new data, since categorical variables, such as the transmission and the model, are in One Hot Encoding format.
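A minimal sketch of saving the column names and reading them back, using only the standard library. The file name columns.json, the data_columns key, and the column names themselves are assumptions made for illustration.

```python
import json
import os
import tempfile

# Hypothetical column names as they appear after One Hot Encoding.
columns = ["year", "mileage", "engineSize",
           "transmission_Automatic", "transmission_Manual"]

# Write to a temporary folder here; the file name is an assumption.
path = os.path.join(tempfile.mkdtemp(), "columns.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump({"data_columns": columns}, f)

# Later, in the application, load the same list back.
with open(path, encoding="utf-8") as f:
    data_columns = json.load(f)["data_columns"]

print(data_columns)
```

Because a JSON array preserves order, the application sees the columns in the same order the model was trained on.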

Application Creation

Load Libraries

Load JSON File

Creation of the Widgets

This function is the heart of the application: it is where the user enters the data used to generate the prediction for the vehicle of interest.

st.selectbox: Creates a select box that lists all the possible categories. It is very useful because it ensures that the user cannot enter incorrect data, for example for the vehicle manufacturer and model.
st.radio: Displays a radio button widget. Personally, I like to use it when there are very few categories, for example the type of transmission of the vehicle.
st.slider: Displays a slider widget. It is especially useful for integer data, as in the case of the vehicle's year of manufacture, and it is also very attractive to the eye.
st.number_input: Displays a numeric input widget. I like to use it for float values, that is, values with decimals, for example the number of miles traveled by the car.
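The four widgets above could be combined roughly as follows. This is a sketch, not the application's actual function: the name car_inputs, the labels, the category lists, and every range and default value are assumptions. The streamlit module is passed in as st so the function can be exercised without a running app.

```python
def car_inputs(st, manufacturers, models):
    """Draw the input widgets and return the user's choices as a dict.

    `st` is the streamlit module (import streamlit as st); passing it in
    keeps this function easy to exercise outside a running app.
    """
    return {
        # Select boxes restrict the user to known categories.
        "manufacturer": st.selectbox("Manufacturer", manufacturers),
        "model": st.selectbox("Model", models),
        # Radio buttons suit a variable with very few categories.
        "transmission": st.radio("Transmission",
                                 ["Manual", "Automatic", "Semi-Auto"]),
        # Integer slider for the year (range and default are assumptions).
        "year": st.slider("Year", min_value=2000, max_value=2020,
                          value=2015, step=1),
        # Float inputs; all numeric arguments share the same type.
        "mileage": st.number_input("Mileage", min_value=0.0,
                                   value=10000.0, step=500.0),
        "engine_size": st.number_input("Engine size", min_value=0.0,
                                       max_value=7.0, value=1.6, step=0.1),
    }


# In the app script (started with `streamlit run`):
# import streamlit as st
# inputs = car_inputs(st, ["Audi", "BMW"], ["A3", "X1"])
```

Returning a plain dict keeps the widget code separate from the preprocessing that follows.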

Preprocessing of New Data

Thanks to the imported JSON file with the column names, it is easy to reproduce the One Hot Encoding with NumPy.

NumPy's where function returns the exact index of the column that matches the selected category, so that position can be set to 1. Finally, we convert the input to an array with asarray, since that is the array type the model was trained on and it is also computationally cheaper.
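The idea can be sketched like this, assuming a hypothetical column order, three numeric features placed first, and a transmission_<category> naming scheme for the encoded columns:

```python
import numpy as np

# Hypothetical column order, as it would be loaded from the JSON file.
data_columns = ["year", "mileage", "engineSize",
                "transmission_Automatic", "transmission_Manual"]


def encode_input(year, mileage, engine_size, transmission, columns):
    """Build one model-ready row with the training column order."""
    x = np.zeros(len(columns))
    x[0], x[1], x[2] = year, mileage, engine_size  # numeric features first

    # np.where returns the positions where the condition holds; an empty
    # result means the category was not seen during training.
    matches = np.where(np.asarray(columns) == f"transmission_{transmission}")[0]
    if matches.size > 0:
        x[matches[0]] = 1.0

    # Shape (1, n_features): a single row, as expected by predict().
    return np.asarray([x])


row = encode_input(2017, 25000.0, 1.6, "Manual", data_columns)
print(row)
```

A category that has no matching column simply leaves all of its encoded positions at 0.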

Generate New Predictions

We load the previously trained model from a JSON file, a format that XGBoost can read back directly. Additionally, we add a button that generates the estimated price of the car in GBP.
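A sketch of the prediction step follows. The helper predict_price and the stand-in model are hypothetical; they exist only so the example runs without a trained model. The comments at the end show how the real pieces would plausibly fit together, with the model file name left as a placeholder.

```python
import numpy as np


def predict_price(model, features):
    """Return one price estimate for a single encoded car."""
    # Reshape to (1, n_features): one row, as predict() expects.
    x = np.asarray(features, dtype=float).reshape(1, -1)
    return float(model.predict(x)[0])


class MeanPriceModel:
    """Stand-in for the trained model, used only to make the sketch runnable."""

    def predict(self, x):
        return np.full(len(x), 12500.0)


price = predict_price(MeanPriceModel(), [2017, 25000.0, 1.6, 0, 1])
print(f"Estimated price: GBP {price:,.2f}")

# In the app, the stand-in would be replaced by the saved model, e.g.:
#   model = xgboost.XGBRegressor()
#   model.load_model("<model file>.json")
#   if st.button("Predict"):
#       st.success(f"Estimated price: GBP {predict_price(model, row):,.2f}")
```

Wrapping the call in the button check means the prediction is only computed when the user asks for it.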

To verify that the application works correctly, you can run it on the local server with the following command:

streamlit run 'app_streamlit.py'

When running the application locally, you may have to use the full path of each file. Just remember that the final version of the files must be in the repository, so before deploying you will have to remove the paths and leave only the file names.

Requirements

Finally, we create a .txt file (requirements.txt) that lists the versions of the libraries used in the development of the project:

xgboost==1.6.1
scikit-learn==1.0.2
numpy==1.22.4

Streamlit Cloud

Step 1: Before going to the Streamlit page, upload the project files to a GitHub repository so that Streamlit Cloud can connect to it.