
Deploying a Basic Regression Model on Streamlit

Daniel Schlant


Quick tutorial on how I added a Linear Regression Model to Streamlit

For a recent project, I decided to deploy a fairly basic regression model onto Streamlit. In case it helps any data science students with the process in the future, I will lay out the steps as I addressed them. The app is fairly basic, and was intended to be straightforward and simple to interact with.

Building the Model

The model predicts the length of a federal prison sentence using data from the years 2018–2021. The first step I took was to set up the regression model that will make the predictions, working in Visual Studio Code, the editor I use throughout. The necessary libraries and the dataset are read in, and a Lasso regression model is built via a pipeline, then trained and scored.
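The model-building code is not reproduced here, so the following is only a minimal sketch of the step, assuming scikit-learn. The data is a synthetic stand-in, and the column names (`age`, `year_sentenced`, `dependents`) and target (`sentence_years`) are hypothetical, not the real dataset's.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data standing in for the 2018-2021 sentencing records.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.integers(16, 94, n),            # 16..93 inclusive
    "year_sentenced": rng.integers(2018, 2022, n),
    "dependents": rng.integers(0, 2, n),
})
# Made-up target so the example runs end to end.
df["sentence_years"] = 0.05 * df["age"] + 2.0 * df["dependents"] + rng.normal(0, 1, n)

X = df.drop(columns="sentence_years")
y = df["sentence_years"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features, then fit a Lasso regression, all inside one pipeline object.
lasso_pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("lasso", Lasso(alpha=0.1)),
])
lasso_pipe.fit(X_train, y_train)

# For regressors, .score() returns R^2 on the held-out data.
print(f"Test R^2: {lasso_pipe.score(X_test, y_test):.3f}")
```

Keeping the scaler and the model in one pipeline means the single object that gets pickled later applies the same preprocessing at prediction time.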

Writing the Pickle File

Now that the model we will deploy has been created, we can save it as a pickle file. If you are unfamiliar with pickling, see the official Python documentation for the pickle module (which was imported along with the other modules at the top of the script). A file is opened, and the pickled representation of the model object is written into it. The mode ‘wb’ means the file is opened for writing in binary mode.

with open("file_path/file_name.pkl", "wb") as file:
    pickle.dump(model_object, file)

Designing the Application Layout

Now that the model has been written to a pickle file, I create a new file containing the code for the app that will ultimately be deployed to Streamlit. Streamlit is imported as ‘st’ in this file, along with pickle, which is imported again so that we can load the pickle file saved previously.

The below code is the beginning of the formatting for the application, giving the app a wide layout, as well as a title and header.

At this point it is worth describing the format of the application. The app is a series of widgets (see Streamlit’s widget documentation) that let a user specify and adjust characteristics of an individual and the alleged offense, in order to observe how those changes affect the model’s predicted sentence length. The next step is to configure the widgets on the application.

As stated, in my own application each of the elements (‘col1’ or ‘col2’) is a widget that gives the user an additional feature to adjust. The code below creates the layout, which is built out further later in the file. The st.write(‘’) call creates space between the rows of columns in the application.

As the Streamlit documentation shows, any element can be placed in a column this way: headers, images, etc. The ‘st.columns(2)’ call tells Streamlit how many columns appear on that row of the app. When it is given an integer n, every column on the row has equal width, each taking up 1/n of the row. The number of variables the result is unpacked into (‘col1’ & ‘col2’) must equal the integer passed to st.columns (2).

Adding Input Widgets

Now that I had structured how the widgets would be laid out, I inserted the widgets themselves. Into col1 and col2 (which together make up one row of widgets in the app) I inserted two selectboxes (see the selectbox documentation), which create a dropdown menu of selectable items. The first parameter of the selectbox is the label, here simply ‘Year Sentenced’ and ‘State’. The second is the options: a list containing all of the years or states that we want to make available to the user.

All of the input widgets used for this application are selectboxes, except for the age feature, which uses the number_input widget. This widget lets users enter an integer between 16 and 93 (the minimum and maximum ages in our dataset), either with the keyboard or with the +/- buttons in the app interface. Below is the code for the next row of columns, which includes the number_input widget.

After the code that adds the desired widgets to the application, I created a NumPy array collecting these inputs via the variables that captured them: ‘age’, ‘race’, ‘citizen’, etc.
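One detail worth knowing at this step: when numbers and strings go into the same NumPy array, NumPy stores every element as a string. The sketch below uses hypothetical widget values (and omits some of the app's features) to show this, which is why numeric fields need converting back later.

```python
import numpy as np

# Hypothetical widget values standing in for the variables captured above.
year, state, age, citizen, dependents = 2019, "Texas", 35, "U.S. Citizen", "No"

user_input = np.array([year, state, age, citizen, dependents])

# Mixed numbers and strings are coerced to a single Unicode string dtype,
# so numeric fields must be converted back (e.g. int(...)) before modelling.
print(user_input.dtype.kind)   # 'U' means Unicode string
print(int(user_input[0]))      # the year, converted back to an integer
```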

Prediction Function

Next, I built the function that generates our predictions from the model saved in the pickle file and the user inputs collected above in the NumPy array. An empty dictionary is set up at the top of this function to collect the values these inputs represent, since many of them have to be converted before the model can process them:

def prediction_generator(prediction_model, model_inputs):
    user_input_code = {}

Within the function, the user inputs are matched to the corresponding values in the dataset. The dataset is a collection of numeric features (age, number of convictions) and categorical features, which are encoded as binary values (e.g. Male (0)/Female (1)), strings (e.g. state name) or numeric codes.

Below is an example of some of the code used to fill the empty dictionary. The key names match the feature names the model was trained on: ‘year_sentenced’, ‘sentence_type’ and ‘imprisoned’, for example. This is not the only way to structure this step, but in my opinion it was the most intuitive.

The ‘year_sentenced’ key is paired with the integer version of the year chosen in the widget’s selectbox. Note that ‘sentence_type’ and ‘imprisoned’ are both set to 1. This is because, although the model uses these features, I did not want users to be able to modify them, which keeps the app less cluttered. Every model feature that did not get a widget was explicitly assigned a value within the function.

The ‘dependents’ code shows the type of if-statement used for many of the features. If the selectbox was set to a certain value (‘Dependents’ = ‘No’), the model is fed the corresponding value (user_input_code[‘dependents’] = 0), using the ‘dependent’ input variable from the array passed as an argument to the function. In this way, every independent feature is assigned a value in the function’s dictionary.
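The dictionary-filling code is not shown here, so this is a hedged reconstruction of the pattern just described, written as a separate hypothetical helper, `encode_inputs`. It assumes the input order `year, state, age, citizen, dependents` and a made-up binary coding for citizenship; the real feature list and codes come from the dataset.

```python
import numpy as np


def encode_inputs(model_inputs):
    """Map raw widget values onto the feature names the model was trained on.

    Hypothetical helper; assumes the input order: year, state, age, citizen, dependents.
    """
    year, state, age, citizen, dependents = model_inputs
    user_input_code = {}

    # Numeric features arrive as strings from the NumPy array, so convert them back.
    user_input_code["year_sentenced"] = int(year)
    user_input_code["age"] = int(age)
    # Passed through as text, assuming the pipeline encodes state names itself.
    user_input_code["state"] = str(state)

    # Features with no widget are fixed explicitly.
    user_input_code["sentence_type"] = 1
    user_input_code["imprisoned"] = 1

    # If-statements translate selectbox labels into the codes used in training.
    if dependents == "No":
        user_input_code["dependents"] = 0
    else:
        user_input_code["dependents"] = 1

    # Hypothetical binary coding for citizenship.
    if citizen == "U.S. Citizen":
        user_input_code["citizen"] = 1
    else:
        user_input_code["citizen"] = 0

    return user_input_code


user_input = np.array([2019, "Texas", 35, "U.S. Citizen", "No"])
print(encode_inputs(user_input))
```

A lookup dictionary per feature (e.g. `{"No": 0, "Yes": 1}`) would be an equally valid alternative to the if-statements once the number of features grows.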

After the dictionary has been filled with its final key-value pair, the last line of the function returns the model’s prediction. The code below turns the dictionary of model inputs into a single-row pandas DataFrame, and the model passed as the function’s first argument predicts the length of the prison sentence from that one observation. My own model rounded the answer for clarity in the app interface, but below is the essential code.

    # index=[0] is needed because every value in the dictionary is a scalar (one row)
    return prediction_model.predict(pd.DataFrame(user_input_code, index=[0]))
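Since the rounding step is mentioned but not shown, here is a sketch of one way to do it, as a hypothetical helper `predict_from_code` that takes an already-filled dictionary. `ConstantModel` is a made-up stand-in for the unpickled pipeline, used only so the example runs on its own.

```python
import pandas as pd


def predict_from_code(prediction_model, user_input_code, decimals=1):
    """Hypothetical helper: turn one encoded observation into a rounded prediction."""
    # pandas needs an explicit index when every dictionary value is a scalar.
    single_row = pd.DataFrame(user_input_code, index=[0])
    # predict() returns one value per row; take the only one and round it.
    return round(float(prediction_model.predict(single_row)[0]), decimals)


class ConstantModel:
    """Stand-in for the unpickled pipeline, for illustration only."""

    def predict(self, X):
        return [4.567] * len(X)


code = {"year_sentenced": 2019, "age": 35, "dependents": 0, "sentence_type": 1, "imprisoned": 1}
print(predict_from_code(ConstantModel(), code))  # 4.6
```

Returning a plain rounded float, rather than the one-element array from `predict()`, makes the value straightforward to drop into the app's output text.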

Accessing Pickle File

To load the pickle file saved earlier, the code below opens it for reading in binary mode (‘rb’). This file contains the Lasso regression pipeline described earlier in this post.

with open("file_path/file_name.pkl", "rb") as file:
    lasso_pipe = pickle.load(file)
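To check that the save and load steps fit together, here is a minimal round-trip sketch, assuming scikit-learn. It uses a small hypothetical pipeline fitted on random data and a temporary file in place of the real model and path.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small hypothetical pipeline fitted on random data, just to have something to pickle.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, 0.0, -2.0]) + rng.normal(scale=0.1, size=50)
pipe = Pipeline([("scaler", StandardScaler()), ("lasso", Lasso(alpha=0.01))]).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "lasso_pipe.pkl"

    # "wb": write in binary mode (the model-building script).
    with open(path, "wb") as file:
        pickle.dump(pipe, file)

    # "rb": read in binary mode (the Streamlit app script).
    with open(path, "rb") as file:
        lasso_pipe = pickle.load(file)

# The reloaded pipeline should make the same predictions as the original.
print(np.allclose(pipe.predict(X), lasso_pipe.predict(X)))
```

Two practical caveats: pickle files should only be loaded from trusted sources, and a pickled scikit-learn model is generally expected to be loaded with the same scikit-learn version that saved it, which matters when the deployment environment's requirements are pinned.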

Adding Prediction to the App

Below is the code that adds the model’s prediction to the Streamlit app. The spinner adds a bit of flourish: a spinning circle appears while the model ‘calculates’ after the user has modified an input and a new sentence length prediction is being prepared. The time.sleep line keeps this “Predicting…” spinner on screen for half a second before the prediction is displayed at the bottom of the application. st.title sets the font size (the fairly large ‘title’ font). The app calls the function built earlier, with the model and the user inputs as arguments, to generate the prediction.

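import time

with st.spinner("Predicting..."):
    time.sleep(0.5)  # keep the spinner visible for half a second
    prediction = prediction_generator(lasso_pipe, user_input)
    # predict() returns a one-element array, so display its single value
    st.title(f"The predicted sentence length is {prediction[0]:.1f} years.")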

Finished Product

Below is a screenshot of the finished product that was created. It is basic, but it is intuitive, easy to use, and quick.

Thank you for reading, and I hope this tutorial has helped.

Best

Daniel Schlant
