How to Create a Simple Machine Learning Demo Using Streamlit Framework
Nowadays it is not enough to develop a good product; how you present it matters just as much. Many breakthrough ideas have failed to attract funding because their capabilities were never properly demonstrated. Good design plays an important role in the success of any product, and a polished presentation can contribute far more to your ultimate success than you might expect.
Today the key concept is simplification, and that applies to design as well: extra buttons give way to intuitive, user-friendly interfaces. As a result, application development becomes not only time-consuming but also expensive, and it often demands a large team of developers. Adding features or fixing issues takes time too, which is unacceptable when a deadline is near. Technologies keep evolving, and the expected pace of development keeps growing with them. A single developer can rarely build both the intelligent modules and a website to show them off, because the technologies involved differ too much. Fortunately, many companies face this problem, so a new framework has been developed to solve it.
The basic approaches and principles underlying recommendation systems were examined in the previous article. There we developed a simple, working algorithm that implements our mathematical model and can produce recommendations for a given number of people.
Now we will take a detailed look at developing an application that recommends music by artist name. To do this we will use the Streamlit framework, which lets you create good-looking web apps using only Python. This is convenient, since you don't need to be familiar with any other technology or language.
The Streamlit framework was developed by specialists from large companies specifically for developers of intelligent systems, although it is not limited to that area. It is intended to reduce development time by removing steps such as application deployment and backend and frontend development. You simply write your application code in Python and immediately see the result in a browser. This approach lets you fix most errors on the spot. In addition, development feels very similar to working in a Jupyter Notebook, which many developers will like. The final application code is readable and clean, which in turn lets other developers quickly understand how the algorithm works.
The toolkit works like a regular Python script. This means that any interaction with a widget reruns the script, while the framework's architecture recomputes only the parts that are needed to update the interface, which keeps the app responsive. There are also utilities for handling expensive calculations by caching their results.
Alongside the advantages there are some minor disadvantages. Only two areas are available for displaying the interface: the sidebar menu and the main area. In addition, elements can only be displayed sequentially, one after another, which may seem strange to some developers.
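The effect of caching under this rerun model can be illustrated without Streamlit itself. The sketch below is only an analogy: it uses `functools.lru_cache` from the standard library as a stand-in for `st.cache` (the two are different mechanisms), and the names `expensive_computation` and `run_script` are hypothetical.

```python
import functools

# Counts how many times the expensive function body actually executes.
call_count = 0


@functools.lru_cache(maxsize=None)
def expensive_computation(n):
    """Stand-in for a slow calculation whose result is worth caching."""
    global call_count
    call_count += 1
    return sum(i * i for i in range(n))


def run_script(widget_value):
    """Simulate one top-to-bottom rerun triggered by a widget change."""
    return expensive_computation(widget_value)


# Three "reruns": the second repeats the first input and hits the cache.
results = [run_script(1000), run_script(1000), run_script(2000)]
```

Only two of the three reruns execute the function body; the repeated input is served from the cache, which is the behavior the caching utilities are meant to provide.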
Let's get back to application development. First you need to collect a large database of users and their music listening history, and to establish relationships between artists. This step will allow us to look for similar artists later. As a basis, you can use the LastFM dataset, which contains several hundred thousand records. This is more than enough to create a solid system.
It is also possible to collect information about artists from Wikipedia and, after grouping it, save it in a relational database. We will use SQLite as the main database, since it is lightweight and fast enough for this task, and Python has a built-in module for working with it.
Once data collection is complete, we calculate the recommendation matrix using the matrix factorization method discussed in the previous article. Keep in mind that the algorithm from that article has very high asymptotic complexity and is not recommended for big data; for such cases, use one of its modifications, such as SVD++. As soon as all the preparations for the main app are completed, you can proceed to work on the site.
First you need to implement functions that find an artist in the database by name and by index. To do this, we implement the get_info function, which performs a simple query against our database. To avoid writing several differently named functions, we use Python's single-dispatch mechanism from the standard functools module. Note that you should normalize the string before searching by artist name, so that names containing non-ASCII characters can be matched too. The search function code is shown below:
import functools
import sqlite3
import unicodedata
from typing import Tuple

import streamlit as st

# Create a cursor to the main music database.
CURSOR = sqlite3.connect('soundera.sqlite3').cursor()


@st.cache
@functools.singledispatch
def get_info(name: str) -> Tuple[str, str, int]:
    """Find artist information by the provided name.

    This function looks up an artist by name in the main music database.
    All results of this function are cached by Streamlit to achieve
    higher performance.
    """
    name = unicodedata.normalize('NFKD', name.lower().strip())

    # Search for the artist by name among those available in the database.
    CURSOR.execute("SELECT * FROM artists WHERE name = ? LIMIT 1", [name])
    return CURSOR.fetchone()


@st.cache
@get_info.register(int)
def _(name_id: int) -> Tuple[str, str, int]:
    """Find artist information by the provided name_id.

    This function looks up an artist by index in the main music database.
    All results of this function are cached by Streamlit to achieve
    higher performance. It is an extension of the get_info function.
    """
    CURSOR.execute("SELECT * FROM artists WHERE name_id = ? LIMIT 1", [name_id])
    return CURSOR.fetchone()
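To see the dispatch and normalization ideas in isolation, here is a minimal sketch that runs without Streamlit. It assumes a hypothetical in-memory table with the same three columns (name, info, name_id) and uses the illustrative names `find_artist` and `normalize_name`; the real soundera.sqlite3 schema may differ.

```python
import functools
import sqlite3
import unicodedata

# Hypothetical in-memory stand-in for the article's soundera.sqlite3 database.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE artists (name TEXT, info TEXT, name_id INTEGER)')
conn.executemany(
    'INSERT INTO artists VALUES (?, ?, ?)',
    [('placebo', 'Alternative rock band.', 1),
     (unicodedata.normalize('NFKD', 'beyonc\u00e9'), 'Singer.', 2)],
)


def normalize_name(name):
    """Lower-case, trim, and NFKD-normalize a name before lookup."""
    return unicodedata.normalize('NFKD', name.lower().strip())


@functools.singledispatch
def find_artist(key):
    """Fallback for key types that have no registered implementation."""
    raise TypeError(f'unsupported key type: {type(key).__name__}')


@find_artist.register(str)
def _(key):
    # String keys are treated as artist names.
    query = 'SELECT * FROM artists WHERE name = ? LIMIT 1'
    return conn.execute(query, [normalize_name(key)]).fetchone()


@find_artist.register(int)
def _(key):
    # Integer keys are treated as artist indexes.
    query = 'SELECT * FROM artists WHERE name_id = ? LIMIT 1'
    return conn.execute(query, [key]).fetchone()


by_name = find_artist('  PLACEBO ')
by_id = find_artist(1)
composed = find_artist('Beyonce\u0301')    # "e" followed by a combining accent
precomposed = find_artist('Beyonc\u00e9')  # single precomposed "é" character
```

Both spellings of the accented name resolve to the same row because NFKD normalization decomposes the precomposed character into the same code point sequence before the comparison.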
Once the artist search function is implemented, we need to load the previously calculated parameters of the mathematical model. To store them and load them quickly later, it is recommended to use the HDF5 (.h5) storage format, which is optimized for numerical data. To do this, we write a small function using the h5py library and cache its result. The implementation of this function is shown below:
import pathlib

import h5py
import numpy as np


@st.cache
def load_params(filename: str) -> Tuple[np.ndarray, np.ndarray]:
    """Restore the system parameters from the user-provided file.

    This function restores the model parameters from the given file path.
    The parameters are stored in a format designed for numerical values.
    """
    with h5py.File(pathlib.Path(filename).resolve(), 'r') as source_f:
        x = source_f['x'][:]
        y = source_f['y'][:]
    return x, y
Now we need to implement a function that finds similar artists. To do that, we multiply the matrix of latent factors U by the transposed feature vector of the artist we are interested in, U_i^T, and divide the result by the norm of the vector. We use the following formula:
This formula lets you calculate how similar objects are to one another, which is exactly what the task requires. The code for this function is below:
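In the notation of the code that follows, where each row $U_j$ of the item-factor matrix describes one artist and $U_i$ is the row of the artist of interest, the score can be written as shown here. This is a reconstruction from the description and the code, so treat the exact normalization as an assumption:

```latex
\mathrm{score}(i, j) = \frac{U_j \, U_i^{T}}{\lVert U_j \rVert}
```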
from typing import List


@st.cache
def recommend(item_id: int) -> List[Tuple[int, float]]:
    """Search for the items most correlated with the item of interest.

    The results are sorted in descending order of their score in the
    vector of item correlations.
    """
    params = load_params('soundera.h5')

    # Compute the norm of each item factor vector.
    norms = np.sqrt((params[1] * params[1]).sum(axis=1))

    # Compute the score between the chosen item and every item in the dataset.
    scores = params[1].dot(params[1][item_id]).flatten() / norms

    # Pick the indexes of the 26 highest-scoring items (not yet ordered).
    indexes = np.argpartition(scores, -26)[-26:]
    return sorted(zip(indexes, scores[indexes]), key=lambda x: -x[1])
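The same scoring can be traced by hand on a toy example. The sketch below is a pure-Python illustration with a hypothetical four-artist factor matrix; it mirrors the arithmetic of the function above (dot product divided by the norm of each row) but is not the article's NumPy implementation.

```python
import math

# Hypothetical item-factor matrix: one row of latent factors per artist.
item_factors = [
    [1.0, 0.0],  # artist 0
    [2.0, 0.0],  # artist 1, same direction as artist 0
    [0.0, 1.0],  # artist 2, orthogonal to artist 0
    [1.0, 1.0],  # artist 3, halfway between the two directions
]


def similar_items(item_id, top_n=3):
    """Return the top_n (index, score) pairs, best first."""
    target = item_factors[item_id]
    scores = []
    for j, row in enumerate(item_factors):
        dot = sum(a * b for a, b in zip(row, target))
        norm = math.sqrt(sum(a * a for a in row))
        scores.append((j, dot / norm))
    # sorted() is stable, so ties keep their original index order.
    return sorted(scores, key=lambda pair: -pair[1])[:top_n]


ranking = similar_items(0)
```

Note that the artist being queried is included in its own ranking, here tied for the top score with artist 1, which points in the same direction. Dividing by the row norm is what stops artist 1 from outranking artist 0 merely because its vector is longer.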
With all the necessary functions in place, you can develop the user interface using Streamlit. In the left-hand menu we need an input field for entering the name of the artist. This is quite simple: create the text_input element available in the Streamlit library and assign its result to a variable. The code is shown below:
# Create a widget for entering the name of a song or artist.
music_title = st.sidebar.text_input('Enter a music title or artist name:', 'Placebo')
Now you have to request the information from the database and pass it to the function that calculates recommendations. The results can be displayed in the main area together with the information about the artist collected earlier. The implementation code is provided below:
import pandas as pd

try:
    name, info, index = get_info(music_title)

    # Show the name of the artist.
    st.title(name.title())

    # Compute the most correlated items for the provided music name.
    items = (get_info(int(item[0])) for item in recommend(index))

    # Show a table with the results.
    st.table(pd.DataFrame(items))

    if not info:
        st.markdown(unsafe_allow_html=True, body=(
            'We could not find a description for the artist or group '
            'you requested :( Perhaps the description of this artist '
            'or group is not in the database.'
        ))
    else:
        # Show information about the artist if it is available.
        st.markdown(body=f'{info}', unsafe_allow_html=True)
except TypeError:
    # get_info returned None, so the unpacking above failed.
    st.markdown(unsafe_allow_html=True, body=(
        'We could not find a suitable artist or group for the name you '
        'requested :( Perhaps this artist or group is not in the database '
        'or the name is entered incorrectly. Check the name of the artist '
        'or group and try again.'
    ))
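The reason a TypeError is caught here is that a lookup with no match returns None (as `fetchone()` does), and unpacking None into three names raises TypeError. A minimal sketch of that control flow, using a hypothetical dictionary in place of the database and the illustrative names `lookup` and `describe`:

```python
# Hypothetical stand-in for the artists table: name -> (name, info, name_id).
ROWS = {'placebo': ('placebo', 'Alternative rock band.', 1)}


def lookup(name):
    """Return the matching row or None, mirroring cursor.fetchone()."""
    return ROWS.get(name)


def describe(name):
    """Format an artist, or report that nothing was found."""
    try:
        # Unpacking None raises TypeError, which signals "no such artist".
        artist, info, index = lookup(name)
    except TypeError:
        return 'not found'
    return f'{artist} (#{index})'


found = describe('placebo')
missing = describe('unknown artist')
```

An explicit `if row is None` check would express the same intent more directly; the try/except form is shown because it matches the structure used above.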
Done! Now you can run the application with the streamlit run main.py command and get recommendations for your favorite artists.
After a successful implementation you can deploy your application in the cloud and share it with friends or use it to get recommendations yourself. To do this, first create a Python virtual environment and install all the necessary project dependencies. You can run the following commands:
$ sudo apt-get -y install graphviz python3-venv tmux
$ python3 -m venv venv
$ ./venv/bin/pip install graphviz==0.13.0 h5py==2.10.0 numpy==1.17.3 streamlit==0.50.2
Then copy all the project files to the cloud server, for example with the scp utility, which works over SSH. To do this, run the following command:
$ scp -r ./music_algo root@xxxx.xxxx.xxxx.xxxx:~/
Next, you need to configure Streamlit to work correctly on the network. Specify the following settings in the ~/.streamlit/config.toml file:
After making the changes, save the file and run the script using the tmux utility that was installed earlier. To do this, run the following command:
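As a hedged sketch of what such a file might contain, assume you want the app to run on a server without opening a browser and to listen on a fixed port. The choice of options and the port value are assumptions, so check them against the Streamlit version you installed:

```toml
[server]
# Do not try to open a browser window on the server.
headless = true
# Port the app listens on; 8501 is Streamlit's default.
port = 8501
```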
$ tmux new-session -d -s "music" ./venv/bin/streamlit run main.py
This command starts the app in a detached tmux session, so it keeps running in the background. To control the application, run tmux attach-session -t music and perform the necessary actions. This approach keeps the application running after your SSH session is disconnected.
Originally published at https://celadonsoft.com.