Machine Learning mini project: Build a book recommender system web app using Python and Streamlit.

Saurabh singh
4 min read · Mar 3, 2024


Install Streamlit:

Open your command line and enter:

pip install streamlit

Download dataset:

Python

Create a new Python file named ‘book_recommender.py’ in the same folder where you downloaded the dataset. The code below expects the dataset file to be named ‘goodreads_data.csv’.

Add the following code to your Python file:

# Import necessary libraries
import streamlit as st
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer

# Load the dataset
books = pd.read_csv("goodreads_data.csv")

# Preprocess the data (remove duplicates, handle missing values, etc.)

# Fill NaN values in the 'Description' column with an empty string
books['Description'] = books['Description'].fillna('')

# Create a TF-IDF Vectorizer object
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(books['Description'])

# Compute the cosine similarity matrix
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

scikit-learn's TfidfVectorizer converts text into numerical vectors using the Term Frequency-Inverse Document Frequency (TF-IDF) technique. Here's a simple explanation:

  • Bag-of-Words (BoW) Model: Text is first converted into a matrix where each row represents a document and each column represents a unique word in the corpus. The values in this matrix are typically the raw count of each word in each document.
  • TF-IDF Transformation:
      • Term Frequency (TF): How often a word occurs in a document. For example, a word that appears twice in a 10-word document has a TF of 2/10 = 0.2.
      • Inverse Document Frequency (IDF): How rare a word is across the entire corpus. Common words like ‘the’ get low IDF values, while rare words get high IDF values.
      • TF-IDF: The product of TF and IDF. It gives more weight to words that are frequent in a document but rare across documents.
  • How TF-IDF Improves Over BoW:
      • BoW weights words purely by how often they occur, while TF-IDF emphasizes words that are important to a specific document but not common across all documents.
      • TF-IDF captures what makes each document distinctive, which usually makes it a better text representation for machine learning models.

In summary, TF-IDF assigns weights to words based on their importance in individual documents and across the entire corpus, providing a more nuanced representation of text data compared to simple word counts.
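To make the arithmetic concrete, here is a minimal sketch of the textbook TF-IDF calculation in plain Python. The three-sentence corpus and the helper names (`term_frequency`, `inverse_document_frequency`) are made up for illustration. The numbers will not match TfidfVectorizer exactly, because scikit-learn by default uses raw counts for TF, a smoothed IDF, and L2-normalizes each row, but the ranking idea is the same.

```python
import math
from collections import Counter

# Hypothetical three-document corpus, already lowercased.
docs = [
    "the wizard school and the young wizard",
    "the detective and the missing painting",
    "the young detective",
]
tokenised = [d.split() for d in docs]


def term_frequency(tokens):
    """Share of the document's tokens taken up by each word."""
    counts = Counter(tokens)
    return {word: n / len(tokens) for word, n in counts.items()}


def inverse_document_frequency(corpus):
    """log(N / df): zero for a word found in every document, larger for rarer words."""
    n_docs = len(corpus)
    doc_freq = Counter(word for tokens in corpus for word in set(tokens))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


idf = inverse_document_frequency(tokenised)

# One {word: weight} dictionary per document.
tfidf = [
    {word: tf * idf[word] for word, tf in term_frequency(tokens).items()}
    for tokens in tokenised
]

print(idf["the"])                          # 0.0 ('the' is in every document)
print(max(tfidf[0], key=tfidf[0].get))     # wizard
```

In the first document, ‘the’ and ‘wizard’ both appear twice, so a plain word count would treat them the same. TF-IDF pushes ‘the’ down to zero and leaves ‘wizard’ as the highest-weighted word.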

# Function to get book recommendations based on book title
def get_recommendations(book_title, cosine_sim=cosine_sim, data=books):
    # Check if the book title exists in the dataset
    if book_title not in data['Book'].values:
        return "Book title not found in the dataset"

    # Get the index of the book that matches the title
    # (the first index if there are multiple matches)
    idx = data[data['Book'] == book_title].index[0]

    # Get the pairwise similarity scores with that book
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort the books based on the similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Get the top 10 most similar books (position 0 is normally the book itself)
    sim_scores = sim_scores[1:11]

    # Get the book indices
    book_indices = [i[0] for i in sim_scores]

    # Return the top 10 recommended books
    return data['Book'].iloc[book_indices]

The above code snippet takes a book title as input, ranks every other book by the cosine similarity of its description to that title's description, and returns the titles of the 10 most similar books as a pandas Series. If the title is not in the dataset, it returns an error message instead. This is the core of a book recommender system that suggests similar books based on a user’s selected book.
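The ranking step can be tried on a toy example without the dataset. The sketch below is only an illustration: the titles, the three-dimensional vectors, and the helper names `cosine` and `top_similar` are invented, and the vectors stand in for rows of the TF-IDF matrix.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical titles with made-up weight vectors.
titles = ["Book A", "Book B", "Book C", "Book D"]
vectors = [
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 0.9],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
]


def top_similar(query_index, n=2):
    """Rank every other title by similarity to the query, highest first."""
    scores = [
        (i, cosine(vectors[query_index], vec))
        for i, vec in enumerate(vectors)
        if i != query_index  # drop the query by index rather than by position
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [titles[i] for i, _ in scores[:n]]


print(top_similar(0))  # ['Book B', 'Book D']
```

One design difference from the function above: this sketch excludes the query book by its index, while the original skips whatever lands in first place after sorting. The two agree in the usual case, but they can differ if another book ties with the query, for example when two rows have identical descriptions.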

# Streamlit App to host the Book Recommender System
def main():
    st.title("Book Recommender System")

    # Sidebar to input book title
    book_title = st.sidebar.text_input("Enter a Book Title")

    if st.sidebar.button("Recommend"):
        if book_title:
            recommended_books = get_recommendations(book_title)
            if isinstance(recommended_books, str):
                # A string means the title was not found, so show the message as-is
                st.write(recommended_books)
            else:
                st.subheader("Recommended Books:")
                for i, book in enumerate(recommended_books):
                    st.write(f"{i+1}. {book}")
        else:
            st.write("Please enter a valid book title.")

if __name__ == '__main__':
    main()

Run the web app:

Open your command line again, go to the folder where you placed your Python file, and enter the following command:

streamlit run book_recommender.py

Streamlit should start a local server and open the app in your browser.

Voilà, you just created your own book recommender system web app with Python and Streamlit.

GitHub: https://github.com/SaurabhSinghTheAnalyst/Book-Recommender-System
