Building a Semantic Book Search with OpenAI’s CLIP Model

Eva Revear
4 min read · Jan 1, 2024


Photo by Ed Robertson on Unsplash

When I’m looking for a new book to read I’ll often leverage Reddit to find very specific suggestions. I’ll search something along the lines of ‘a cozy mystery with a woman amateur sleuth’ and, if I’m lucky, someone on r/SuggestMeABook will have made a post looking for similar recommendations.

When I read about the OpenAI CLIP model, it got me thinking about whether it could be used to support that type of semantic search for books.

I won’t dive too deep into what CLIP is; there are plenty of folks more knowledgeable than I am who have written about that. But at a high level, OpenAI’s CLIP model is a neural network trained on 400 million image–text pairs to generate meaningful embeddings for both. Through that training, CLIP maps text and images into a shared embedding space that captures the semantic relationships between the two, which lets it associate descriptions with corresponding visual content.

Image of model architecture from OpenAI
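
To make the shared embedding space idea concrete, here’s a minimal sketch using the Hugging Face transformers port of CLIP (the checkpoint, file path, and prompts are placeholders, not the exact setup used in this project):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("some_cover.jpg")  # placeholder path
texts = ["a cozy mystery with an amateur sleuth", "a grimdark space opera"]

# Encode the image and both captions in one forward pass
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the image's similarity to each text prompt
print(outputs.logits_per_image.softmax(dim=1))
```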

So why CLIP for this? Book cover design is all about letting readers know what type of experience they’ll find inside. Authors and artists want to catch the eye of a reader who will be interested in the theme, genre, and tone of the content. That being the case, book covers hold a ton of semantic information, and I wanted to know whether pulling that out would let me build the kind of book search I’m looking for.

Note: for those wondering, can’t you just use ChatGPT (or plain old Google) to find that type of book recommendation? Yes, but where’s the fun in that?

Step one was, of course, gathering data. Open Library makes information about millions of books available in monthly data dumps, and the covers those records reference are available as JPEGs.
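
The dump itself doesn’t contain the images; each edition record carries cover IDs, and the Covers API serves JPEGs for those IDs. A rough sketch of fetching one (the ID below is arbitrary):

```python
import requests

def fetch_cover(cover_id: int, size: str = "M") -> bytes:
    """Download one cover JPEG from the Open Library Covers API (sizes S, M, L)."""
    url = f"https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.content

# Cover IDs come from the "covers" field of each edition record in the dump
with open("cover.jpg", "wb") as f:
    f.write(fetch_cover(8739161))
```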

I ended up using pandas to batch through the book dump file, since it was pretty large. The full code is on GitHub. Note that this is a POC built mostly in Colab, using Google Drive as storage, so it’s a little rough and ready.
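
Roughly, the chunked read looks something like this, assuming the editions dump in its usual layout of five tab-separated columns (type, key, revision, last_modified, and the full record as JSON); the file name and chunk size are placeholders rather than the exact values used:

```python
import csv
import json
import pandas as pd

# Read the dump in chunks so memory use stays within Colab's limits
reader = pd.read_csv(
    "ol_dump_editions_latest.txt.gz",
    sep="\t",
    names=["type", "key", "revision", "last_modified", "json"],
    quoting=csv.QUOTE_NONE,  # the JSON column is full of quote characters
    chunksize=100_000,
)

records = []
for chunk in reader:
    for raw in chunk["json"]:
        rec = json.loads(raw)
        if rec.get("covers"):  # keep only editions that have a cover image
            records.append(rec)
```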

I filtered for English fiction and removed juvenile books, just to get the dataset down to an easily workable size. I also ended up focusing on books published within the past three years, which left me with about 4,000 books that had cover images.
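
Something along these lines works against the Open Library edition schema (the subject matching and date handling here are one rough way to do it, not the exact code from this project; `records` comes from the sketch above):

```python
def keep_book(rec: dict) -> bool:
    """Rough filter: recent English-language fiction, excluding juvenile titles."""
    langs = [lang.get("key") for lang in rec.get("languages", [])]
    subjects = " ".join(rec.get("subjects", [])).lower()
    year = str(rec.get("publish_date", ""))[-4:]  # publish_date formats vary
    return (
        "/languages/eng" in langs
        and "fiction" in subjects
        and "juvenile" not in subjects
        and year.isdigit()
        and int(year) >= 2021
    )

books = [rec for rec in records if keep_book(rec)]
```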

This is the set I fed to the CLIP image embedding model. I sent the images in batches of five as that sped things up a bit.
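
I won’t reproduce the exact embedding code here, but with the Hugging Face CLIP implementation the batched pass looks roughly like this (the checkpoint and helper name are placeholders):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_covers(paths: list[str], batch_size: int = 5) -> torch.Tensor:
    """Run cover images through CLIP's image encoder in small batches."""
    chunks = []
    for i in range(0, len(paths), batch_size):
        images = [Image.open(p).convert("RGB") for p in paths[i:i + batch_size]]
        inputs = processor(images=images, return_tensors="pt")
        with torch.no_grad():
            chunks.append(model.get_image_features(**inputs))
    return torch.cat(chunks)
```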

Finally, I simply stored the results in a single dataframe as my “vector database.” The app itself takes in a search string, embeds it with the same model, and uses cosine_similarity to find the top book covers that match the search.
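
Sketched out, the search side looks something like this; `titles`, `paths`, `model`, `processor`, and `embed_covers` carry over from the sketches above and aren’t the exact names used in the real code:

```python
import numpy as np
import pandas as pd
import torch
from sklearn.metrics.pairwise import cosine_similarity

# "Vector database": one row per book, CLIP embedding stored alongside the title
df = pd.DataFrame({"title": titles, "embedding": list(embed_covers(paths).numpy())})

def search(query: str, top_k: int = 5) -> pd.DataFrame:
    """Embed the query text and rank covers by cosine similarity."""
    inputs = processor(text=[query], return_tensors="pt", padding=True)
    with torch.no_grad():
        text_emb = model.get_text_features(**inputs).numpy()
    sims = cosine_similarity(text_emb, np.stack(df["embedding"].tolist()))[0]
    return df.assign(score=sims).nlargest(top_k, "score")

search("a cozy mystery with a woman amateur sleuth")
```

With only a few thousand vectors, a brute-force cosine similarity pass over a dataframe is plenty fast; a real vector database only starts to matter at much larger scale.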

In a lot of cases the results actually ended up being fairly reasonable. The query ‘a cozy mystery with a woman amateur sleuth’ returns the books below, which, in my opinion, fit the prompt exactly.

More complicated, or less obvious, queries like the example below return results that feel a little tangential, or completely off, though I can see how the model got there.

It certainly isn’t a replacement for a lengthy Reddit thread of excellent recommendations from excited readers.

But, that being said, the app is available to play around with on Streamlit. If you give it a try, let me know your prompts and what the search comes up with!

Next Steps

This POC is built with only a few thousand book cover images, because working with the full data set on limited storage became a bit prohibitive, but I’d love to get it up and running with the millions of covers available. So tune in for part 2, where I try to move the pipeline to the cloud!

