Hey IKEA!! What is this called again?

Souvik Ghosh · Published in strai · 6 min read · Jan 27, 2022
I want to remember what this is called!!

Yup that’s me in 2018 (the good times!!)

I was very happy!!!

Exhausted but quite satisfied adulting my way into a new apartment and assembling IKEA furniture for the first time.

CUT TO 2022!!!

We are planning to move, and the sane thing to do is to replace the covers of my couch with nicer ones instead of, you know, throwing away a perfectly working couch for a new one.

But, true to form, I threw away all the assembly papers, and now we are in a situation where I don't remember what this thing is called. After a long, long search on the IKEA website, I finally managed to match a picture of my sofa to one on the site, and voilà, I found what I was looking for. But it was quite a ride, and I cannot be the only one who struggles this hard to find out what they bought from IKEA some YEARS back and can't remember what it is.

If you followed my articles a few years ago, you know I dedicated a lot of my time to Natural Language Processing. But another domain I always wanted to learn more about is Image Recognition, and what better way to learn it than by building an awesome tool: a reverse image search for myself. 😉

AN IKEA REVERSE IMAGE SEARCH

This ain't commercial. I built this app to learn more about multimodal search engines and how conversational applications of the future go beyond just text. We need to start thinking about visual and audio alongside text processing when it comes to building chatbots.

Let’s start simple and build a search engine first!!

A Simple Neural Search Engine

In comes my new obsession, Jina: a simple, very Pythonic platform that lets you build a search engine on multimodal data (text, audio, images and video… and, if I'm right, also 3D mesh). That is quite amazing.

So how do we start? Well, you can always find all of the code on GitHub. Unfortunately, to stay true to IKEA's conditions, I won't be publishing the dataset or the index, because it is strictly for personal use. That being said, I will share the code to get the data yourself; use it at your own risk.

Data!!

Well, not much of a choice for me here. After carefully reading through IKEA's terms and conditions, it is permitted to download product images for private use, and this definitely qualifies as non-commercial.

Let’s get scraping then!

I used a couple of links: first an HTML page to pull all the product categories (the link is in the code), and second an API call to download the product list data and then, using each image URL, download one image per product.
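The download step itself can be sketched with the standard library alone. The actual category and product-list URLs are in the repo, so everything below (the URL, the folder name, the helper names) is purely illustrative:

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve

def image_path(image_url: str, out_dir: str = "images") -> str:
    # derive a local file name from the last segment of the URL path
    name = os.path.basename(urlparse(image_url).path)
    return os.path.join(out_dir, name)

def download_image(image_url: str, out_dir: str = "images") -> str:
    # fetch one product image, skipping files we already have
    os.makedirs(out_dir, exist_ok=True)
    dest = image_path(image_url, out_dir)
    if not os.path.exists(dest):
        urlretrieve(image_url, dest)  # remember: private, non-commercial use only
    return dest
```
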

I wanted to prepare a proper dataset and use an ORM to do so. Luckily I found https://sqlmodel.tiangolo.com/ from the creator of the wonderful framework FastAPI. So I got scraping, created a nice-looking product catalog, and dumped it into a SQL table. Why? I don't know, maybe to use the names and descriptions to push my search towards a more cross-modal one in the future. Given that we are moving houses, we will be buying quite a few items, and I feel this might just come in handy for me! 😍

Well, we now have a proper dataset containing all the relevant product information, with all the images neatly downloaded into a folder.

Build an Index

Well, we are building a search engine, so we need to build an index. Luckily the great folks at Jina took care of that: they created an awesome package called DocArray (https://docarray.jina.ai/), a very nice way to manage data and embeddings in a Pythonic way.

Since we are dealing with images, first load them up. I loaded only a subsection of the IKEA products, because I was mostly interested in searching sofas and chairs.

Loading the images is quite simple with Jina:

import os
from jina import Document, DocumentArray

# collect every image path under images/
list_of_files = []
for root, dirs, files in os.walk("images/"):
    for file in files:
        list_of_files.append(os.path.join(root, file))
list_of_files = sorted(list_of_files)

def createDocumentArray(images):
    # one Document per image, with the file name kept as metadata
    da = DocumentArray(
        Document(uri=name, tags={"file": {"file_name": name}})
        for name in images
    )
    return da

docs = createDocumentArray(list_of_files)

Here I pass a list of image paths to the function createDocumentArray and, using a generator, create a Document object for every image. I also added the file name as a tag, to keep some metadata about the product.

We also need to preprocess these images into arrays, represented as tensors, before feeding them to a large pretrained model, which then outputs an embedding (a large, high-dimensional vector) to represent the image. In code, once again applying the magic of DocArray, we perform the following operation on every document:

d.load_uri_to_image_blob(width=224, height=224).set_image_blob_normalization().set_image_blob_channel_axis(-1, 0)

We load the image into a blob (which in the latest version has been renamed to a tensor) with a width and height of 224x224. Since I have a decent yet surprisingly underpowered GPU, and I wanted to run this little demo without spending an eternity, I resize the images to 224x224.

The other two methods normalize the colors of the image and switch the axis of the RGB channels, and the image ends up looking like this.
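To make those two chained methods less magical, here is roughly what they do, sketched in plain NumPy. The mean/std values are the ImageNet statistics that DocArray uses by default for normalization, if I read its docs right; the random array is just a stand-in for a loaded image:

```python
import numpy as np

# a stand-in for one loaded 224x224 RGB image, shape (H, W, C), values 0-255
img = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)

# normalization: scale to [0, 1], then subtract mean and divide by std per channel
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
normalized = (img / 255.0 - mean) / std

# channel-axis switch: move channels from last (-1) to first (0), HWC -> CHW,
# which is the layout PyTorch models such as ResNet50 expect
chw = np.moveaxis(normalized, -1, 0)
print(chw.shape)  # (3, 224, 224)
```
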

Now we have a processed image that we can feed to a large pretrained model. For simplicity's sake I picked a stock one; as I will discuss later, this model gave very average results, but it let me test the pipeline quite simply. I believe the images still need more preprocessing, and a better fine-tuned model (more on that later) would definitely make this more interesting and the search more accurate.

Okay, let's fast-forward a bit: I used ResNet50, but I am sure there are a million other image2vec models out there that could do a better job. I stick to the tutorials myself, since I'm a newbie!!

Once again, DocumentArray to the rescue: a simple embed call was enough to run the ResNet50 model and generate an embedding for every Document. One single line!!

docs = embed(docs, model)

Amazing. Now all we have to do is save this index as a binary file so we can load it up when we run the Flow.

Let’s get the FLOW started

From DocArray we move to the actual Jina package, which seamlessly runs a server where every input to a search call goes through the same steps as above, but in a scalable, distributed way. So how does the Flow look for me?

Something like this:

NEAT!! A parallel embedding function for every input image I provide. Quite scalable. You will find the rest of the code here, showing how to get this started.

The good folks at Jina also provide a neat front end built on Streamlit to test it. I modified it a little bit to my taste, and voilà!! We have finally built something useful, well, at least for me. I am sorry, IKEA, but I cannot remember those Swedish names for your products, and I just need to look them up!!

Is this perfect? Absolutely NOT!! Does it generalize well? IT DOES NOT…

In the next article, if I am not too lazy to write it, I will try to make an even better app and fetch recommendations. But until then, if you have made it this far, feel free to connect with me on

LinkedIn — https://www.linkedin.com/in/souvik-ghosh-aaa30470/

Github — https://github.com/souvikg10
