Image in, Recipe out

Peiyan Wang
Information Retrieval
6 min read · Dec 17, 2020

Introduction

When people travel or move to a new country, they may encounter vegetables and fruits they have never seen before. To figure out how to cook dinner with an unfamiliar vegetable, they might take a picture of it, upload the picture to Google Images, get the vegetable's name from the retrieved results, and finally search for a recipe. To simplify these steps, we built a search engine that helps people decide what meal to make from their ingredients by simply taking a picture. To realize this, we built a CNN classifier that recognizes the vegetable in the input image and turned the returned label into a search query that retrieves recipes for the user. We use BM25 as the ranking method and NDCG@10 as the evaluation metric for our search engine.

Data

Data sources

The datasets we used come from two platforms. The first is ImageNet, a large public image dataset that provides researchers with URLs and thumbnails for hundreds of object categories, from vegetables to animals. The second is a dataset published on GitHub that was originally designed for cuisine classification. For both datasets, we used only the portions relevant to our project.

Data acquisition

We first manually copied the contents of the recipe text files from GitHub and merged them into a single text file. From this combined file, we selected 21 types of vegetables and wrote 23 related phrases and sentences to serve as queries for the search engine. For the image data, we downloaded compressed files for the objects we wanted. However, not all of the vegetables we selected from the recipe text were available for download, so we manually downloaded the usable images and saved them in corresponding folders.

Data preprocessing

Since the images come in different sizes, we resized them all to the same dimensions and stored them in an array as features. We then normalized the features by dividing by 255, which keeps gradients in a manageable range and makes training more stable. Since a CNN cannot take categorical variables directly, we transformed the vegetable names, our labels, from text to an array via one-hot encoding. Part of the code for reading in the images is shown below.

import os
import cv2
import numpy as np
from tqdm import tqdm

X = []  # image arrays (features)
Z = []  # vegetable names (labels)

def add_label_to_image(vege_type, DIR, IMG_SIZE=64):
    for img_name in tqdm(os.listdir(DIR)):
        try:
            label = vege_type
            path = os.path.join(DIR, img_name)
            img = cv2.imread(path, cv2.IMREAD_COLOR)
            # resize every image to IMG_SIZE x IMG_SIZE
            img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
            X.append(np.array(img))
            Z.append(str(label))
        except Exception as e:
            print(str(e))

# read in the images; `vege` maps vegetable labels to folder names
for value in list(vege.values()):
    print(value)
    add_label_to_image(vege_type=value, DIR=value)
X = np.array(X)
# normalize pixel values to the 0-1 range
X = X.astype('float32') / 255
Z = np.array(Z)
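
The one-hot encoding step is not shown above; a minimal sketch using scikit-learn and Keras utilities (the variable names here are assumptions) could look like this:

from sklearn.preprocessing import LabelEncoder
from keras.utils import to_categorical

# map vegetable names to integers, then to one-hot vectors
label_encoder = LabelEncoder()
y_int = label_encoder.fit_transform(Z)     # e.g. 'carrot' -> 3
y = to_categorical(y_int, num_classes=21)  # shape: (n_samples, 21)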

Recipe data description

The recipe texts contained duplicates, so we dropped them and reset the indexes. The final recipe file, in CSV format, has 1,410 rows and 2 columns.

First 5 rows of recipe dataset after modification
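
Dropping the duplicates and resetting the indexes is a few lines of pandas; a minimal sketch, where the file and column names are assumptions:

import pandas as pd

recipes = pd.read_csv('recipes.csv')                     # hypothetical file name
recipes = recipes.drop_duplicates(subset='recipe_text')  # assumed column name
recipes = recipes.reset_index(drop=True)
print(recipes.shape)                                     # (1410, 2) after cleaning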

The number of images in each vegetable folder is listed below. On average, each category has around 1,000 images, but there are outliers: ginger, garlic, eggplant, oat, and parsley each have fewer than 100 images, which might result in lower accuracy for the CNN classifiers.
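
The per-folder counts can be computed by reusing the `vege` mapping from the loading code; a minimal sketch:

import os

# count the images saved under each vegetable folder
for folder in sorted(vege.values()):
    print(folder, len(os.listdir(folder)))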

Method

Image data augmentation

In order to optimize and evaluate our CNN model, we split the whole dataset into training (72%), validation (18%), and test (10%) sets, and used accuracy as the evaluation metric for the classification model. To prevent overfitting, we applied image data augmentation with ImageDataGenerator, generating transformed images through rotation, horizontal flipping, and similar operations; the augmentation code follows the split sketch below.
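
A minimal sketch of the 72/18/10 split using scikit-learn; the two-step split and the random seed are assumptions:

from sklearn.model_selection import train_test_split

# hold out 10% for test, then 20% of the remainder for validation:
# 0.9 * 0.2 = 0.18, giving a 72/18/10 split overall
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.20, random_state=42)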

from keras.preprocessing.image import ImageDataGenerator

# data augmentation to prevent overfitting
datagen = ImageDataGenerator(
    rotation_range=40,
    shear_range=0.2,
    zoom_range=0.2,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    vertical_flip=False)
datagen.fit(X_train)

Image classification

We tried one dummy classifier, which always predicts the most frequent vegetable class, and four CNN classifiers, and compared their performance. The first CNN model is built from scratch: it contains four convolutional layers, each followed by a max-pooling layer, and two fully connected layers. The other three CNN models use transfer learning from VGG16, a convolutional neural network that achieves 92.7% top-5 test accuracy on ImageNet; it has five convolutional blocks and three fully connected layers. In this project, the VGG16 classifier that fine-tunes the last two convolutional blocks has the best performance. The code is shown below, followed by a sketch of the from-scratch architecture.

from keras.models import Sequential, load_model
from keras.layers import Dense
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint

# vgg16_model is the pre-trained VGG16 base defined earlier (assumed to be
# created with include_top=False and pooling so its output is a flat vector)
try:
    model = load_model('vgg16_conv_tuning_last2conv_modelsave.h5')
except Exception:
    model = Sequential()
    model.add(vgg16_model)
    model.add(Dense(256, activation='relu'))
    model.add(Dense(21, activation='softmax'))
    # unfreeze the last two conv blocks (block4 and block5)
    vgg16_model.trainable = True
    set_trainable = False
    for layer in vgg16_model.layers:
        if layer.name == 'block4_conv1':
            set_trainable = True
        if set_trainable:
            layer.trainable = True
        else:
            layer.trainable = False
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(lr=1e-4),
                  metrics=['accuracy'])
    print('compiled!')
print(model.summary())

# train the model, saving the weights with the lowest validation loss
checkpointer = ModelCheckpoint(filepath='vgg16_vege_conv_tuning_2.hdf5',
                               verbose=1, save_best_only=True)
history_4 = model.fit(datagen.flow(X_train, y_train, batch_size=32),
                      steps_per_epoch=X_train.shape[0] // 32,
                      epochs=25,
                      validation_data=(X_val, y_val),
                      callbacks=[checkpointer],
                      verbose=2, shuffle=True)
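
For comparison, the from-scratch CNN described above (four convolutional layers, each followed by max pooling, plus two fully connected layers) could look roughly like this; the filter counts and kernel sizes are assumptions, since the post only specifies the layer structure:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# sketch of the from-scratch CNN: 4 conv/pooling blocks + 2 dense layers
scratch_model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(256, activation='relu'),
    Dense(21, activation='softmax'),
])
scratch_model.compile(loss='categorical_crossentropy',
                      optimizer='adam', metrics=['accuracy'])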

Information retrieval system via the BM25 ranking model

We chose BM25, as implemented in the metapy package, as our ranking model. The recipe texts are used to build an inverted index that supports users' queries. Part of the code is shown below.

import metapy

# build the inverted index with metapy
inv_idx = metapy.index.make_inverted_index('recipe_ir-config.toml')
ranker = metapy.index.OkapiBM25(k1=1.2, b=0.5, k3=500)

# retrieve the top 10 results for each query
num_results = 10
retrieval_results = []
with open('recipe_ir/recipe_ir-queries.txt') as query_file:
    for query_num, line in enumerate(query_file):
        print(query_num, line)
        query = metapy.index.Document()
        query.content(line.strip())
        results = ranker.score(inv_idx, query, num_results)
        # store 1-indexed (query_id, doc_id) pairs
        res_list = [(query_num + 1, x[0] + 1) for x in results]
        retrieval_results += res_list
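
make_inverted_index reads its settings from the TOML config file named above; a sketch of what recipe_ir-config.toml might contain, where the paths and analyzer settings are assumptions based on common metapy conventions:

prefix = "."
dataset = "recipe_ir"
corpus = "line.toml"    # one recipe per line
index = "recipe-idx"
query-judgements = "recipe_ir-qrels.txt"

[[analyzers]]
method = "ngram-word"
ngram = 1
filter = "default-unigram-chain"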

Results and Discussions

The VGG16 model fine-tuned on the last two conv blocks has the best performance

We tried one dummy classifier and four CNN classifiers to recognize the images. The VGG16 model fine-tuned on the last two conv blocks performs best, with an accuracy of 0.56, an 86% improvement over the dummy classifier. The performance of the five models is shown below.

The performance of the 5 image classification models

The basic BM25 model performs well at retrieving recipes

After users get the classification of the vegetable image, they can type in a relevant query, and the BM25 ranking model returns the top 10 results. The NDCG@10 scores of the 23 queries are shown below, and the code used to compute them follows. The average NDCG@10 score is 0.94, indicating strong performance on these queries.

The NDCG@10 score of the annotated queries
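
The scores above can be computed with metapy's evaluation utilities against annotated relevance judgments; a minimal sketch, assuming the judgments file is referenced in the config and query ids start at 1:

import metapy

# evaluate each query's top-10 results against the relevance judgments
ev = metapy.index.IREval('recipe_ir-config.toml')
ndcg_scores = []
with open('recipe_ir/recipe_ir-queries.txt') as query_file:
    for query_num, line in enumerate(query_file):
        query = metapy.index.Document()
        query.content(line.strip())
        results = ranker.score(inv_idx, query, num_results)
        ndcg_scores.append(ev.ndcg(results, query_num + 1, num_results))
print(sum(ndcg_scores) / len(ndcg_scores))  # average NDCG@10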

What’s Next

If we had more time, we would devote more effort to the following aspects.

1. More images and a balanced dataset

Since the classifiers have an overfitting problem, we would add more images, especially for the vegetables with fewer than 100 images.

2. More complex queries

We would try more complex queries to test our information retrieval system, such as adding a country or cooking method to the recipe search.

3. More recipes

We chose the recipe text from a GitHub repo, but the recipes are sometimes hard to understand because they lack titles and need more sentence breaks. We could use other sources for our recipes, such as allrecipes.com, which has more structured, easier-to-read text. A web crawler would be used to extract the text, as sketched below.
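
A minimal sketch of such a crawler using requests and BeautifulSoup; the URL handling and the selectors are placeholders, not the real allrecipes.com markup:

import requests
from bs4 import BeautifulSoup

def fetch_recipe_text(url):
    # download one recipe page and pull out its title and steps;
    # the tag names below are placeholders and must be adapted
    # to the real page structure
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, 'html.parser')
    title = soup.find('h1').get_text(strip=True)
    steps = [p.get_text(strip=True) for p in soup.select('.recipe-step')]
    return title, steps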

4. Try other ranking models

When a query shares no common words with a recipe, the BM25 ranking model may perform poorly. We would need to try other ranking models, such as a language model or PLSI; metapy makes it easy to swap rankers, as shown below.
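
Swapping rankers in metapy is a one-line change; for example, two language-model rankers with different smoothing methods:

import metapy

# Dirichlet-prior smoothed language model instead of BM25
ranker = metapy.index.DirichletPrior(2000)  # mu = 2000
# or Jelinek-Mercer smoothing:
# ranker = metapy.index.JelinekMercer(0.7)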

5. Build a more user-friendly interface

We first built a simple user interaction interface through the command line. We would like to wrap the model and retrieval methods in the Flask framework, with some modern CSS styles and designs, so that users can interact with the retrieval engine like a real web application; a minimal sketch follows.
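
A minimal Flask endpoint wiring the classifier and the ranker together might look like this; the route, template, and helper functions are hypothetical:

from flask import Flask, request, render_template

app = Flask(__name__)

@app.route('/search', methods=['POST'])
def search():
    # classify the uploaded image, then retrieve recipes for the label
    image_file = request.files['image']
    label = classify_vegetable(image_file)       # hypothetical CNN wrapper
    recipes = retrieve_recipes(label, top_k=10)  # hypothetical BM25 wrapper
    return render_template('results.html', label=label, recipes=recipes)

if __name__ == '__main__':
    app.run(debug=True)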
