Detecting what’s on the plate with Keras

Paarth Bir · Analytics Vidhya · Jan 14, 2020

I am a foodie and a computer vision enthusiast, so it only makes sense that I would eventually wind up doing something like this. After sifting through Kaggle for interesting datasets on which to apply what I learnt from deeplearning.ai's TensorFlow: Data and Deployment specialization on Coursera, I stumbled upon the Food-101 dataset and voilà!

The Food-101 dataset, as the name suggests, contains 1,000 images each of a whopping 101 dishes, from apple pie to waffles (and I'm slightly ashamed to say I hadn't heard of quite a few of them despite my affinity for food). Considering that I'm highly unlikely to run across foie gras or huevos rancheros, the dataset was cut from 101 classes down to 20 for the sake of simplicity (a sketch of that trimming step follows the class list below).

Huevos rancheros and foie gras.
The 20 classes kept were: chicken curry, chicken wings, chocolate cake, cup cake, donuts, dumplings, french fries, fried rice, garlic bread, hamburger, hot-dog, ice-cream, omelette, pizza, samosa, sandwich, soup, spring rolls, sushi and waffles.
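The article doesn't show how the dataset was trimmed, but for concreteness it might look something like the sketch below. Food-101 ships with one folder of images per class, so keeping 20 classes is just a matter of copying the folders you want; the folder names for ambiguous picks like "sandwich" and "soup" are my guesses at the matching Food-101 classes, and the paths are placeholders.

import os
import shutil

# My guesses at the Food-101 folder names matching the 20 classes above;
# adjust to the exact classes you pick.
keep = ['chicken_curry', 'chicken_wings', 'chocolate_cake', 'cup_cakes', 'donuts',
        'dumplings', 'french_fries', 'fried_rice', 'garlic_bread', 'hamburger',
        'hot_dog', 'ice_cream', 'omelette', 'pizza', 'samosa', 'club_sandwich',
        'miso_soup', 'spring_rolls', 'sushi', 'waffles']

src_root = 'food-101/images'   # full Food-101 dataset (placeholder path)
dst_root = 'food-20/images'    # trimmed 20-class copy (placeholder path)
os.makedirs(dst_root, exist_ok=True)
for cls in keep:
    shutil.copytree(os.path.join(src_root, cls), os.path.join(dst_root, cls))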

With just 1,000 images for each of the categories above, there was too little data to train an accurate model from scratch. In comes transfer learning: I reused layers from MobileNetV1 pretrained on the ImageNet dataset.

from keras import applications

# MobileNetV1 backbone without its classification head
conv_base = applications.MobileNet(weights="imagenet", include_top=False, input_shape=(256, 256, 3))

Adding a GlobalAveragePooling2D layer and Dense layers on top, the model architecture looked like this:

Model: "sequential_8" _________________________________________________________________ Layer (type)                 Output Shape              Param #    ================================================================= mobilenet_1.00_224 (Model)   (None, 8, 8, 1024)        3228864    _________________________________________________________________ global_average_pooling2d_6 ( (None, 1024)              0          _________________________________________________________________ dense_13 (Dense)             (None, 1024)              1049600    _________________________________________________________________ dropout_25 (Dropout)         (None, 1024)              0          _________________________________________________________________ dense_14 (Dense)             (None, 20)                20500      ================================================================= Total params: 4,298,964 Trainable params: 4,277,076 Non-trainable params: 21,888

To avoid overfitting, data augmentation was applied: image rotation, linear shifts, shear, zoom, illumination variation and flips.

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255,
                                   rotation_range=360,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=[0.5, 1.0],
                                   brightness_range=[0.2, 1.0],
                                   horizontal_flip=True,
                                   vertical_flip=False,
                                   zca_whitening=True,
                                   zca_epsilon=1e-06)

training_set = train_datagen.flow_from_directory(path_training,
                                                 target_size=(256, 256),
                                                 batch_size=64,
                                                 class_mode='categorical',
                                                 shuffle=True)
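The matching validation generator isn't shown in the article; typically it would only rescale, so validation images are never augmented. A sketch, with the path as a placeholder:

valid_datagen = ImageDataGenerator(rescale=1./255)   # no augmentation on validation data
validation_set = valid_datagen.flow_from_directory(path_validation,   # placeholder path
                                                   target_size=(256, 256),
                                                   batch_size=64,
                                                   class_mode='categorical',
                                                   shuffle=False)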

For the initial epochs, the transferred layers were kept frozen:

for layer in conv_base.layers:
    layer.trainable = False

Initial training was done with the Adam optimizer and a decaying learning rate, switching to SGD in the later epochs. After the validation loss plateaued, the transferred layers were unfrozen and training was resumed to fine-tune the model further. This gave a training accuracy of 91.34% and a validation accuracy of 85.51%.
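A rough sketch of that schedule, assuming learning rates, epoch counts and the validation_set generator sketched earlier, none of which are given in the article:

from keras import optimizers

# Phase 1: train only the new head with Adam and a decaying learning rate (values assumed)
model2.compile(optimizer=optimizers.Adam(lr=1e-3, decay=1e-4),
               loss='categorical_crossentropy', metrics=['accuracy'])
model2.fit_generator(training_set, epochs=20, validation_data=validation_set)

# Phase 2: after the validation loss plateaus, unfreeze the backbone and
# fine-tune everything with SGD at a small learning rate (values assumed)
for layer in conv_base.layers:
    layer.trainable = True
model2.compile(optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
               loss='categorical_crossentropy', metrics=['accuracy'])
model2.fit_generator(training_set, epochs=20, validation_data=validation_set)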

Testing

import numpy as np
from keras.preprocessing.image import load_img

imn = '/content/test/testimage.jpg'   # test image path
img = load_img(imn, target_size=(256, 256))
img = np.asarray(img).astype('float32')
img = img / 255                       # same rescaling as during training
img = np.expand_dims(img, axis=0)     # add batch dimension -> (1, 256, 256, 3)

res = model2.predict(img)
ord = np.argsort(res)                 # class indices sorted by predicted probability
ind = np.argmax(res)

li = ['chicken curry', 'chicken wings', 'chocolate cake', 'cup cake', 'donuts',
      'dumplings', 'fries', 'fried rice', 'garlic bread', 'hamburger', 'hot-dog',
      'ice-cream', 'omelette', 'pizza', 'samosa', 'sandwich', 'soup',
      'spring rolls', 'sushi', 'waffle']

lis = []
for i in range(0, 5):
    lis.append(li[ord[0][19 - i]])
print(lis)  # Top-5 predictions
['pizza', 'sushi', 'garlic bread', 'sandwich', 'omelette']
['hamburger', 'sandwich', 'hot-dog', 'fried rice', 'omelette']

A Keras .h5 model can now be saved; it occupies around 32.9 MB of disk space.
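Saving is a one-liner (the filename just has to match what the converter reads in the next step):

model2.save('model.h5')   # serializes architecture + weights to a single ~33 MB HDF5 file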

Conversion to TensorFlow Lite

import tensorflow as tf
from tensorflow import lite

image_shape = (256, 256, 3)

def representative_dataset_gen():
    # Feed a handful of sample tensors so the converter can calibrate quantization ranges
    num_calibration_images = 10
    for i in range(num_calibration_images):
        image = tf.random.normal([1] + list(image_shape))
        yield [image]

converter = lite.TFLiteConverter.from_keras_model_file('model.h5')
converter.default_ranges_stats = [0, 255]
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

The resulting .tflite model is 16.29 MB, almost half the size of the .h5 model.

Testing with TFLite

import numpy as np
from keras.preprocessing.image import load_img
from tensorflow import lite

imn = '/content/test/testimage.jpg'
img = load_img(imn, target_size=(256, 256))
img = np.asarray(img).astype('float32')
img = img / 255
img = np.expand_dims(img, axis=0)     # (1, 256, 256, 3)

interpreter = lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Note: input_details[0]['dtype'] tells you what the converted model expects;
# if it is uint8, cast the image accordingly instead of rescaling to [0, 1].
interpreter.set_tensor(input_details[0]['index'], img)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])

ord = np.argsort(output_data)
ind = np.argmax(output_data)

lis = []
for i in range(0, 5):
    lis.append(li[ord[0][19 - i]])
print(lis)   # Top-5 predictions from the tflite model

And we are done with TFLite!

Android

Clone the GitHub repository from here.

Add the .tflite model and a label.txt file containing the labels to the assets directory of your application. Here is how the .txt file should look:

label 1
label 2
label 3
..
..
..
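Getting the label order right matters: it has to match the class indices the model was trained with. A small sketch that writes label.txt from the training generator's class_indices (the training_set name comes from the training code above; the output path is a placeholder):

# Write one label per line, ordered by the class index Keras assigned at training time
labels = sorted(training_set.class_indices, key=training_set.class_indices.get)
with open('label.txt', 'w') as f:
    f.write('\n'.join(labels))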

In the Kotlin file MainAct.kt, edit the variables mInputSize, mModelPath and mLabelPath according to your needs. A threshold score of 0.40 is specified in Classifier.kt; play around with it for the best results!

Here is a screenshot of the application:

Bon (App)étit !
