Detecting what’s on the plate with Keras
I am a foodie and a computer vision enthusiast, so it only makes sense that I would eventually wind up doing something like this. While sifting through Kaggle for an interesting dataset on which to apply what I learnt in deeplearning.ai's TensorFlow: Data and Deployment on Coursera, I stumbled upon the Food-101 dataset, and voilà!
The Food-101 dataset, as the name suggests, contains a whopping 101 dishes, from apple pie to waffles, with 1000 images of each (and I'm slightly ashamed to say I hadn't heard of quite a few of them despite my affinity for food). Considering that I'm highly unlikely to run across foie gras or huevos rancheros, the dataset was cut down from 101 classes to 20 for the sake of simplicity:
chicken curry, chicken wings, chocolate cake, cup cake, donuts, dumplings, french fries, fried rice, garlic bread, hamburger, hot-dog, ice-cream, omelette, pizza, samosa, sandwich, soup, spring rolls, sushi, waffles
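The post does not show how the 20 classes were pulled out of Food-101 or how the validation set was formed. A minimal sketch, assuming the images sit in one sub-folder per class (the layout flow_from_directory expects) and an 80/20 train/validation split chosen purely for illustration; make_subset and the folder names are hypothetical:

```python
import os
import random
import shutil


def make_subset(src_dir, dst_dir, classes, val_fraction=0.2, seed=42):
    """Copy the chosen class folders into dst_dir/train/<class> and dst_dir/validation/<class>.

    Returns {class_name: (n_train, n_val)}.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    counts = {}
    for cls in classes:
        files = sorted(os.listdir(os.path.join(src_dir, cls)))
        rng.shuffle(files)
        n_val = int(len(files) * val_fraction)
        splits = {"validation": files[:n_val], "train": files[n_val:]}
        for split, names in splits.items():
            out_dir = os.path.join(dst_dir, split, cls)
            os.makedirs(out_dir, exist_ok=True)
            for name in names:
                shutil.copy2(os.path.join(src_dir, cls, name), os.path.join(out_dir, name))
        counts[cls] = (len(splits["train"]), n_val)
    return counts
```

With a layout like this, path_training would point at dst_dir/train and a validation generator at dst_dir/validation.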
With only 1000 images for each of the categories above, there was too little data to train an accurate model from scratch. In comes transfer learning, using the convolutional layers of MobileNetV1 pretrained on the ImageNet dataset:
from keras import applications
conv_base = applications.MobileNet(weights="imagenet", include_top=False, input_shape=(256, 256, 3))
On top of this base, a GlobalAveragePooling2D layer, a fully connected Dense layer, Dropout and a 20-way Dense output layer were added, giving the following architecture:
Model: "sequential_8"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
mobilenet_1.00_224 (Model)   (None, 8, 8, 1024)        3228864
_________________________________________________________________
global_average_pooling2d_6 ( (None, 1024)              0
_________________________________________________________________
dense_13 (Dense)             (None, 1024)              1049600
_________________________________________________________________
dropout_25 (Dropout)         (None, 1024)              0
_________________________________________________________________
dense_14 (Dense)             (None, 20)                20500
=================================================================
Total params: 4,298,964
Trainable params: 4,277,076
Non-trainable params: 21,888
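The architecture in the summary can be sketched with the Keras functional API as below. This is a minimal sketch assuming tf.keras; the ReLU activation and the 0.5 dropout rate are assumptions, since the post only shows layer types and sizes. Passing weights=None builds the network without downloading anything, which is handy for checking the parameter count against the summary (4,298,964).

```python
import tensorflow as tf


def build_model(weights="imagenet", num_classes=20, input_shape=(256, 256, 3)):
    """MobileNetV1 base + GlobalAveragePooling2D + Dense/Dropout head (sketch)."""
    conv_base = tf.keras.applications.MobileNet(
        weights=weights, include_top=False, input_shape=input_shape)
    conv_base.trainable = False  # frozen for the first training phase

    inputs = tf.keras.Input(shape=input_shape)
    x = conv_base(inputs)                                  # (None, 8, 8, 1024) for 256x256 input
    x = tf.keras.layers.GlobalAveragePooling2D()(x)        # (None, 1024)
    x = tf.keras.layers.Dense(1024, activation="relu")(x)  # ReLU is an assumption
    x = tf.keras.layers.Dropout(0.5)(x)                    # rate 0.5 is an assumption
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs), conv_base
```

Returning conv_base alongside the model makes it easy to unfreeze the transferred layers later without searching for them by name.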
To reduce overfitting, data augmentation was applied to the training images: random rotation, horizontal and vertical shifts, shear, zoom, brightness variation and horizontal flips.
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=360,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=[0.5, 1.0],
    brightness_range=[0.2, 1.0],
    horizontal_flip=True,
    vertical_flip=False,
    # zca_whitening only takes effect after train_datagen.fit(...) on sample images;
    # with flow_from_directory alone, Keras warns and skips it
    zca_whitening=True,
    zca_epsilon=1e-06)

training_set = train_datagen.flow_from_directory(
    path_training,  # directory with one sub-folder per class
    target_size=(256, 256),
    batch_size=64,
    class_mode='categorical',
    shuffle=True)
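The post reports a validation accuracy but does not show the validation pipeline. A common setup, assumed here rather than taken from the post, is a second generator that only rescales pixels, so augmentation is applied to training images alone; path_validation is a hypothetical directory name.

```python
import tensorflow as tf

# Validation images are only rescaled, never augmented, so the metric reflects unaltered photos.
val_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)


def make_validation_set(path_validation, batch_size=64):
    """path_validation: hypothetical folder with the same class sub-folders as path_training."""
    return val_datagen.flow_from_directory(
        path_validation,
        target_size=(256, 256),     # same size as the training images
        batch_size=batch_size,
        class_mode="categorical",
        shuffle=False)              # fixed order makes per-image results easy to inspect
```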
For the initial epochs, the transferred layers were frozen:
for layer in conv_base.layers:
    layer.trainable = False
# the change takes effect once the model is compiled (or recompiled)
Initial training used the Adam optimizer with a decaying learning rate, switching to SGD in the later epochs. Once the validation loss plateaued, the transferred layers were unfrozen and training resumed to fine-tune the whole model. The final model reached a training accuracy of 91.34% and a validation accuracy of 85.51%.
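The post does not give learning rates, decay settings, momentum or epoch counts, so every number below is a placeholder. As a minimal sketch, assuming tf.keras and using EarlyStopping on val_loss as a stand-in for "a plateau in validation loss", the three phases could look like this:

```python
import tensorflow as tf


def _compile(model, optimizer):
    model.compile(optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"])


def _plateau_stop():
    # Ends a phase once val_loss stops improving (patience is a placeholder)
    return tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                            restore_best_weights=True)


def train_in_phases(model, conv_base, train_data, val_data, epochs=(10, 10, 10)):
    # Phase 1: frozen base, Adam with an exponentially decaying learning rate
    conv_base.trainable = False
    lr = tf.keras.optimizers.schedules.ExponentialDecay(1e-3, decay_steps=1000, decay_rate=0.9)
    _compile(model, tf.keras.optimizers.Adam(learning_rate=lr))
    model.fit(train_data, validation_data=val_data, epochs=epochs[0], callbacks=[_plateau_stop()])

    # Phase 2: still frozen, switch to SGD (momentum 0.9 is an assumption)
    _compile(model, tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9))
    model.fit(train_data, validation_data=val_data, epochs=epochs[1], callbacks=[_plateau_stop()])

    # Phase 3: unfreeze the transferred layers and fine-tune with a small learning rate;
    # recompiling is what makes the trainable change take effect
    conv_base.trainable = True
    _compile(model, tf.keras.optimizers.SGD(learning_rate=1e-5, momentum=0.9))
    model.fit(train_data, validation_data=val_data, epochs=epochs[2], callbacks=[_plateau_stop()])
    return model
```

A much smaller learning rate in the last phase is the usual precaution when unfreezing pretrained layers, so that the ImageNet features are adjusted rather than overwritten.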
Testing
import numpy as np
from keras.preprocessing.image import load_img

imn = '/content/test/testimage.jpg'  # test image path
img = load_img(imn, target_size=(256, 256))
img = np.asarray(img, dtype='float32') / 255  # same scaling as during training
img = np.expand_dims(img, axis=0)             # shape (1, 256, 256, 3)
res = model2.predict(img)                     # model2: the trained Keras model

# Class names in the order of the model's output indices (flow_from_directory sorts class folders alphabetically)
li = ['chicken curry', 'chicken wings', 'chocolate cake', 'cup cake', 'donuts', 'dumplings', 'french fries', 'fried rice', 'garlic bread', 'hamburger', 'hot-dog', 'ice-cream', 'omelette', 'pizza', 'samosa', 'sandwich', 'soup', 'spring rolls', 'sushi', 'waffles']
order = np.argsort(res[0])[::-1]  # class indices, highest score first
lis = [li[i] for i in order[:5]]
print(lis)  # Top-5 predictions
The Keras model can now be saved as an .h5 file, which occupies around 32.9 MB.
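Saving takes one call. The float32 weights alone take about 4,298,964 × 4 bytes ≈ 17.2 MB; by default Keras also writes the optimizer state into the .h5 file, which plausibly accounts for much of the rest. A minimal sketch, assuming tf.keras, that saves without optimizer state (not needed for inference or for the TFLite conversion) and reloads the file to confirm it works; save_for_deployment is a hypothetical helper name:

```python
import tensorflow as tf


def save_for_deployment(model, path="model.h5"):
    """Save architecture and weights only, then reload to confirm the file is usable."""
    model.save(path, include_optimizer=False)  # skip optimizer slot variables
    return tf.keras.models.load_model(path, compile=False)
```

For example, reloaded = save_for_deployment(model2) should give the same predictions as model2.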
Conversion to TensorFlow Lite
import tensorflow as tf
from tensorflow import lite
image_shape = (256, 256, 3)

def representative_dataset_gen():
    # Calibration inputs; real training images scaled to [0, 1] usually calibrate better than noise
    num_calibration_images = 10
    for i in range(num_calibration_images):
        image = tf.random.normal([1] + list(image_shape))
        yield [image]

# TF 1.x API; in TF 2.x the equivalent is tf.compat.v1.lite.TFLiteConverter.from_keras_model_file
converter = lite.TFLiteConverter.from_keras_model_file('model.h5')
converter.default_ranges_stats = [0, 255]
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()
with open('model.tflite', 'wb') as file:
    file.write(tflite_model)
The resulting .tflite model is 16.29 MB, about half the size of the .h5 model.
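In TF 2.x the same full-integer conversion can be done directly from the in-memory Keras model. One point to keep in mind: quantization ranges are estimated from whatever the representative dataset yields, and random Gaussian noise does not resemble the [0, 1]-scaled images the model actually sees. A minimal sketch, assuming tf.keras and a hypothetical float32 array calibration_images of training images already divided by 255; convert_int8 is a hypothetical helper name:

```python
import numpy as np
import tensorflow as tf


def convert_int8(model, calibration_images, out_path="model.tflite"):
    """Full-integer TFLite conversion calibrated on real, preprocessed images."""
    def representative_dataset():
        for img in calibration_images:
            # one sample per step, with a batch dimension, matching the model input
            yield [np.expand_dims(img, 0).astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8
    tflite_model = converter.convert()
    with open(out_path, "wb") as f:
        f.write(tflite_model)
    return tflite_model
```

Because inference_input_type is tf.uint8, images have to be quantized with the scale and zero point reported by interpreter.get_input_details() before being fed to the converted model.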
Testing with TFLite
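import numpy as np
from tensorflow import lite
from keras.preprocessing.image import load_img

imn = '/content/test/testimage.jpg'
img = load_img(imn, target_size=(256, 256))
img = np.asarray(img, dtype='float32') / 255  # same scaling as during training
img = np.expand_dims(img, axis=0)             # shape (1, 256, 256, 3)

interpreter = lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# With inference_input_type = tf.uint8 the model expects quantized uint8 input
if input_details[0]['dtype'] == np.uint8:
    scale, zero_point = input_details[0]['quantization']
    img = np.clip(np.round(img / scale + zero_point), 0, 255).astype(np.uint8)

interpreter.set_tensor(input_details[0]['index'], img)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])

# uint8 output quantization preserves the ranking (positive scale), so argsort works directly
order = np.argsort(output_data[0])[::-1]
lis = [li[i] for i in order[:5]]  # li: label list from the Keras test above
print(lis)  # Top-5 predictions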
And we are done with TFLite!
Android
Clone the GitHub repository from here.
Add the .tflite model and a label.txt file containing the labels to the assets directory of your application. The text file should list one label per line, like this:
label 1
label 2
label 3
..
..
..
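The line order of label.txt has to match the model's output indices. flow_from_directory assigns those indices alphabetically by folder name and exposes them as training_set.class_indices, so one way to generate the file, sketched here under that assumption with a hypothetical write_labels helper, is:

```python
def write_labels(class_indices, path="label.txt"):
    """Write class names one per line, ordered by the model's output index."""
    labels = [name for name, _ in sorted(class_indices.items(), key=lambda kv: kv[1])]
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(labels))  # no trailing newline, to avoid an empty last label
    return labels
```

For example, write_labels(training_set.class_indices) writes the folder names in index order; rename entries afterwards if you want friendlier display names.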
In the Kotlin file MainAct.kt, edit the variables mInputSize, mModelPath and mLabelPath to match your model and files. The app uses a confidence threshold of 0.40, specified in Classifier.kt; play around with it for the best results!
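To pick a threshold before rebuilding the app, the filtering can be tried offline on the model's scores. The Python function below is a hypothetical stand-in for the idea of a confidence cutoff, not a port of Classifier.kt; the exact comparison and the number of results the app shows may differ. Scores are assumed to be probabilities in [0, 1], so uint8 outputs should be dequantized first.

```python
def confident_predictions(scores, labels, threshold=0.40, max_results=3):
    """Return (label, score) pairs with score >= threshold, best first."""
    pairs = [(label, float(s)) for label, s in zip(labels, scores) if s >= threshold]
    pairs.sort(key=lambda p: p[1], reverse=True)
    return pairs[:max_results]
```

Running this over a folder of test images at a few thresholds shows how often the app would display nothing versus a wrong dish.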
Here is a screenshot of the application: