Training YOLO with Keras

Ashish Gusain
Published in Analytics Vidhya
Jul 9, 2020

YOLO has become popular for object detection because it is fast and can detect objects in real time. Training a custom detector is possible by following the instructions in the official GitHub repository. However, many people do not want to train it that way without understanding what they are doing, so we are going to use Keras for this purpose.

If you want to jump straight to the implementation, see https://github.com/AshishGusain17/via_google_colab/blob/master/keras_yolo.ipynb

First, we will prepare our dataset. I am going to detect whether the footrest of a motorcycle is closed or not. All my images are stored in a folder named ‘myimages’, and they come in different sizes, so let’s fix that first. The small script below resizes every image to 624 pixels wide by 832 pixels tall (3 colour channels) and stores it in the ‘small_images’ folder.

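import os
import cv2

def prep_small_images():
    myimages = os.path.join(os.getcwd(), "myimages")
    small_images = os.path.join(os.getcwd(), "small_images")
    os.makedirs(small_images, exist_ok=True)  # create the output folder if it is missing
    print(myimages)
    for img in os.listdir(myimages):
        image = cv2.imread(os.path.join(myimages, img))
        if image is None:  # skip files OpenCV cannot read as images
            continue
        # cv2.resize takes (width, height): 624 wide, 832 tall
        image = cv2.resize(image, (624, 832))
        print(image.shape)  # (832, 624, 3): rows, columns, channels
        cv2.imwrite(os.path.join(small_images, img), image)

prep_small_images()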

Now, we have to label all the images in which the footrest is closed. I am working with a single class, ‘closed’; you can use multiple classes for your own task. Download labelImg, a tool for annotating images.

Click Open Dir and select the ‘small_images’ folder. labelImg can save annotations in two formats, PascalVOC and YOLO; choose YOLO. Click Create RectBox, draw a box around your object and enter its label. Do this for all the images.
labelImg now creates a txt file for each image containing its bounding box information. Run the script below to combine them into one single txt file that lists each image path together with its annotation.

import os

def prep_train_txt():
    width = 624
    height = 832
    small_images = os.path.join(os.getcwd(), "small_images")
    # "w" overwrites train.txt, so re-running the script does not duplicate rows
    with open("train.txt", "w") as file_object:
        for img in os.listdir(small_images):
            # only read label files; skip labelImg's class list
            if not img.endswith(".txt") or img in ("classes.txt", "my_classes.txt"):
                continue
            with open(os.path.join(small_images, img)) as f:
                lines = f.readlines()
            if not lines:  # image saved without any box
                continue
            # YOLO format: class x_center y_center width height (normalised to 0-1)
            line = lines[0].split()
            class_id = int(line[0])
            x1, y1, w, h = (float(v) for v in line[1:5])
            # move from the box centre to its top-left corner
            x1, y1 = x1 - w / 2, y1 - h / 2
            # scale the normalised values to pixels
            x1, y1, w, h = int(x1 * width), int(y1 * height), int(w * width), int(h * height)
            nameee = img[:-3] + "jpg"
            # keras-yolo3 row format: path x_min,y_min,x_max,y_max,class_id
            text = ("small_images/" + nameee + " " + str(x1) + "," + str(y1) + ","
                    + str(x1 + w) + "," + str(y1 + h) + "," + str(class_id) + "\n")
            file_object.write(text)

prep_train_txt()

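Remember, each of my images has just one bounding box, so the script reads only the first line of every label file. If your images have multiple boxes, the script needs some changes: keras-yolo3 expects all the boxes of an image on the same line as its path, separated by spaces.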
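For images with several boxes, the row for an image has to contain every box, not just the first one. Below is a minimal sketch of that change, assuming the same 624 × 832 image size and the labelImg YOLO line format (`class x_center y_center width height`, normalised). `yolo_line_to_box` and `build_annotation_line` are hypothetical helper names, not part of keras-yolo3; the arithmetic matches the single-box script.

```python
def yolo_line_to_box(label_line, width, height):
    """Convert one labelImg YOLO line to (x_min, y_min, x_max, y_max, class_id) in pixels."""
    parts = label_line.split()
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    # top-left corner from the normalised centre, scaled to pixels
    x1 = int((xc - w / 2) * width)
    y1 = int((yc - h / 2) * height)
    # same arithmetic as prep_train_txt: corner plus scaled width/height
    x2 = x1 + int(w * width)
    y2 = y1 + int(h * height)
    return x1, y1, x2, y2, class_id


def build_annotation_line(image_path, label_lines, width=624, height=832):
    """Build one keras-yolo3 row: the image path followed by every box, space-separated."""
    boxes = []
    for label_line in label_lines:
        if not label_line.strip():  # ignore blank lines
            continue
        x1, y1, x2, y2, c = yolo_line_to_box(label_line, width, height)
        boxes.append("{},{},{},{},{}".format(x1, y1, x2, y2, c))
    return " ".join([image_path] + boxes)


# Two hypothetical boxes read from one label file
labels = ["0 0.5 0.5 0.25 0.5\n", "0 0.25 0.25 0.125 0.125\n"]
print(build_annotation_line("small_images/img1.jpg", labels))
# → small_images/img1.jpg 234,208,390,624,0 117,156,195,260,0
```

Inside `prep_train_txt`, the row would then be `build_annotation_line("small_images/" + nameee, lines) + "\n"` instead of using only `lines[0]`.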

When you finish annotating, labelImg also creates a separate txt file containing all the label names. Rename it to ‘my_classes.txt’. With this, all our files and data are ready.
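Before training, it can help to sanity-check train.txt. To my understanding, keras-yolo3's `get_random_data` splits each row on whitespace, takes the first field as the image path and parses every remaining field as comma-separated integers, so a box with a space inside it or a float coordinate will not load. The checker below, `check_annotation_line`, is a hypothetical helper (not part of keras-yolo3) that parses a row the same way and also flags boxes that are inverted or fall outside an assumed 624 × 832 image.

```python
def check_annotation_line(line, width=624, height=832):
    """Parse one train.txt row and return a list of problems (empty if none found)."""
    fields = line.split()
    if not fields:
        return ["empty line"]
    problems = []
    for box in fields[1:]:
        try:
            # exactly five integers: x_min, y_min, x_max, y_max, class_id
            x1, y1, x2, y2, c = map(int, box.split(","))
        except ValueError:
            problems.append("malformed box: " + box)
            continue
        if not (0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height):
            problems.append("box out of bounds or inverted: " + box)
        if c < 0:
            problems.append("negative class id: " + box)
    return problems


print(check_annotation_line("small_images/img1.jpg 234,208,390,624,0"))  # → []
print(check_annotation_line("small_images/img2.jpg 300,10,200,50,0"))
```

Running it over every line of train.txt and printing any non-empty result catches most annotation mistakes before a long training run.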

Requirements from here on:
# Keras 2.1.5
# tensorflow 1.6.0
Implementing the complete model from scratch is a hectic and time-consuming procedure, so we will use an existing YOLOv3 implementation in Keras. The original weights can be downloaded from this link. Clone the repository, store these weights inside the keras-yolo3 folder, and run the convert command below. It builds the model architecture, prints the complete model summary and saves the converted weights as model_data/yolo_weights.h5.

git clone https://github.com/qqwweee/keras-yolo3
cd keras-yolo3
python convert.py yolov3.cfg yolov3.weights model_data/yolo_weights.h5

There are some helper functions that will be used during training. Just copy them; no changes are required.

import numpy as np
import keras.backend as K
from keras.layers import Input, Lambda
from keras.models import Model
from keras.optimizers import Adam
from keras.callbacks import TensorBoard, ModelCheckpoint, ReduceLROnPlateau, EarlyStopping
from yolo3.model import preprocess_true_boxes, yolo_body, tiny_yolo_body, yolo_loss
from yolo3.utils import get_random_data


def get_classes(classes_path):
    with open(classes_path) as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]
    return class_names

def get_anchors(anchors_path):
    with open(anchors_path) as f:
        anchors = f.readline()
    anchors = [float(x) for x in anchors.split(',')]
    return np.array(anchors).reshape(-1, 2)

def create_model(input_shape, anchors, num_classes,
                 load_pretrained=True, freeze_body=2,
                 weights_path='model_data/yolo_weights.h5'):
    K.clear_session() # get a new session
    image_input = Input(shape=(None, None, 3))
    h, w = input_shape
    num_anchors = len(anchors)

    y_true = [Input(shape=(h//{0:32, 1:16, 2:8}[l], w//{0:32, 1:16, 2:8}[l], \
        num_anchors//3, num_classes+5)) for l in range(3)]

    model_body = yolo_body(image_input, num_anchors//3, num_classes)
    print('Create YOLOv3 model with {} anchors and {} classes.'.format(num_anchors, num_classes))

    if load_pretrained:
        model_body.load_weights(weights_path, by_name=True, skip_mismatch=True)
        print('Load weights {}.'.format(weights_path))
        if freeze_body in [1, 2]:
            # Freeze darknet53 body or freeze all but 3 output layers.
            num = (185, len(model_body.layers)-3)[freeze_body-1]
            for i in range(num): model_body.layers[i].trainable = False
            print('Freeze the first {} layers of total {} layers.'.format(num, len(model_body.layers)))

    model_loss = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss',
        arguments={'anchors': anchors, 'num_classes': num_classes, 'ignore_thresh': 0.5})(
        [*model_body.output, *y_true])
    model = Model([model_body.input, *y_true], model_loss)

    return model




def data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes):
    '''data generator for fit_generator'''
    n = len(annotation_lines)
    i = 0
    while True:
        image_data = []
        box_data = []
        for b in range(batch_size):
            if i==0:
                np.random.shuffle(annotation_lines)
            image, box = get_random_data(annotation_lines[i], input_shape, random=True)
            image_data.append(image)
            box_data.append(box)
            i = (i+1) % n
        image_data = np.array(image_data)
        box_data = np.array(box_data)
        y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes)
        yield [image_data, *y_true], np.zeros(batch_size)

def data_generator_wrapper(annotation_lines, batch_size, input_shape, anchors, num_classes):
    n = len(annotation_lines)
    if n==0 or batch_size<=0: return None
    return data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes)
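`get_anchors` reads a single comma-separated line and reshapes it into (width, height) pairs, and `create_model` passes `num_anchors//3` anchors to each of the three output scales. The sketch below shows that grouping without numpy. It assumes model_data/yolo_anchors.txt holds the standard nine YOLOv3 anchors, and that keras-yolo3 maps anchor indices 6-8, 3-5 and 0-2 to the stride-32, 16 and 8 outputs respectively (my understanding of the `anchor_mask` in yolo3/model.py); `parse_anchors` is a hypothetical stand-in for `get_anchors`.

```python
# Assumed content of model_data/yolo_anchors.txt (the standard YOLOv3 anchors)
ANCHOR_LINE = "10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326"

def parse_anchors(line):
    """Same parsing as get_anchors, but returning (width, height) tuples instead of an array."""
    values = [float(x) for x in line.split(",")]  # float() tolerates the extra spaces
    return [(values[i], values[i + 1]) for i in range(0, len(values), 2)]

anchors = parse_anchors(ANCHOR_LINE)
num_anchors = len(anchors)            # 9
anchors_per_scale = num_anchors // 3  # 3, the value create_model passes to yolo_body

# Largest anchors go to the coarsest grid (stride 32), smallest to the finest (stride 8)
anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
for stride, mask in zip((32, 16, 8), anchor_mask):
    print(stride, [anchors[i] for i in mask])
# → 32 [(116.0, 90.0), (156.0, 198.0), (373.0, 326.0)]
# → 16 [(30.0, 61.0), (62.0, 45.0), (59.0, 119.0)]
# → 8 [(10.0, 13.0), (16.0, 30.0), (33.0, 23.0)]
```

Large anchors on the coarse grid let each cell there predict big objects, while the fine grid handles small ones.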

Now run the code below, making sure all the file locations are correct. annotation_path is the location of the train.txt file containing all the annotations. log_dir is where the model will be saved during and after training. classes_path points to the txt file containing the label names, and anchors_path points to the file with the anchors used during training.

annotation_path = 'train.txt'
log_dir = 'logs/000/'
classes_path = 'my_classes.txt'
anchors_path = 'model_data/yolo_anchors.txt'
class_names = get_classes(classes_path)
num_classes = len(class_names)
anchors = get_anchors(anchors_path)

input_shape = (416,416) # multiple of 32, hw

model = create_model(input_shape, anchors, num_classes,freeze_body=2, weights_path='model_data/yolo_weights.h5') # make sure you know what you freeze

logging = TensorBoard(log_dir=log_dir)
checkpoint = ModelCheckpoint(log_dir + 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5',
monitor='val_loss', save_weights_only=True, save_best_only=True, period=3)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1)
early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1)

val_split = 0.1
with open(annotation_path) as f:
    lines = f.readlines()
np.random.seed(10101)
np.random.shuffle(lines)
np.random.seed(None)
num_val = int(len(lines)*val_split)
num_train = len(lines) - num_val
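The comment on `input_shape` says it must be a multiple of 32 because `create_model` builds its three `y_true` inputs on grids that are 32, 16 and 8 times smaller than the input. The sketch below reproduces those shapes (batch dimension omitted) so you can see what the network is trained against. `y_true_shapes` is a hypothetical helper, and the explicit multiple-of-32 check is my own addition rather than code from keras-yolo3.

```python
def y_true_shapes(input_shape, num_anchors, num_classes):
    """Shapes of the three y_true Inputs that create_model builds (batch dimension omitted)."""
    h, w = input_shape
    if h % 32 or w % 32:
        raise ValueError("input_shape must be a multiple of 32, got {}".format(input_shape))
    # one grid per output scale; each cell holds num_anchors//3 anchors, and each
    # anchor stores 4 box values + 1 objectness score + num_classes class scores
    return [(h // s, w // s, num_anchors // 3, num_classes + 5) for s in (32, 16, 8)]

print(y_true_shapes((416, 416), 9, 1))
# → [(13, 13, 3, 6), (26, 26, 3, 6), (52, 52, 3, 6)]
```

With a single class (‘closed’), each anchor slot holds 6 values; adding classes only grows the last dimension.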

With this, we are all set to go. Now, we can compile and train our model.

model.compile(optimizer=Adam(lr=1e-3), loss={
# use custom yolo_loss Lambda layer.
'yolo_loss': lambda y_true, y_pred: y_pred})

batch_size = 32
print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size))

model.fit_generator(data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes),
steps_per_epoch=max(1, num_train//batch_size),
validation_data=data_generator_wrapper(lines[num_train:], batch_size, input_shape, anchors, num_classes),
validation_steps=max(1, num_val//batch_size),
epochs=50,
initial_epoch=0,
callbacks=[logging, checkpoint])

model.save_weights(log_dir + 'trained_weights_stage_1.h5')
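The split and step counts in the training code are simple integer arithmetic, but they matter on small datasets. The sketch below reproduces them for a hypothetical dataset; `split_and_steps` is an illustrative helper, and the image counts are made up.

```python
def split_and_steps(num_lines, val_split=0.1, batch_size=32):
    """Reproduce the train/val split and step counts used in the training code."""
    num_val = int(num_lines * val_split)
    num_train = num_lines - num_val
    steps_per_epoch = max(1, num_train // batch_size)
    validation_steps = max(1, num_val // batch_size)
    return num_train, num_val, steps_per_epoch, validation_steps

# Hypothetical dataset of 100 annotated images
print(split_and_steps(100))  # → (90, 10, 2, 1)
```

With 100 images, each epoch runs 2 batches of 32 (64 images), and the single validation batch of 32 is filled from only 10 validation images, because `data_generator` wraps around with `i = (i+1) % n` and reshuffles. With fewer than 10 images, `num_val` becomes 0, `data_generator_wrapper` returns None and no `val_loss` is computed, which the checkpoint callback relies on.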

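Checkpoints are stored in the logs/000/ folder. A checkpoint is considered every 3 epochs (period=3), but because save_best_only=True it is only written when val_loss has improved; the final weights are saved as trained_weights_stage_1.h5. You can use any of these files for prediction.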
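The interaction between `period=3` and `save_best_only=True` is easy to misread, so here is a small simulation. It follows my reading of Keras 2.x's `ModelCheckpoint`: a counter triggers a save check every `period` epochs, and at that check the file is written only if `val_loss` beats the best value seen at earlier checks. `checkpoint_epochs` is a hypothetical helper and the loss values are invented.

```python
import math

def checkpoint_epochs(val_losses, period=3):
    """Epochs (1-indexed) at which ModelCheckpoint(save_best_only=True, period=period)
    would write a file, given the val_loss recorded at the end of each epoch."""
    saved = []
    best = math.inf
    since_last = 0
    for epoch, loss in enumerate(val_losses, start=1):
        since_last += 1
        if since_last >= period:  # a save is only considered every `period` epochs
            since_last = 0
            if loss < best:       # ...and only written if val_loss beat the best check so far
                best = loss
                saved.append(epoch)
    return saved

# Hypothetical val_loss values for 9 epochs
print(checkpoint_epochs([9.0, 8.0, 7.0, 6.5, 6.0, 7.5, 5.0, 5.5, 5.2]))  # → [3, 9]
```

Note that epoch 7 had the lowest loss but was never saved, because it did not fall on a check epoch.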

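Now, for prediction, run the command below to get the detected bounding boxes. The detection code in keras-yolo3 already applies Non-Maximum Suppression (NMS), so overlapping boxes for the same object are filtered before they are drawn. By default the script loads the model configured in yolo.py rather than your trained weights; to detect with your own model, pass your weights and class file through its --model and --classes options (for example, logs/000/trained_weights_stage_1.h5 and my_classes.txt).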

python yolo_video.py  --image --input="/content/img1.jpg"
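For readers who want to see what NMS does, here is a plain-Python sketch of greedy NMS on a single class. It is an illustration only: keras-yolo3 performs NMS per class inside its TensorFlow graph, and the 0.45 IoU threshold and the example boxes below are assumptions, not values taken from this article.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Two hypothetical detections of the same footrest plus one unrelated box
boxes = [(234, 208, 390, 624), (240, 210, 392, 620), (10, 10, 60, 60)]
scores = [0.9, 0.75, 0.6]
print(nms(boxes, scores))  # → [0, 2]
```

The second box overlaps the first with an IoU of about 0.94, so it is suppressed; the third box does not overlap and survives.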

The implementation can be seen here.

You can reach me via mail, LinkedIn or GitHub.


Full Stack Developer | MERN Stack | Data Science | ML