Age Detection of Indian Actors using Deep Learning
Problem Statement
The Indian Movie Face Database (IMFDB) is a large unconstrained face database consisting of 34512 images of 100 Indian actors collected from more than 100 videos. The task is to predict the age of a person from his or her facial attributes. The problem has been converted to a multi-class problem with three classes: Young, Middle and Old. (This is a competitive problem from Analytics Vidhya.)
Dataset
The dataset is cleaned and formatted to give you a total of 26742 images with 19906 images in train and 6636 images in test.
The attributes of data are as follows:
ID — Unique ID of image
Class — Age bin of person in image
Loading data
We will use Keras, as it is user friendly and similar in style to scikit-learn.
#---- Reading file
import pandas as pd

train_csv = pd.read_csv("train.csv")
print(train_csv["Class"].unique())
train_csv.head()
Because the images of all classes are mixed together, we separate them into the 3 classes, i.e. YOUNG, MIDDLE and OLD, and store each class in its own folder. Python's shutil module helps to do so.
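#--- Separating images in 3 classes
import os
import shutil

for index, row in train_csv.iterrows():
    os.makedirs(row["Class"], exist_ok=True)  # make sure the class folder exists
    shutil.copy2("Train/" + row["ID"], row["Class"])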
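The generators later in this post read from 'dataset/training' and 'dataset/validation_set', but how those folders were filled is not shown. The following is a minimal sketch of one way to decide where each image could go, assuming a 20% validation split per class. The function name, the split fraction and the toy file names are all hypothetical and only for illustration.

```python
import random
from pathlib import PurePosixPath

def assign_destinations(rows, val_fraction=0.2, seed=0):
    """Map each (image_id, class_name) row to a destination folder.

    Returns a dict {image_id: "dataset/<split>/<class_name>"}.
    The 20% validation fraction is an assumption, not taken from the post.
    """
    rng = random.Random(seed)
    by_class = {}
    for image_id, class_name in rows:
        by_class.setdefault(class_name, []).append(image_id)

    destinations = {}
    for class_name, ids in by_class.items():
        ids = sorted(ids)
        rng.shuffle(ids)  # shuffle within each class so the split is stratified
        n_val = int(len(ids) * val_fraction)
        for i, image_id in enumerate(ids):
            split = "validation_set" if i < n_val else "training"
            destinations[image_id] = str(PurePosixPath("dataset", split, class_name))
    return destinations

# Toy rows with made-up file names (not real entries from train.csv)
toy_classes = ["YOUNG"] * 5 + ["MIDDLE"] * 10 + ["OLD"] * 5
toy_rows = [(f"img_{i}.jpg", cls) for i, cls in enumerate(toy_classes)]
dest = assign_destinations(toy_rows)
print(dest["img_0.jpg"])
```

Each destination could then be passed to shutil.copy2 after creating the folder. Splitting within each class keeps the class proportions similar in both folders.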
The plot below shows the distribution of the 3 classes in the training dataset.
## inspecting the distribution of classes
import matplotlib.pyplot as plt

plt.figure(figsize = (16,6))
plt.style.use("fivethirtyeight")
train_csv['Class'].value_counts(dropna = False).plot(kind = 'bar',grid = True)
plt.title("Distribution of class counts")
plt.xticks(rotation = 0)
From the above graph we can say that the MIDDLE class has the most images and the OLD class has the fewest.
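Since the classes are imbalanced, it can help to know what share of the data each class holds. A classifier that always predicts the largest class would reach exactly that share as accuracy, which gives a baseline for judging the models. Here is a small sketch with made-up labels (not the real counts from train.csv):

```python
from collections import Counter

def class_proportions(labels):
    """Return the fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: counts[cls] / total for cls in counts}

# Made-up labels for illustration only
toy_labels = ["MIDDLE"] * 5 + ["YOUNG"] * 3 + ["OLD"] * 2
props = class_proportions(toy_labels)

# Always predicting the majority class gives this accuracy as a baseline
majority = max(props, key=props.get)
print(majority, props[majority])
```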
To classify we will use the below techniques:
- Basic Convolutional neural network (CNN)
- Resnet50
Preparing data for train-test
We will use the ImageDataGenerator API from Keras to load the training and validation images directly from folders. Its flow_from_directory method helps to do so, taking the class of each image from the subfolder it sits in. ImageDataGenerator also helps with augmentation.
Image augmentation artificially creates training images by applying one or more processing steps, such as random rotation, shifts, shear and flips.
Image augmentation usually helps the model generalize better, as test images may not look the same as the training images.
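To make two of these processing steps concrete, here is a minimal pure-Python sketch of rescaling pixel values by 1/255 and flipping an image horizontally. It works on a tiny made-up grayscale image stored as nested lists. It only illustrates the idea and is not how Keras implements these steps internally.

```python
def rescale(image, factor=1.0 / 255):
    """Multiply every pixel by a factor, mapping 0..255 to roughly 0..1."""
    return [[pixel * factor for pixel in row] for row in image]

def horizontal_flip(image):
    """Mirror the image left to right by reversing each row."""
    return [row[::-1] for row in image]

# Tiny made-up 2x3 grayscale image
toy_image = [[0, 51, 255],
             [102, 153, 204]]

flipped = horizontal_flip(toy_image)
scaled = rescale(toy_image)
print(flipped)
```

A flipped face is still a plausible face of the same age group, which is why this kind of transform is a safe way to create extra training examples here.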
from keras.preprocessing.image import ImageDataGenerator
from PIL import Image, ImageFile

batch_size = 32

# To handle image loading problem (lets PIL load truncated image files)
ImageFile.LOAD_TRUNCATED_IMAGES = True

# Augment the training images; only rescale the validation images
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)
test_datagen = ImageDataGenerator(rescale = 1./255)

training_set = train_datagen.flow_from_directory('dataset/training',
                                                 target_size = (64, 64),
                                                 batch_size = batch_size,
                                                 class_mode = 'categorical')
validation_set = test_datagen.flow_from_directory('dataset/validation_set',
                                                  target_size = (64, 64),
                                                  batch_size = batch_size,
                                                  class_mode = 'categorical')
print(training_set.class_indices)
Basic CNN
Convolutional neural networks (ConvNets or CNNs) are one of the main families of neural networks for image recognition and image classification. Object detection and face recognition are some of the areas where CNNs are widely used.
Convolution is the first layer to extract features from an input image. Convolution preserves the relationship between pixels by learning image features using small squares of input data. It is a mathematical operation that takes two inputs: an image matrix and a filter (or kernel).
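The operation can be sketched in plain Python as follows. The kernel slides over the image, and at each position the overlapping values are multiplied and summed. This sketch assumes a single channel, stride 1 and no padding. Like Keras's Conv2D, it does not flip the kernel. The image and kernel values are made up.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Made-up 4x4 image and a 3x3 vertical-edge kernel
image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [0, 1, 2, 3]]
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[-6, 12], [-6, 8]]
```

With no padding, a 3x3 kernel shrinks each side by 2, so a 64x64 input becomes a 62x62 feature map.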
The Keras library in Python makes it pretty simple to build a CNN. We use the Sequential model from Keras, as the Sequential API allows us to create models layer by layer for most problems.
We also use max pooling, batch normalization, dropout and one simple hack: leaky ReLU as the activation.
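Leaky ReLU behaves like ReLU for positive inputs but lets a small, scaled-down signal through for negative inputs instead of zeroing it. A minimal sketch for a single value, using the same alpha of 0.3 as the model in this post:

```python
def leaky_relu(x, alpha=0.3):
    """Return x for positive inputs and alpha * x otherwise."""
    return x if x > 0 else alpha * x

def relu(x):
    """Plain ReLU for comparison: negative inputs become zero."""
    return max(0.0, x)

for value in (-2.0, 0.0, 2.0):
    print(value, relu(value), leaky_relu(value))
```

Because negative inputs still produce a non-zero slope, units are less likely to get stuck outputting zero during training.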
from keras.models import Sequential
from keras.layers import (Conv2D, MaxPooling2D, Flatten, Dense, Dropout,
                          BatchNormalization, LeakyReLU)

model = Sequential()
# First convolution block
model.add(Conv2D(64, (3, 3), input_shape = (64, 64, 3)))
model.add(Conv2D(32, (3, 3)))  # input_shape is only needed on the first layer
model.add(LeakyReLU(alpha=0.3))
model.add(BatchNormalization())
# Max Pooling
model.add(MaxPooling2D(pool_size = (2, 2)))
# Dropout
model.add(Dropout(0.4))

# Second convolution block
model.add(Conv2D(32, (3, 3)))
model.add(MaxPooling2D(pool_size = (2, 2)))
model.add(LeakyReLU(alpha=0.3))
model.add(BatchNormalization())
# Dropout
model.add(Dropout(0.3))

# Flatten
model.add(Flatten())
model.add(Dense(128))
model.add(LeakyReLU(alpha=0.3))
model.add(Dense(64))
model.add(Dropout(0.5))
model.add(Dense(3, activation = 'softmax'))
We use accuracy as the metric and Adam as the optimizer, and we train the model for 30 epochs.
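from keras.callbacks import TensorBoard

# TensorBoard callback; the log directory name here is an assumption
tensorboard1 = TensorBoard(log_dir = "logs/basic_cnn")

# compiling the model
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

# fitting model (fit_generator is deprecated in newer Keras versions, where model.fit accepts generators directly)
history = model.fit_generator(training_set,
                              steps_per_epoch = len(training_set),
                              epochs = 30,
                              validation_data = validation_set,
                              validation_steps = len(validation_set),
                              callbacks=[tensorboard1],
                              verbose=2)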
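In the call above, steps_per_epoch is set to len(training_set), which is the number of batches the generator needs to go through every image once. That is the image count divided by the batch size, rounded up. A small sketch with a hypothetical image count (the post does not state how many images ended up in each folder):

```python
import math

def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)

# Hypothetical count for illustration; the last batch is smaller than 32
example_steps = steps_per_epoch(1000, 32)
print(example_steps)  # 32
```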
Our accuracy is 77% and the loss is 0.55 with the basic CNN Sequential model. From the above graphs we can say that increasing the number of epochs may reduce the loss further.
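To put the loss value in context, here is a minimal sketch of how softmax and categorical cross-entropy are computed for a single image. The raw scores are made up, and the small eps guard is an assumption added to avoid taking the log of zero.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = [z - max(logits) for z in logits]  # shift for numerical stability
    exps = [math.exp(z) for z in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Negative log of the probability given to the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))

# Made-up scores for the 3 classes; the true class is the first one
probs = softmax([2.0, 1.0, 0.1])
loss = categorical_crossentropy([1, 0, 0], probs)

# A model that guesses uniformly over 3 classes has loss ln(3)
uniform_loss = categorical_crossentropy([0, 1, 0], [1 / 3, 1 / 3, 1 / 3])
print(round(loss, 3), round(uniform_loss, 3))
```

Uniform guessing over 3 classes gives a loss of about 1.10, so an average loss of 0.55 means the model puts noticeably more probability on the correct class than chance would.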
Even with this loss, my competition score was 0.77, which was rank 149 at that time. The main trick here was using leaky ReLU as the activation, but layers like max pooling and batch normalization also helped to increase accuracy.
Batch normalization reduces the amount by which the hidden unit values shift around (internal covariate shift), and max pooling takes the maximum value from each cluster of neurons in the prior layer.
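Both operations can be sketched in a few lines of plain Python. This simplified version assumes a 2x2 pooling window with stride 2, and a batch normalization without the learnable scale and shift parameters or the moving averages that the Keras layer also keeps. The input values are made up.

```python
import math

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    pooled = []
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            window = [feature_map[i][j], feature_map[i][j + 1],
                      feature_map[i + 1][j], feature_map[i + 1][j + 1]]
            row.append(max(window))
        pooled.append(row)
    return pooled

def batch_normalize(values, epsilon=1e-3):
    """Shift values to zero mean and scale them to roughly unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(variance + epsilon) for v in values]

toy_map = [[1, 3, 2, 0],
           [4, 2, 1, 5],
           [0, 1, 9, 8],
           [2, 6, 7, 3]]
pooled = max_pool_2x2(toy_map)
normalized = batch_normalize([1.0, 2.0, 3.0, 4.0])
print(pooled)  # [[4, 5], [6, 9]]
```

Pooling halves each spatial dimension, and normalization keeps the activations passed to the next layer in a similar range from batch to batch.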
ResNet50
We will now use a transfer learning model, namely the ResNet50 API from Keras.
ResNet, short for Residual Network, is a classic neural network used as a backbone for many computer vision tasks. Prior to ResNet, training very deep neural networks was difficult due to the problem of vanishing gradients. ResNet popularized the concept of the skip connection, which adds the input of a block directly to its output. The diagram below illustrates a skip connection.
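A toy example can show why skip connections help with vanishing gradients. The sketch below stacks 20 scalar "layers" with a small weight, once as plain layers and once as residual layers, and estimates the gradient of the output with respect to the input numerically. This is a deliberately simplified scalar model with a made-up weight, not the real ResNet block, which uses convolutions and non-linearities.

```python
def plain_layer(x, weight):
    """A plain layer: the output depends on the input only through the weight."""
    return weight * x

def residual_layer(x, weight):
    """A residual layer: the input is added back to the layer output."""
    return x + weight * x

def stack(layer, x, weight, depth):
    """Apply the same layer `depth` times in a row."""
    for _ in range(depth):
        x = layer(x, weight)
    return x

def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)

depth, weight = 20, 0.01  # made-up values for illustration
plain_grad = numerical_gradient(lambda x: stack(plain_layer, x, weight, depth), 1.0)
residual_grad = numerical_gradient(lambda x: stack(residual_layer, x, weight, depth), 1.0)
print(plain_grad, residual_grad)
```

In the plain stack the gradient shrinks by a factor of 0.01 at every layer and practically disappears, while in the residual stack each layer contributes a factor of 1.01, so the signal survives all 20 layers.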
We get the ResNet50 model from Keras, pretrained on ImageNet data. We do not change its weights, and we cut off the last layer because we have to classify into 3 classes. We keep the image size at 64x64 with RGB channels.
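# loading resnet model
from keras.applications import ResNet50

Rsnt_model = ResNet50(weights='imagenet', include_top=False, input_shape=(64, 64, 3))
# As written, the ResNet layers stay trainable; set layer.trainable = False on them to freeze the weights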
The main trick here is to tune the last layers, i.e. the layers placed on top of the ResNet output. We do not connect the final dense layer directly to the ResNet output. Instead we add layers like GlobalAveragePooling2D and BatchNormalization in between. Here too, batch normalization helps to rescale the output coming from the layers above. Dropout also helps to reduce overfitting by randomly dropping connections during training.
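Global average pooling replaces each channel of the final feature map with its mean, so a height x width x channels block becomes a single vector with one number per channel. A minimal sketch on a made-up 2x2 feature map with 2 channels, stored as nested lists in [row][column][channel] order:

```python
def global_average_pooling_2d(feature_maps):
    """Average each channel over all spatial positions."""
    height = len(feature_maps)
    width = len(feature_maps[0])
    channels = len(feature_maps[0][0])
    sums = [0.0] * channels
    for row in feature_maps:
        for pixel in row:
            for c, value in enumerate(pixel):
                sums[c] += value
    return [s / (height * width) for s in sums]

# Made-up 2x2 feature map with 2 channels
toy_features = [[[1.0, 10.0], [2.0, 20.0]],
                [[3.0, 30.0], [4.0, 40.0]]]
pooled_vector = global_average_pooling_2d(toy_features)
print(pooled_vector)  # [2.5, 25.0]
```

Compared to flattening, this keeps the number of inputs to the first dense layer equal to the number of channels, which means far fewer weights to train on top of the pretrained network.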
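from keras.layers import GlobalAveragePooling2D, Dense, Dropout, BatchNormalization
from keras.models import Model

av1 = GlobalAveragePooling2D()(Rsnt_model.output)
fc1 = Dense(256, activation='relu')(av1)
drp1 = Dropout(0.35)(fc1)
fc2 = Dense(128, activation='relu')(drp1)
drp2 = Dropout(0.4)(fc2)
bat_norm = BatchNormalization()(drp2)
fc3 = Dense(68, activation='relu')(bat_norm)
drp3 = Dropout(0.25)(fc3)
fc4 = Dense(34, activation='relu')(drp3)
# Note: the output layer is attached to fc3, so drp3 and fc4 are not part of the final model
out = Dense(3, activation='softmax')(fc3)
tl_model = Model(inputs=Rsnt_model.input, outputs=out)
tl_model.summary()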
We use accuracy as the metric and Adam as the optimizer, and we train the model for 10 epochs.
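from keras.callbacks import TensorBoard

# TensorBoard callback; the log directory name here is an assumption
tensorboard3 = TensorBoard(log_dir = "logs/resnet50")

# compiling the model
tl_model.compile(loss='categorical_crossentropy', optimizer="adam", metrics=['accuracy'])

# fitting the model (fit_generator is deprecated in newer Keras versions, where model.fit accepts generators directly)
history = tl_model.fit_generator(training_set,
                                 steps_per_epoch = len(training_set),
                                 epochs = 10,
                                 validation_data = validation_set,
                                 validation_steps = len(validation_set),
                                 callbacks=[tensorboard3],
                                 verbose=2)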
Our accuracy is 96% and the loss is 0.11 with the ResNet model with last-layer tuning.
With this loss, my competition score was 0.847, which was rank 45 at that time. Here too, layers like pooling and batch normalization helped to increase accuracy. From the graph we can say that the model is slightly overfitting, but this may be improved by changing the dropout rates between the dense layers. We also observed that fewer epochs are needed to reach this accuracy and loss. In fact, the loss already drops below 0.1 between the 4th and 6th epochs.
Comparison of all models:
from prettytable import PrettyTable

x = PrettyTable()
x.field_names = ["Model", "Loss", "Train Accuracy (%)", "Validation Accuracy (%)", "Epochs"]
x.add_row(["Basic CNN", 0.55, 75, 71.5, 30])
x.add_row(["VGG19", 0.71, 68, 58.4, 10])
x.add_row(["Resnet V1", 0.186, 93, 80, 30])
x.add_row(["Resnet V2", 0.114, 96, 81, 10])
x.add_row(["Inceptionnet", 0.333, 86, 76, 25])
print(x)
Here we can see that a transfer learning model like VGG19 did not perform well. It is a deep network of 19 layers without any skip connections, which is where ResNet has an advantage over it. So ResNet performed well.
Within ResNet there is also a difference in epochs: V1 used 30 epochs and V2 only 10, which shows how tuning the last layers can change performance, loss and accuracy drastically.
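The same table can also be used to rank the models and to look at the gap between train and validation accuracy, which is a rough indicator of overfitting. A small sketch using the numbers from the comparison above (the tuple layout and variable names are just for illustration):

```python
# (model, loss, train accuracy %, validation accuracy %, epochs), from the table above
results = [
    ("Basic CNN", 0.55, 75, 71.5, 30),
    ("VGG19", 0.71, 68, 58.4, 10),
    ("Resnet V1", 0.186, 93, 80, 30),
    ("Resnet V2", 0.114, 96, 81, 10),
    ("Inceptionnet", 0.333, 86, 76, 25),
]

# Rank models by validation accuracy, best first
ranked = sorted(results, key=lambda row: row[3], reverse=True)
best_model = ranked[0][0]

# Gap between train and validation accuracy, in percentage points
gaps = {row[0]: round(row[2] - row[3], 1) for row in results}

print(best_model)
for name, gap in gaps.items():
    print(name, gap)
```

Resnet V2 has the best validation accuracy but also the largest gap (15 points), which matches the earlier observation that the model is overfitting somewhat.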
Conclusion
The ResNet model performed best among all models. Tuning the last layers is very important in transfer learning to get good results in less time: above, ResNet V1 took 30 epochs whereas V2 took only 10 epochs to reach better accuracy. Layers like pooling and batch normalization play an important role in reducing the loss.
Increasing the number of epochs may increase the chance of overfitting, but by tuning the last layers of transfer learning models we can reduce overfitting with fewer epochs.
Analytics Vidhya Rank
Future Work
- Changing dropout rates in between Dense layers may decrease loss further.
- More rigorous Image augmentation on training images.
- Experimenting with Dense layers size and adjusting learning rate with different optimizers.
Resources:
Original dataset and competition rules:
https://datahack.analyticsvidhya.com/contest/practice-problem-age-detection/
Thank you for your attention and reading my work