Neural Network Performance on the BabyAI Dataset (Part I)

Mitiku Yohannes
Neural Networks Performance
4 min read · Aug 9, 2018


In this article, the performance of different neural network architectures on the BabyAI dataset is explored, for classifying images of shapes into one of three classes (ellipse, rectangle, and triangle). All code for this series can be found in the accompanying GitHub repository.

The series starts with a shallow net with a single hidden layer and progressively increases both the depth and the complexity of the network. It covers fully connected networks, convnets, and capsnets.

This article, being the first of the series, mainly focuses on downloading the scripts that generate the dataset, loading the dataset, and training a simple logistic regression network on it.

Prerequisites

This article is not a tutorial on any machine learning library, nor on any programming language; rather, its aim is to explore the performance of different neural network architectures. Some background in the Python programming language, machine learning basics, the basics of artificial neural networks, and the Keras, TensorFlow, and NumPy libraries is assumed.

BabyAI Dataset

The BabyAI dataset is a classification dataset. More information on how to obtain and extract it is found here. The downloaded material consists of a set of scripts that generate shapes of three classes (rectangle, ellipse, and triangle). The dataset for this series can be generated using the shapeset2_1cspo_2_3.10000.train.py, shapeset2_1cspo_2_3.5000.test.py, and shapeset2_1cspo_2_3.5000.valid.py scripts, which produce shapes with arbitrary size, position, color, and orientation. In this article, the following commands are used to generate 10,000 training, 5,000 test, and 5,000 validation examples.

python shapeset2_1cspo_2_3.10000.train.py write_formats amat
python shapeset2_1cspo_2_3.5000.test.py write_formats amat
python shapeset2_1cspo_2_3.5000.valid.py write_formats amat

The above commands produce three .amat files. For convenience, let's rename these files to train.amat, test.amat, and valid.amat for the training, test, and validation sets respectively, and put them together in a dataset folder. A small helper for this step is sketched below.
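A minimal sketch of the renaming step. The original filenames are assumptions based on the generator scripts' names, and dataset_dir is a hypothetical destination folder; adjust both to match what the scripts actually produce on your machine.

import os
import shutil

dataset_dir = "dataset"  # hypothetical destination folder
os.makedirs(dataset_dir, exist_ok=True)

# Assumed output names; adjust to whatever the generator scripts emit.
renames = {
    "shapeset2_1cspo_2_3.10000.train.amat": "train.amat",
    "shapeset2_1cspo_2_3.5000.test.amat": "test.amat",
    "shapeset2_1cspo_2_3.5000.valid.amat": "valid.amat",
}
for src, dst in renames.items():
    shutil.move(src, os.path.join(dataset_dir, dst))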

Loading and preprocessing the dataset in Python

The .amat files are plain text with space-separated values, so they can be loaded like CSV files. There are several options for loading such files; in this article the pandas library is used.

import os
import numpy as np
import pandas as pd

train = pd.read_csv(os.path.join(dataset_dir, "train.amat"), sep=" ",
                    dtype=np.float32, skiprows=1, header=None)
train_set = train.values

The loaded dataset is an array whose first 1024 columns are the gray tones of the image; the next seven columns are: shape type, background color, color of the shape, x and y coordinates of the centroid of the shape, rotation angle of the shape, size of the shape, and elongation of the shape. More information about the dataset is found here. Since the objective of this article is to classify an image into one of three shape classes, we will use only the image and shape-type columns.

images_train, shapes_train = train_set[:, :1024], train_set[:, 1024]
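One preprocessing step worth making explicit: the model defined below is compiled with categorical_crossentropy and a 3-way softmax, so the integer shape labels need to be one-hot encoded first. A minimal sketch (the variable name labels_train is mine, not from the original):

from keras.utils import to_categorical

# Shape type is an integer class index (0, 1, 2); convert it to
# one-hot vectors as expected by categorical_crossentropy.
labels_train = to_categorical(shapes_train, num_classes=3)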

Model

Keras provides two ways to build a neural network: the Sequential model and the Functional API. The Functional API gives more flexibility compared to the Sequential model, but the Sequential model is easier to use. For this series we will use the Sequential model.

import keras
from keras.models import Sequential
from keras.layers import Dense

def get_fullconnected_shallow_softmax(args):
    # 32x32 gray-scale images, flattened to vectors of length 1024
    input_shape = (32 * 32,)
    model = Sequential()
    model.add(Dense(2048, input_shape=input_shape, activation="relu"))
    model.add(Dense(3, activation="softmax"))  # one unit per shape class
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adam(args.lr),
                  metrics=["accuracy"])
    return model
Figure 1. Shallow network with a single hidden layer.

Training setup

The training setup is held constant across all networks: batch size 64, learning rate 0.0001, and 40 epochs. For a more robust evaluation, each model is trained 5 times and the average of the 5 accuracies is taken as the measure of the model's performance.
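The article does not show the training loop itself; the following is a minimal sketch of the setup described above. It assumes that valid.amat has been loaded into images_valid and labels_valid the same way as the training set, and that args only needs to carry the learning rate.

from argparse import Namespace
import numpy as np

args = Namespace(lr=0.0001)

val_accs = []
for run in range(5):
    # Fresh model for each of the 5 runs, trained with the fixed setup.
    model = get_fullconnected_shallow_softmax(args)
    model.fit(images_train, labels_train,
              batch_size=64, epochs=40,
              validation_data=(images_valid, labels_valid),
              verbose=0)
    _, acc = model.evaluate(images_valid, labels_valid, verbose=0)
    val_accs.append(acc)

print("mean validation accuracy:", np.mean(val_accs))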

Results

The only hyperparameter varied here is the number of neurons in the hidden layer. The following table shows the training and validation accuracies and losses for different hidden-layer widths.

+---------+-----------+------------+-----------+------------+
| Neurons | Train Acc | Train Loss | Valid Acc | Valid Loss |
+---------+-----------+------------+-----------+------------+
|     128 |   ~62.53% |    ~0.8506 |   ~56.13% |    ~0.9469 |
|     256 |   ~66.04% |    ~0.8001 |   ~58.43% |    ~0.9209 |
|     512 |   ~67.37% |    ~0.7581 |   ~58.02% |    ~0.9196 |
|    1024 |   ~71.64% |    ~0.6872 |   ~60.74% |    ~0.8910 |
|    2048 |   ~70.51% |    ~0.6914 |   ~58.61% |    ~0.9357 |
|    4096 |   ~71.11% |    ~0.6678 |   ~58.14% |    ~0.9632 |
|    8192 |   ~73.63% |    ~0.6153 |   ~59.16% |    ~0.9642 |
+---------+-----------+------------+-----------+------------+

Conclusion

As can be seen from the table above, as the number of neurons increases, the training loss decreases and the training accuracy increases. However, increasing the number of neurons beyond 1024 increases the validation loss and decreases the validation accuracy: the network quickly over-fits when the hidden layer has more than 1024 neurons. Upcoming parts of this series will cover ways of overcoming over-fitting.
