[Week-6 Clean / Messy Rooms Detection]

Damla Ünal
Published in bbm406f18 · 4 min read · Jan 6, 2019

Team Members: Damla Ünal, Didem Yanıktepe, Atakan Erdoğdu, Zekeriya Onur Yakışkan

Clean / Messy Rooms Detection

We are now close to the end of our project. In last week's blog post we talked about multi-label classification. This week we implemented the application for our project and built a solid architecture.
In this week's post, we explain how we created this architecture and what results we obtained. As we mentioned in the first blog posts, our goal in the project is to determine whether a room is clean or messy according to its room type.

An output from our work

In the first phase of the project we created an architecture to classify room types; we described its details in our week-4 blog post. We did not get very high accuracy with that architecture, so our new architecture is a simplified version of VGGNet, a model first introduced by Simonyan and Zisserman in 2014 (https://arxiv.org/pdf/1409.1556/). We named it MiniVGGNet: a simplified and slightly customized VGGNet.

VGGNet-like architectures have certain characteristics:

  1. Only 3x3 convolutional layers are used, stacked on top of each other with increasing depth.
  2. The spatial size of the volume is reduced by max pooling.
  3. Fully-connected layers come immediately before the softmax classifier.

We created this network model using Keras and worked with 96 x 96 input images with a depth of 3 (RGB). As you know, we have 3 main room classes, Bedroom, Kitchen, and Livingroom, plus a binary label that decides whether a room is Clean or Messy. Our dataset therefore consists of 6 folders: clean_bedroom, clean_kitchen, clean_livingroom, messy_bedroom, messy_kitchen, and messy_livingroom.
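The post does not include our data-loading code, but as a rough sketch of one common approach (assuming scikit-learn's MultiLabelBinarizer and a dataset/ folder layout as described above; all paths and variable names are illustrative), the folder names can be split into two labels per image:

```python
import os
from sklearn.preprocessing import MultiLabelBinarizer

# e.g. dataset/messy_kitchen/img001.jpg -> labels ("messy", "kitchen")
image_paths, label_sets = [], []
for folder in os.listdir("dataset"):
    state, room = folder.split("_")              # "clean_bedroom" -> ("clean", "bedroom")
    for fname in os.listdir(os.path.join("dataset", folder)):
        image_paths.append(os.path.join("dataset", folder, fname))
        label_sets.append((state, room))

# One 0/1 vector per image, with a slot for each label seen in the data
mlb = MultiLabelBinarizer()
train_y = mlb.fit_transform(label_sets)
print(mlb.classes_)  # e.g. ['bedroom' 'clean' 'kitchen' 'livingroom' 'messy']
```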

In our model, the first block is Conv → ReLU (activation) → Pooling:

Our first convolution layer uses 32 filters with a 3 x 3 filter size. We apply the ReLU activation function, then batch normalization, max pooling, and 25% dropout. Dropout is a method applied to reduce possible overfitting. In this block we use max pooling with a 3 x 3 pool size, which reduces the 96 x 96 input to 32 x 32. In the later blocks we stack more than one Conv and ReLU layer together to learn richer and more varied features before shrinking the spatial dimensions of the volume: (Conv → ReLU (activation)) * 2 → Pooling.
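A minimal Keras sketch of this first block (the layer ordering is our reading of the description above, not the team's exact code):

```python
from keras.models import Sequential
from keras.layers import Conv2D, Activation, BatchNormalization
from keras.layers import MaxPooling2D, Dropout

model = Sequential()
# CONV: 32 filters of size 3x3 on 96x96x3 input images
model.add(Conv2D(32, (3, 3), padding="same", input_shape=(96, 96, 3)))
model.add(Activation("relu"))
model.add(BatchNormalization())
# 3x3 max pooling shrinks the 96x96 feature maps to 32x32
model.add(MaxPooling2D(pool_size=(3, 3)))
# 25% dropout to reduce overfitting
model.add(Dropout(0.25))
```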

We then deepen the network further to learn more filters while continuing to reduce the spatial dimensions of the volume. Here we use 64 filters with a 3 x 3 filter size and decrease the pooling size from 3 x 3 to 2 x 2, so that we do not reduce the spatial dimensions too quickly.

In the next (Conv → ReLU (activation)) * 2 → Pooling block we use 128 filters with a 3 x 3 filter size. Again we apply max pooling with 2 x 2 dimensions and finally use 25% dropout to reduce overfitting. Both of these deeper blocks are sketched below.
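Continuing the `model` started in the sketch above, both deeper (Conv → ReLU) * 2 → Pooling blocks can be written with a small helper (the helper name and structure are our illustration, not the actual project code):

```python
from keras.layers import Conv2D, Activation, BatchNormalization
from keras.layers import MaxPooling2D, Dropout

def add_double_conv_block(model, filters):
    """Append (Conv -> ReLU -> BatchNorm) * 2 -> MaxPool -> Dropout."""
    for _ in range(2):
        model.add(Conv2D(filters, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization())
    # 2x2 pooling halves the spatial size instead of dividing it by 3
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

add_double_conv_block(model, 64)   # second block: 64 filters
add_double_conv_block(model, 128)  # third block: 128 filters
```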

Finally, the last block consists of FullyConnected → ReLU layers. In this block the output is flattened and passed to a dense layer of 1024 units. Generally, 40–50% dropout is applied in fully connected layers, while 10–25% dropout is used in the other intermediate layers, so we used 50% dropout in the FC layer. As we mentioned before, we used a sigmoid activation for the final layer for multi-label classification; this is very important when doing multi-label classification with Keras.
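A sketch of this head in Keras; the number of output units is an assumption here (5, one per label produced above: bedroom, kitchen, livingroom, clean, messy):

```python
from keras.layers import Flatten, Dense, Activation, BatchNormalization, Dropout

num_labels = 5  # assumption: bedroom, kitchen, livingroom, clean, messy

model.add(Flatten())
model.add(Dense(1024))
model.add(Activation("relu"))
model.add(BatchNormalization())
model.add(Dropout(0.5))             # heavier dropout in the FC layer
model.add(Dense(num_labels))        # one output per label
model.add(Activation("sigmoid"))    # sigmoid, not softmax, for multi-label
```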

We created our model using this architecture and trained it for 50 epochs with a 0.001 learning rate and a batch size of 32; the training plot is shown below.
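A training sketch with those hyperparameters; the optimizer (Adam), the binary cross-entropy loss, and the train_x / train_y / test_x / test_y names are our assumptions, since the post only states the learning rate, epochs, and batch size:

```python
from keras.optimizers import Adam

EPOCHS, LR, BATCH_SIZE = 50, 0.001, 32

# Binary cross-entropy treats each label independently, which is the usual
# pairing with a sigmoid output layer in multi-label classification.
model.compile(loss="binary_crossentropy",
              optimizer=Adam(lr=LR),
              metrics=["accuracy"])

# train_x: (N, 96, 96, 3) image array, train_y: (N, num_labels) 0/1 matrix
history = model.fit(train_x, train_y,
                    validation_data=(test_x, test_y),
                    batch_size=BATCH_SIZE,
                    epochs=EPOCHS)
```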

Training Loss / Accuracy vs. Epoch plot of our model

The results of some of our test images:

Figure 1: Clean Bedroom of our group member Atakan
Figure 2: Messy Kitchen
Predicted class percentages for Figure 2
Clean Livingroom

References:

Simonyan, K., & Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. https://arxiv.org/pdf/1409.1556/
