Team Members: Ahmet Tarık KAYA, Ayça Meriç ÇELİK, Kaan MERSİN

Kaan Mersin · bbm406f18 · Jan 6, 2019

Since the beginning of our project, we have been working on classifying rooms by looking at the items in them. Our project has two stages: the first is scene parsing with deep learning, and the second is the classification of rooms from the parsed objects using basic machine learning methods. In the past two weeks, we mainly worked on the second stage of our project.

  1. Using the Dataset

_.txt: a text file describing the content of each image (objects and parts). This information is redundant with other files. Each line in the text file contains:
- column 1 = instance number
- column 2 = part level (0 for objects)
- column 3 = occluded (1 for true)
- column 4 = class name (parsed using WordNet)
- column 5 = original raw name (might provide a more detailed categorization)
- column 6 = comma-separated attribute list

In the ADE20K dataset, each scene comes with the original RGB image plus two additional image files containing different segmentation results, as well as the .txt file described above. For our task, we used the .txt files to train our base models.

First, we wrote a snippet that traverses the folders of the different labels. Each folder contains the images labeled with the folder name, so we traverse only the room folders listed below (a sketch of this traversal follows the list):

  • bedroom -> 1389 images
  • bathroom -> 671 images
  • kitchen -> 652 images
  • living room -> 697 images
  • dining room -> 412 images
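
A minimal sketch of this traversal (the dataset root path, the folder names, and the annotation file suffix here are assumptions based on our local copy of ADE20K):

```python
import os
from glob import glob

# Assumed local layout: one folder per room type under the ADE20K training set.
ADE20K_ROOT = "ADE20K_2016_07_26/images/training"   # hypothetical path
ROOM_FOLDERS = ["bedroom", "bathroom", "kitchen", "living_room", "dining_room"]

def list_scene_txt_files(root, room):
    """Return the annotation .txt files of every image in a room folder."""
    pattern = os.path.join(root, room, "**", "*_atr.txt")   # assumed filename suffix
    return glob(pattern, recursive=True)

for room in ROOM_FOLDERS:
    files = list_scene_txt_files(ADE20K_ROOT, room)
    print(f"{room}: {len(files)} annotated images")
```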

In each folder, we read all the .txt files and reviewed their object information. We concluded that the fifth column of each file (the original raw name) contains the most significant labels for the objects.

Image 1. Example .txt File of an Image

So we created a dataframe where each row stands for an image and the columns correspond to object names, with an additional column holding the room label as the target value for our model. Each cell contains the frequency of that object in the segmented image.

Image 2. First Five Rows of our Dataframe
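
A rough sketch of how such a dataframe can be built from the .txt files, reusing the constants and the helper from the previous sketch; the '#' column separator and the helper names are our own assumptions:

```python
from collections import Counter
import pandas as pd

def object_counts(txt_path):
    """Count the raw object names (5th column) listed in one annotation file."""
    counts = Counter()
    with open(txt_path, encoding="utf-8", errors="ignore") as f:
        for line in f:
            cols = [c.strip() for c in line.split("#")]   # assumed '#'-separated columns
            if len(cols) >= 5:
                counts[cols[4]] += 1                      # column 5 = original raw name
    return counts

# Reuses ADE20K_ROOT, ROOM_FOLDERS and list_scene_txt_files from the previous sketch.
rows = []
for room in ROOM_FOLDERS:
    for txt_path in list_scene_txt_files(ADE20K_ROOT, room):
        row = dict(object_counts(txt_path))
        row["room"] = room                                # target label
        rows.append(row)

df = pd.DataFrame(rows).fillna(0)   # objects missing from an image get frequency 0
print(df.head())
```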

After creating our dataframe, we started to build different classifiers and compare their performances.

We used sklearn’s train_test_split method to create our train and test features and targets, splitting the dataset with an 8:2 train-test ratio.
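
In code, this step looks roughly like the following (the dataframe and the column name come from the sketch above; random_state is just an illustrative choice):

```python
from sklearn.model_selection import train_test_split

X = df.drop(columns=["room"])   # object-frequency features
y = df["room"]                  # room labels (targets)

# 8:2 train-test split; a fixed random_state keeps the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```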

First Approach: kNN Classifier

We used sklearn’s KNeighborsClassifier model.

In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
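
A minimal baseline run with scikit-learn’s defaults, using the split above, might look like this (the printed number is illustrative, not one of our reported results):

```python
from sklearn.neighbors import KNeighborsClassifier

# Defaults: n_neighbors=5, weights='uniform', algorithm='auto', metric='minkowski' (p=2)
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```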

The results were very motivating, although kNN is not the most suitable approach for our problem: the minimum accuracy on the test dataset across different parameter settings was 85%.

After getting these results, we mainly concentrated on tuning the parameters and tried to achieve the best performance from the kNN classifier.

For every parameter setting, we trained our model with the number of nearest neighbors ranging from 2 to 10.

  • The first parameter we took into account was the “weights” parameter (a sketch of the sweep follows the option list below).

We had two options:
- ‘uniform’: uniform weights. All points in each neighborhood are weighted equally.
- ‘distance’: weight points by the inverse of their distance. In this case, closer neighbors of a query point have a greater influence than neighbors which are further away.
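
A sketch of this sweep might look like the following (test accuracies collected into a small table):

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

results = {}
for weights in ("uniform", "distance"):
    accuracies = {}
    for k in range(2, 11):                      # 2 to 10 nearest neighbors
        knn = KNeighborsClassifier(n_neighbors=k, weights=weights)
        knn.fit(X_train, y_train)
        accuracies[k] = knn.score(X_test, y_test)
    results[weights] = accuracies

print(pd.DataFrame(results))                    # rows: k, columns: weights choice
```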

The results are shown below:

Image 3. Test accuracies for different parameters, where columns are "weights" choices and rows are different numbers of neighbors

As we see in this table, choosing ‘distance’ as our weights parameter increases our test accuracy considerably, so we decided to use it while tuning the other parameters.

  • The second parameter we tuned was the “algorithm” parameter.

The algorithm parameter is optional; with the default value (‘auto’), the model picks the most suitable choice for the given dataset.

We had three options:
- ‘ball_tree’ uses BallTree
- ‘kd_tree’ uses KDTree
- ‘brute’ uses a brute-force search.

The results are shown below:

Image 4. Test accuracies for different parameters, where columns are "algorithm" choices and rows are different numbers of neighbors

As we see in the table, the model selects ‘kd_tree’ as the most appropriate algorithm (the values in the “auto” and “kd_tree” columns are equal), so we decided to use “kd_tree” while tuning the other parameters.

  • The third parameter we took into account was leaf size.

leaf_size : int, optional (default = 30)
Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

Changing the leaf size didn’t affect the test accuracy much, as shown below:

Image 5–6. Test accuracies for different parameters, where columns are "leaf_size" choices and rows are different numbers of neighbors

So we left it at its default value.

  • The last parameter we tuned was the distance metric.

metric : string or callable, default ‘minkowski’
The distance metric to use for the tree. The default metric is minkowski, and with p=2 it is equivalent to the standard Euclidean metric. See the documentation of the DistanceMetric class for a list of available metrics.

There are three different distance choices, whose formulas are given below (a small numeric check follows the list):
- ‘minkowski’ -> sum(|x - y|^p)^(1/p)
- ‘manhattan’ -> sum(|x - y|)
- ‘euclidean’ -> sqrt(sum((x - y)^2))
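
A tiny numeric check of these formulas on two toy vectors (the values are only for illustration):

```python
import numpy as np

u = np.array([2.0, 0.0, 1.0])
v = np.array([0.0, 1.0, 1.0])

manhattan = np.sum(np.abs(u - v))                     # 2 + 1 + 0 = 3
euclidean = np.sqrt(np.sum((u - v) ** 2))             # sqrt(4 + 1 + 0) ≈ 2.236
p = 3
minkowski = np.sum(np.abs(u - v) ** p) ** (1.0 / p)   # (8 + 1 + 0)^(1/3) ≈ 2.080

print(manhattan, euclidean, minkowski)
```

In the classifier itself, the metric is selected by passing, for example, KNeighborsClassifier(metric='manhattan').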

The results are shown below:

Image 7. Test accuracies for different parameters, where columns are "metric" choices and rows are different numbers of neighbors

As we see in this table, choosing ‘manhattan’ as our distance metric increased our test accuracy considerably, so it became our final choice.

As a result, we reached a maximum test accuracy of 92.67% with the tuned parameters and 7 nearest neighbors.
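
Putting the tuned parameters together, the final model looks roughly like this (the variable names come from the earlier sketches):

```python
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.neighbors import KNeighborsClassifier

best_knn = KNeighborsClassifier(
    n_neighbors=7, weights="distance", algorithm="kd_tree", metric="manhattan")
best_knn.fit(X_train, y_train)

y_pred = best_knn.predict(X_test)
print("test accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred, labels=sorted(y_test.unique())))
```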

The confusion matrices for the test dataset are shown below:

Image 8–9. Confusion matrices of kNN model predictions

The matrices show that the most frequently mislabeled room type is the living room, with 73% per-class accuracy. We tried to find what causes this misclassification for this particular room type.

We concluded that there are not many distinctive objects that are found only in living rooms.

We can give examples of such significant objects from other rooms. If a room contains a refrigerator and an oven, there is a high chance that it is a kitchen. If it contains a bed and a wardrobe, it is probably a bedroom, even though it may also contain other items like curtains, armchairs, and chairs. Living rooms, however, generally contain generic pieces of furniture such as armchairs, sofas, and chairs, or decorative objects that can be found in many rooms. Since the model just computes distances between data points, it is understandable that bedroom and dining room entries end up looking like similar rooms. Lastly, the statistical data about the objects are given below (a sketch of how these counts can be computed follows the list):

299 → # of common objects between living room and bedroom
284 → # of common objects between living room and dining room
499 → # of living room objects
498 → # of bedroom objects
374 → # of dining room objects
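
These counts can be reproduced from the dataframe by comparing, for each room type, the set of objects that appear in at least one of its images; a rough sketch under the naming assumptions used earlier:

```python
def objects_seen_in(df, room):
    """Set of object columns with frequency > 0 in at least one image of a room."""
    subset = df[df["room"] == room].drop(columns=["room"])
    return set(subset.columns[(subset > 0).any()])

living = objects_seen_in(df, "living_room")
bedroom = objects_seen_in(df, "bedroom")
dining = objects_seen_in(df, "dining_room")

print(len(living & bedroom))   # common objects: living room vs. bedroom
print(len(living & dining))    # common objects: living room vs. dining room
print(len(living), len(bedroom), len(dining))
```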

Second Approach: Naive Bayes

As our second approach, we trained a Naive Bayes model.

In machine learning, naive Bayes classifiers are a family of simple “probabilistic classifiers” based on applying Bayes’ theorem with strong (naive) independence assumptions between the features.

We expected much more from this model, and the results didn’t disappoint us: the minimum accuracy on the test dataset across different parameter settings was 96.6%.

There were not many parameters to tune. The results are shown below:

Image 10. Test accuracies for different parameters, where columns are "fit_prior" choices and rows are different values of "alpha"

The parameters that we worked on were alpha and fit_prior. As we see from the table, the fit_prior parameter doesn’t change the result, but Laplace smoothing increases our test accuracy as we expected (a sketch of this sweep follows the parameter descriptions below).

alpha : float, optional (default=1.0)
Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).

fit_prior : boolean, optional (default=True)
Whether to learn class prior probabilities or not. If false, a uniform prior will be used.
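
A sketch of this sweep with scikit-learn’s MultinomialNB (the alpha values are illustrative; a very small alpha stands in for "no smoothing"):

```python
import pandas as pd
from sklearn.naive_bayes import MultinomialNB

nb_results = {}
for fit_prior in (True, False):
    accuracies = {}
    for alpha in (1e-10, 0.5, 1.0, 2.0):     # a tiny alpha approximates alpha=0
        nb = MultinomialNB(alpha=alpha, fit_prior=fit_prior)
        nb.fit(X_train, y_train)
        accuracies[alpha] = nb.score(X_test, y_test)
    nb_results[fit_prior] = accuracies

print(pd.DataFrame(nb_results))              # rows: alpha, columns: fit_prior
```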

Based on our observations from changing the parameters of the Naive Bayes method, we decided to keep the default parameters, since they give the best accuracy.

The confusion matrices for the test dataset are shown below:

Image 11–12. Confusion matrices of Naive Bayes model predictions

As we see in the results, the most frequently mislabeled room type is again the living room, with 91% per-class accuracy. We discussed and tried to find what causes this misclassification for this particular room type.

The reason is the same as the one we discussed in the kNN part of this post.

Conclusion

In these past two weeks, we had a chance to get hands-on experience with our dataset and obtained promising results with different machine learning approaches. Next, we will concentrate on the first part of our project, which is scene parsing with deep learning.

Thank you for following our project.

Stay tuned for updates!

References

  1. https://en.wikipedia.org/wiki/Naive_Bayes_classifier
  2. https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
  3. https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html#sklearn.naive_bayes.MultinomialNB
  4. http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
  5. http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html
  6. http://groups.csail.mit.edu/vision/datasets/ADE20K/
