WEEK V-BOOK GENRE PREDICTOR

Hakan Akyürek
Published in bbm406f18
2 min read · Dec 31, 2018

Theme: Multi-label text classification

Team members: Hakan AKYÜREK, Sefa YURTSEVEN

This week we started developing our multi-label models. We are using the scikit-multilearn (skmultilearn) library to try out different models, so that we have baselines to evaluate our NN model against. At the same time, we are building a multi-label NN model of our own.

https://github.com/scikit-multilearn/scikit-multilearn

We do not have any actual results from the library yet, but we found out that the classifiers we tried require the train and test data to have the same number of features. Since this is very unlikely with our bag-of-words model, we are going to use a doc2vec model instead, which is a different way of representing texts in numeric form. We also hope that representing our textual data in a different and better manner will give us better results with our models.
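To make the feature-count problem concrete, here is a toy sketch of one way such a mismatch can arise, assuming the bag-of-words vocabulary is built separately for each split. The sample sentences and the helper names (build_vocab, bag_of_words) are made up for illustration and are not our actual pipeline.

```python
from collections import Counter


def build_vocab(docs):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(doc, vocab):
    """Return a count vector of length len(vocab); unknown tokens are ignored."""
    counts = Counter(doc.lower().split())
    return [counts.get(tok, 0) for tok in vocab]


# Hypothetical book descriptions, for illustration only.
train_docs = ["a wizard attends a school of magic", "a detective solves a murder"]
test_docs = ["a young wizard solves a mystery"]

train_vocab = build_vocab(train_docs)
test_vocab = build_vocab(test_docs)  # built on the test split alone

train_width = len(bag_of_words(train_docs[0], train_vocab))
test_width_separate = len(bag_of_words(test_docs[0], test_vocab))
test_width_shared = len(bag_of_words(test_docs[0], train_vocab))

print(train_width, test_width_separate, test_width_shared)
```

In this toy case the widths only agree when the test text is encoded with the training vocabulary. doc2vec sidesteps the question, since every document is mapped to a dense vector of one fixed, chosen size.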

To understand what doc2vec is, one needs to understand what word2vec is, since doc2vec depends heavily on word2vec. To keep this simple, I strongly recommend that you check this; it is a really nice introductory post on doc2vec.

So, what did we do other than researching this week?

We have started developing a multi-label NN model. We were using softmax as the final activation function of our NN model, but since this is a multi-label classification problem and each label's output needs to be independent of the others, we now use sigmoid as the activation function of the output layer. With this change, we also had to switch the loss function to binary cross-entropy.
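The difference between the two activations can be seen in a small plain-Python sketch. It is not our network code; the logits and the three-genre target below are invented numbers, and the loss is the usual per-label binary cross-entropy averaged over the labels.

```python
import math


def softmax(logits):
    """Probabilities that compete with each other and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """One independent probability per label."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over the labels of one sample."""
    losses = []
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1.0 - p)))
    return sum(losses) / len(losses)


logits = [2.0, 1.5, -3.0]  # hypothetical raw outputs for three genres
y_true = [1, 1, 0]         # the book belongs to the first two genres

softmax_probs = softmax(logits)
sigmoid_probs = sigmoid(logits)
loss = binary_crossentropy(y_true, sigmoid_probs)

print(softmax_probs)
print(sigmoid_probs)
print(loss)
```

With softmax the two true genres have to share a total probability of 1, so the second one ends up below 0.5 even though its logit is clearly positive. With sigmoid each genre is scored on its own, so both can be high at the same time.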

To evaluate this model we declared two hyper-parameters: threshold and hit rate. The evaluation simply goes as follows: the number of labels predicted with a probability higher than the threshold needs to be equal to or higher than the hit rate. In other words, we can tell the model that it needs to predict at least x labels correctly. We are still trying to improve the evaluation and experimenting with different approaches, as it is in the research phase.
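One possible reading of this evaluation is sketched below, assuming a sample counts as a hit when at least hit_rate of its true labels are predicted with a probability above the threshold. The function names (is_hit, hit_accuracy) and the sample data are hypothetical, and the rule itself may still change.

```python
def is_hit(y_true, y_prob, threshold=0.5, hit_rate=1):
    """True if at least `hit_rate` true labels score above `threshold`."""
    predicted = [p > threshold for p in y_prob]
    correct = sum(1 for t, pred in zip(y_true, predicted) if t == 1 and pred)
    return correct >= hit_rate


def hit_accuracy(Y_true, Y_prob, threshold=0.5, hit_rate=1):
    """Fraction of samples that count as a hit."""
    hits = sum(is_hit(t, p, threshold, hit_rate) for t, p in zip(Y_true, Y_prob))
    return hits / len(Y_true)


# Made-up targets and sigmoid outputs for three books and four genres.
Y_true = [[1, 1, 0, 0], [0, 1, 0, 1], [1, 0, 0, 0]]
Y_prob = [[0.9, 0.7, 0.2, 0.1], [0.3, 0.8, 0.6, 0.4], [0.4, 0.2, 0.1, 0.3]]

acc_one = hit_accuracy(Y_true, Y_prob, threshold=0.5, hit_rate=1)
acc_two = hit_accuracy(Y_true, Y_prob, threshold=0.5, hit_rate=2)

print(acc_one, acc_two)
```

Raising the hit rate makes the metric stricter: here two of the three books pass with hit_rate=1, but only one passes with hit_rate=2. Note that this version does not penalize extra wrong labels above the threshold, which is one of the things an improved evaluation could address.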
