Nishant Nikhil
Apr 16, 2017


Paper summary: Character-level Convolutional Networks for Text Classification

Character-level Convolutional Networks for Text Classification — https://arxiv.org/abs/1509.01626
Written by Xiang Zhang, Junbo Zhao and Yann LeCun.

Design:

  1. Fixed number of input characters (1014 in the paper)
  2. 1D CNN over it (with non-linearity)
  3. Max-pooling
  4. 1D CNN over it (with non-linearity)
  5. Max-pooling
  6. 1D CNN over it (with non-linearity) × 4
  7. Max-pooling
  8. Dense layer with dropout (with non-linearity) × 2
  9. Dense layer with softmax

(each pooling layer uses pooling size 3; a minimal sketch of this stack follows the list)
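
A minimal sketch of this stack, assuming PyTorch. The concrete sizes here (256 filters, kernel sizes 7 then 3, 1024-unit dense layers, input length 1014) follow the paper’s smaller variant as I read it, so treat them as approximate:

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Character-level ConvNet: six conv layers and three dense layers,
    with max-pooling (size 3) after conv layers 1, 2 and 6."""
    def __init__(self, alphabet_size=70, num_classes=4):
        super().__init__()
        def conv(c_in, k):  # 1D convolution + non-linearity
            return nn.Sequential(nn.Conv1d(c_in, 256, kernel_size=k),
                                 nn.ReLU())
        self.features = nn.Sequential(
            conv(alphabet_size, 7), nn.MaxPool1d(3),   # steps 2-3
            conv(256, 7), nn.MaxPool1d(3),             # steps 4-5
            conv(256, 3), conv(256, 3),                # step 6 (x4)
            conv(256, 3), conv(256, 3),
            nn.MaxPool1d(3),                           # step 7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(1024), nn.ReLU(), nn.Dropout(0.5),    # step 8 (x2)
            nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(1024, num_classes),  # step 9; softmax lives in the loss
        )

    def forward(self, x):  # x: (batch, alphabet_size, num_characters)
        return self.classifier(self.features(x))

model = CharCNN()
print(model(torch.zeros(2, 70, 1014)).shape)  # torch.Size([2, 4])
```

With a 1014-character input the pooled feature map ends up with length 34, so `nn.LazyLinear` is used to infer the flattened size automatically.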

Characters are sent to the network, each represented by a one-hot encoding (the paper uses an alphabet of 70 characters).
For example, if the vocabulary were just `{‘a’, ‘b’, ‘c’}`, the vectors would be:

‘a’ → {0, 0, 1}
‘b’ → {0, 1, 0}
‘c’ → {1, 0, 0}
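
A minimal sketch of this encoding in NumPy, keeping the (reversed) one-hot order of the toy example above:

```python
import numpy as np

# Toy vocabulary; the paper uses an alphabet of 70 characters.
vocab = ['a', 'b', 'c']
char_to_vec = {ch: np.eye(len(vocab))[len(vocab) - 1 - i]
               for i, ch in enumerate(vocab)}

print(char_to_vec['a'])  # [0. 0. 1.]
print(char_to_vec['c'])  # [1. 0. 0.]
```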

Since a “fixed number of characters” is sent, shorter inputs are padded with 0’s and longer ones are chopped.
So if that fixed number of characters were 5, then:

DragonBall → Drago
You → You00
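
Putting both steps together, a sketch of a hypothetical `quantize` helper that one-hot encodes, pads with zero vectors, and chops:

```python
import numpy as np

vocab = ['a', 'b', 'c']  # toy alphabet, as above

def quantize(text, length=5):
    """One-hot encode `text` so the output always has `length` rows:
    longer inputs are chopped, shorter ones keep all-zero rows."""
    out = np.zeros((length, len(vocab)))
    for i, ch in enumerate(text[:length]):   # chop longer inputs
        if ch in vocab:                      # unknown chars stay all-zero
            out[i, len(vocab) - 1 - vocab.index(ch)] = 1.0
    return out

print(quantize('abcabc').shape)  # (5, 3): chopped to five rows
print(quantize('ab')[2:])        # three all-zero padding rows
```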

Now when these vectors go through the CNN, each filter checks for specific features: if a feature is present, the convolution value is high, and the max-pooling operation picks it up. This pooling is also what allows the authors to train a deeper network. (For CNN intuition see this.)
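
A toy NumPy illustration of that intuition: a hand-crafted filter matching the bigram ‘ab’ gives a high convolution value exactly where the bigram occurs, and max-pooling keeps that peak:

```python
import numpy as np

vocab = ['a', 'b', 'c']
one_hot = {ch: np.eye(len(vocab))[len(vocab) - 1 - i]
           for i, ch in enumerate(vocab)}

x = np.stack([one_hot[ch] for ch in 'cabc'])  # input, shape (4, 3)
w = np.stack([one_hot['a'], one_hot['b']])    # width-2 filter for 'ab'

# Valid 1D convolution: slide the filter along the sequence.
conv = np.array([np.sum(x[i:i + 2] * w) for i in range(len(x) - 1)])
print(conv)        # [0. 2. 0.] -- peak where 'ab' occurs
print(conv.max())  # 2.0 -- max-pooling keeps the evidence
```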

After that, dense layers with dropout (to prevent over-fitting) are applied, and finally a dense layer with as many units as there are classes. (For dropout see this.)

Furthermore, they use a thesaurus for data augmentation, replacing words with their synonyms to generate new training samples.
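
The idea, sketched with NLTK’s WordNet as the thesaurus (the paper uses the thesaurus from LibreOffice’s mytheas component and samples replacements with a geometric distribution; this version just flips a coin per word):

```python
import random
from nltk.corpus import wordnet  # requires nltk.download('wordnet') once

def augment(sentence, p=0.5):
    """Replace each word, with probability p, by a random WordNet synonym."""
    out = []
    for word in sentence.split():
        synonyms = {l.name().replace('_', ' ')
                    for s in wordnet.synsets(word)
                    for l in s.lemmas()} - {word}
        if synonyms and random.random() < p:
            out.append(random.choice(sorted(synonyms)))
        else:
            out.append(word)
    return ' '.join(out)

print(augment('the film was good'))  # e.g. 'the movie was honest'
```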

Other models tested (a minimal sketch of the first baseline follows this list):
1. Bag-of-words and its TF-IDF
2. Bag-of-ngrams and its TF-IDF
3. Bag-of-means on word embeddings
4. Word-based ConvNets
5. Long short-term memory (LSTM)
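
For comparison, the first baseline takes only a few lines with scikit-learn (a sketch: the paper restricts the vocabulary to the 50,000 most frequent words and uses multinomial logistic regression on top of the features):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data; the paper's datasets range from 120k to 3.6M samples.
texts = ['great food', 'awful service', 'lovely place', 'terrible food']
labels = [1, 0, 1, 0]

clf = make_pipeline(
    TfidfVectorizer(max_features=50000),  # most frequent words only
    LogisticRegression(),                 # multinomial logistic regression
)
clf.fit(texts, labels)
print(clf.predict(['great service']))  # predicted label
```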

Conclusions:

From the paper: “The most important conclusion from our experiments is that character-level ConvNets could work for text classification without the need for words. This is a strong indication that language could also be thought of as a signal no different from any other kind.”

Simple NLP models (n-grams with TF-IDF) perform better than deep learning models on smaller datasets.

ConvNets are good for user-generated data (misspelled words are handled easily).

The datasets cover two kinds of tasks: sentiment analysis (Yelp and Amazon reviews) and topic classification (all the others). This dichotomy in task semantics does not seem to play a role in deciding which method is better.

And TIL the meaning of this phrase:

From the paper: “There is no free lunch. Our experiments once again verifies that there is not a single machine learning model that can work for all kinds of datasets.”

Disappointment: they do not show results for a character-level LSTM!
