Text Classification using Neural Networks

Sai Beathanabhotla · Published in Holler Developers · 6 min read · Jan 21, 2022

AI Research at Holler Technologies

INTRODUCTION

Text classification is a machine learning technique that assigns a set of predefined categories to open-ended text, and it is one of the fundamental tasks in natural language processing. Popular examples include spam detection, sentiment classification, news categorization, and classifying chat conversations into topics.

In this article, we focus on the problem of category classification using word embeddings. The main goal of this article is to explain how neural networks work internally. We develop our solution in Python using the pandas, TensorFlow Keras, and scikit-learn libraries.

DATASET

We built a dataset from publicly available datasets covering different categories such as Food, Music, Games, Watch Next, and Politics.

NEURAL NETWORKS

The basic idea behind a neural network is to simulate lots of densely interconnected brain cells inside a computer so you can get it to learn things, recognize patterns, and make decisions in a human-like way. Neural networks are loosely modeled on the brain: neurons interconnected with other neurons form a network. The operation of a neural network is straightforward: you enter variables as inputs (for example, an image if the network is supposed to tell what is in an image), and after some calculations, an output is returned (following the same example, giving it an image of a cat should return the word “cat”).

What does a neural network consist of?

A typical neural network has anywhere from a few dozen to hundreds, thousands, or even millions of artificial neurons arranged in a series of layers, each of which connects to the layers on either side. The first layer, the input layer, is designed to receive various forms of information from the outside world that the network will attempt to learn about, recognize, or otherwise process. Neurons on the opposite side of the network signal how it responds to the information it has learned; this layer is known as the output layer. In between the input and output layers sit one or more hidden layers, which together form the majority of the artificial brain. Most neural networks are fully connected, which means each hidden unit and each output unit is connected to every unit in the layers on either side. The connection between one unit and another is represented by a number called a weight, which can be either positive or negative. The higher the weight, the more influence one unit has on another.
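To make this anatomy concrete, here is a minimal sketch in NumPy; the layer sizes (3 inputs, 4 hidden units, 2 outputs) and random weights are made up purely for illustration:

import numpy as np

# Hypothetical fully connected network: 3 input units -> 4 hidden units -> 2 output units.
# Every connection between adjacent layers is one entry in a weight matrix,
# and each weight can be positive or negative.
rng = np.random.default_rng(42)
W_hidden = rng.normal(size=(3, 4))  # one weight per input-to-hidden connection
W_output = rng.normal(size=(4, 2))  # one weight per hidden-to-output connection
print(W_hidden.shape, W_output.shape)  # (3, 4) (4, 2)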

Although a simple neural network for simple problem solving could consist of just a few layers, it could also consist of many additional layers between the input and the output. A richer structure like this is called a deep neural network (DNN), and it's typically used for tackling much more complex problems. In theory, a DNN can map any kind of input to any kind of output, but the drawback is that it needs considerably more training: it may need to “see” millions or billions of examples, compared to the hundreds or thousands that a simpler network might need.

What does a neuron do?

First, the neuron adds up the values of every neuron from the previous layer it is connected to. Suppose there are 3 inputs (x1, x2, x3) coming into the neuron; then 3 neurons of the previous layer are connected to our neuron.

Each of these values is multiplied, before being added, by another variable called a “weight” (w1, w2, w3), which characterizes the connection between the two neurons. Each connection between neurons has its own weight, and the weights (together with the biases) are the values that get modified during the learning process.

Moreover, a bias value may be added to the total. The bias is not a value coming from a specific neuron; it is an extra offset, learned along with the weights, that shifts the neuron's activation and gives the network more flexibility.

After this summation, the neuron applies a function called an “activation function” to the obtained value. Sigmoid, tanh, ReLU, leaky ReLU, and softmax are some examples of activation functions. The neuron is then ready to send its new value to other neurons.
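As a minimal sketch of a single neuron's computation (all input, weight, and bias values below are made up for illustration):

import numpy as np

# Illustrative values: 3 inputs from the previous layer, one weight per connection
x = np.array([0.5, -1.2, 2.0])   # inputs x1, x2, x3
w = np.array([0.8, 0.1, -0.4])   # weights w1, w2, w3
b = 0.25                         # bias added to the weighted sum

z = np.dot(w, x) + b             # weighted sum plus bias

def sigmoid(z):
    # One possible activation function: squashes any value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

output = sigmoid(z)              # the value the neuron sends to the next layer
print(output)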

How does a neural network learn things?

Information flows through a neural network in two ways. When it's learning (being trained) or operating normally (after being trained), patterns of information are fed into the network via the input units, which trigger the layers of hidden units, and these in turn pass signals on to the output units. This common design is called a feedforward network. Not all units “fire” all the time. Each unit receives inputs from the units in the layer before it, and each input is multiplied by the weight of the connection it travels along. Every unit adds up all the inputs it receives in this way, and (in the simplest type of network) if the sum exceeds a certain threshold value, the unit “fires” and triggers the units it's connected to.
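A forward pass is just the per-neuron computation above repeated layer by layer. Here is a minimal sketch with made-up weights, using a hard threshold so that units visibly “fire” (output 1) or stay silent (output 0):

import numpy as np

def step(z, threshold=0.0):
    # A unit "fires" (outputs 1) only when its summed input exceeds the threshold
    return (z > threshold).astype(float)

# Made-up weights for a tiny 3-input -> 4-hidden -> 2-output feedforward network
W1 = np.array([[ 0.5, -0.3,  0.8,  0.1],
               [-0.2,  0.7, -0.5,  0.4],
               [ 0.9, -0.6,  0.2, -0.1]])
W2 = np.array([[ 0.3, -0.4],
               [ 0.6,  0.2],
               [-0.7,  0.5],
               [ 0.1, -0.9]])

x = np.array([1.0, 0.0, 1.0])   # pattern fed into the input units
hidden = step(x @ W1)           # which hidden units fire
output = step(hidden @ W2)      # which output units signal the result
print(hidden, output)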

For a neural network to learn, there has to be an element of feedback involved. Neural networks typically learn by a feedback process called backpropagation. This involves comparing the output the network produces with the output it was meant to produce, and using the difference between them to modify the weights of the connections between the units, working backward from the output units through the hidden units to the input units. In time, backpropagation causes the network to learn, reducing the difference between the actual and intended output until the two closely match, so that the network responds the way it should.
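Here is a toy sketch of that feedback loop on a single weight, with squared error as the measure of the difference; all numbers are illustrative, and real backpropagation applies this same idea layer by layer via the chain rule:

# Gradient-descent update on one weight: the core idea behind backpropagation
w = 0.2               # initial weight
x, target = 1.5, 0.9  # input and the output the network was meant to produce
lr = 0.1              # learning rate

for i in range(20):
    actual = w * x             # the network's current output
    error = actual - target    # difference from the intended output
    grad = error * x           # derivative of 0.5 * error**2 with respect to w
    w -= lr * grad             # nudge the weight to shrink the error
print(w * x)  # close to 0.9 after training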

CODE

# Import all required libraries
import pandas as pd
import numpy as np
import gensim
import nltk
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical
from sklearn import model_selection, preprocessing
from sklearn.metrics import accuracy_score, classification_report
nltk.download('punkt')  # tokenizer models used by nltk.word_tokenize

Here we will be using Google's pretrained Word2Vec embeddings as input features for the neural network. We initialize the Word2Vec model and write helper functions that extract embeddings from the text. We use 300-dimensional vectors, so 300 will be the input shape of the model.

input_dim = 300
wv = gensim.models.KeyedVectors.load_word2vec_format("path_to_file/GoogleNews-vectors-negative300.bin.gz", binary=True)
wv.init_sims(replace=True)  # L2-normalize the vectors in place (gensim 3.x API)

def tokenize_text(text):
    tokens = nltk.word_tokenize(text, language='english')
    return tokens

def word_averaging(wv, words):
    # Average the embeddings of all in-vocabulary words in a token list
    mean = []
    for word in words:
        if isinstance(word, np.ndarray):
            mean.append(word)
        elif word in wv.vocab:
            mean.append(wv.syn0norm[wv.vocab[word].index])
    if not mean:
        return np.zeros(wv.vector_size,)
    mean = np.array(mean).mean(axis=0)
    return mean

def word_averaging_list(wv, text_list):
    # Stack one averaged embedding per document
    return np.vstack([word_averaging(wv, words) for words in text_list])

We need to load the data and split it into training and test sets. The variable out_dim equals the number of class labels in our data.

# Load input data with two columns: 'text' and 'category'
path_to_input_data = ""
data = pd.read_csv(path_to_input_data)
out_dim = data.category.nunique()  # number of class labels in the data
target_names = sorted(data.category.unique())  # sorted to match LabelEncoder's class ordering
data_train, data_test = model_selection.train_test_split(data, test_size=0.1, random_state=42, stratify=data.category)

We will be using LabelEncoder to encode the target variable and create mappings for each class label in our data.

encoder = preprocessing.LabelEncoder()
y_data = encoder.fit_transform(data_train.category)
y_test_data = encoder.transform(data_test.category)  # reuse the fitted encoder on the test split
mapping = dict(zip(encoder.classes_, encoder.transform(encoder.classes_)))
print(mapping)

Below is the code for extracting features, training a model, testing it on the test data, and computing performance metrics such as precision, recall, and F-score.

# Extract averaged word-embedding features for the train and test sets
X = data_train.text
y = y_data
train_tokenized = X.apply(lambda r: tokenize_text(r)).values
X_train_word_average = word_averaging_list(wv, train_tokenized)
test_tokenized = data_test.text.apply(lambda r: tokenize_text(r)).values
X_test_word_average = word_averaging_list(wv, test_tokenized)
train_labels = to_categorical(y)  # one-hot encode the integer labels
# Training a sequential model
model = Sequential()
model.add(Dense(300, input_dim=input_dim, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(out_dim, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(X_train_word_average, train_labels, epochs=10, verbose=0)
# Predict class indices on the test set and report performance metrics
y_pred = np.argmax(model.predict(X_test_word_average), axis=1)
print('accuracy %s' % accuracy_score(y_test_data, y_pred))
print(classification_report(y_test_data, y_pred, target_names=target_names))

CONCLUSION

In this article, I went through a detailed explanation of how neural networks work under the hood and applied them to a text classification task. Knowing the nuts and bolts will fortify your neural network knowledge and make you feel comfortable taking on more complex models.
