Recurrent Neural Networks: An Intuitive Approach, Part 1

Niketh Narasimhan
8 min read · Jul 31, 2020


One of the most fascinating aspects of being human is our ability to communicate with each other. Through language we can express anything we want: happiness, sadness, anger, and a whole plethora of emotions!

Can we train machines to communicate like us?

Can machines decipher and understand our language ?

Can machines understand the various emotions that we human beings express through our language?

When we search for Wayne Rooney, how does the machine recognize that he is a footballer?

How does a machine differentiate between Amazon the river and Amazon the company?

Well, you might be tempted to say that the sky is the limit when it comes to what machines can and cannot do!

Actually, not quite! Someone else may argue and point out that

machines have already broken the sky barrier, with satellites orbiting the Moon and other planets!

Which brings us to the concept of “context”.

“The sky is the limit” is a well-known phrase: in the first context it is used figuratively, while in the second it is taken literally.

How does a machine recognize the difference?

Let us decode!!!

Before we jump into what Recurrent Neural Networks (RNNs) are, let us refresh some basic concepts and terminology.

Natural Language Processing(NLP):

Natural Language Processing is the technology used to help computers understand human natural language.

The ultimate objective of NLP is to read, decipher, understand, and make sense of human languages in a manner that is valuable.

Most NLP techniques rely on machine learning to derive meaning from human languages.

Most of us are aware of Amazon Alexa, also known simply as Alexa, a virtual assistant AI developed by Amazon.

What do we think the steps going on in the background are when we ask Alexa to play a song?

In the most layman terms, they can be explained as follows:

1. A human talks to the machine

2. The machine captures the audio

3. Audio-to-text conversion takes place

4. The text data is processed

5. The meaning of the text is deciphered

6. The response data is converted back to audio

7. The machine responds to the human by playing the audio file

As we go more in depth, we will try to understand the technical aspects of the above-mentioned steps.
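As a rough, purely illustrative sketch of these steps in Python (this assumes the third-party packages SpeechRecognition and gTTS are installed; the file names and the "play" rule are made up, and it is of course not how Alexa is actually implemented):

```python
# Hypothetical voice-assistant pipeline sketch; not Alexa's actual implementation.
import speech_recognition as sr
from gtts import gTTS

recognizer = sr.Recognizer()

# Steps 1-2: capture the audio (here, from a pre-recorded WAV file).
with sr.AudioFile("request.wav") as source:
    audio = recognizer.record(source)

# Step 3: audio-to-text conversion using a speech-to-text service.
text = recognizer.recognize_google(audio)

# Steps 4-5: process the text and decipher the intent (toy rule, for illustration).
if "play" in text.lower():
    response = "Playing your song now."
else:
    response = "Sorry, I did not understand that."

# Steps 6-7: convert the response back to audio and "speak" it.
gTTS(response).save("response.mp3")
```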

What is NLP used for?

Natural Language Processing is the technique behind the following common applications:

  • Language translation applications such as Google Translate
  • Word processors and writing assistants such as Microsoft Word and Grammarly that employ NLP to check the grammatical accuracy of text.
  • Interactive Voice Response (IVR) applications used in call centers to respond to certain users’ requests.
  • Personal assistant applications such as OK Google, Siri, Cortana, and Alexa.
  • Customer feedback evaluation typically known as Sentiment Analysis.
  • Spam detection for emails, and detection of abusive posts on Twitter, Facebook, etc.

And these are just a few; the applications are so many that it is hard to sum them all up here!

Difficulties/Challenges in NLP:

Languages and emotions can be complex, and most of the time we understand each other through context and environment. It is very hard to train a machine on the subtle nuances of language: differences in accent, context, the usage of puns, metaphors, and other figures of speech.

For example:

“Call me Niketh” vs. “Call me, Niketh”

These two sentences have different meanings, and in audio form the difference is difficult for a machine to decipher.

As we can all imagine there are going to be a ton of such different situations that we encounter on a daily basis.

We will try to cover a few common ones.

Different Techniques used in Natural Language Processing:

Breaking the sentence / Sentence boundary disambiguation

This process deals with deciding where sentences start and end, identifying paragraphs and punctuation such as question marks, full stops, and exclamation marks. Sentence breaking is no longer difficult to achieve, but it is nonetheless a critical step, especially in the case of highly unstructured data that includes structured information.
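As a small illustration, here is a minimal sketch using NLTK's sentence tokenizer (assuming nltk is installed and its "punkt" data is downloaded; the sample text is made up):

```python
# Minimal sentence boundary disambiguation with NLTK.
import nltk
nltk.download("punkt", quiet=True)
from nltk.tokenize import sent_tokenize

text = "Dr. Smith went to the U.S. yesterday! Did he meet Mr. Brown? Yes."
print(sent_tokenize(text))
# Should split into three sentences despite abbreviations like "Dr." and "U.S."
```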

Tagging the parts of speech (POS) and generating dependency graphs

It is generally called POS tagging. In simple words, we can say that POS tagging is the task of labelling each word in a sentence with its appropriate part of speech. We already know that parts of speech include nouns, verbs, adverbs, adjectives, pronouns, conjunctions and their sub-categories.

We generally use a dictionary and a few hand-written rules to do the tagging.

Tagging is done based on the probability of a word belonging to a certain tag (as per the rules above) and on the surrounding words in the sentence, which provide the context. Subsequently, the relationship of each word to the others in a sentence is captured by a dependency graph generated in the same procedure. Those POS tags can be further processed to create meaningful single or compound vocabulary terms.
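To make this concrete, here is a minimal sketch of POS tagging with NLTK (assuming nltk and its "punkt" and "averaged_perceptron_tagger" resources are available; the sentence is made up):

```python
# Minimal POS tagging with NLTK.
import nltk
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)
from nltk import word_tokenize, pos_tag

tokens = word_tokenize("Wayne Rooney plays football for England")
print(pos_tag(tokens))
# e.g. [('Wayne', 'NNP'), ('Rooney', 'NNP'), ('plays', 'VBZ'),
#       ('football', 'NN'), ('for', 'IN'), ('England', 'NNP')]
```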

Named Entity Recognition

Named Entity Recognition, or NER, is the process of detecting real-world named entities such as person names, location names, company names, etc. from a given piece of text. For example: Sentence — Sergey Brin, the manager of Google Inc. is walking in the streets of New York. Named Entities — (“person”: “Sergey Brin”), (“org”: “Google Inc.”), (“location”: “New York”)
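A quick sketch using spaCy's small English model (assuming spaCy and "en_core_web_sm" are installed):

```python
# Minimal NER with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Sergey Brin, the manager of Google Inc. is walking in the streets of New York.")
for ent in doc.ents:
    print(ent.text, ent.label_)
# e.g. "Sergey Brin PERSON", "Google Inc. ORG", "New York GPE"
```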

Topic Modeling

Topic modeling is a method for uncovering hidden structures in sets of texts or documents. In essence, it clusters texts to discover latent topics based on their contents, processing individual words and assigning them values based on their distribution. This technique is based on the assumptions that each document consists of a mixture of topics and that each topic consists of a set of words, which means that if we can spot these hidden topics we can unlock the meaning of our texts.

Latent Dirichlet Allocation(LDA)

LDA is one of the most popular algorithms to implement Topic Modeling. LDA assumes documents are produced from a mixture of topics. Those topics then generate words based on their probability distribution. Given a dataset of documents, LDA backtracks and tries to figure out what topics would create those documents in the first place.
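As a rough illustration, here is a minimal sketch of LDA using scikit-learn; the tiny corpus and the choice of two topics are made up purely for demonstration:

```python
# Minimal topic modeling with LDA in scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the amazon river flows through the rainforest",
    "amazon reported strong cloud computing revenue",
    "the rainforest is home to many river species",
    "the company launched a new cloud service",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Ask LDA to explain the corpus with 2 latent topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Show a few top words per topic.
words = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [words[j] for j in topic.argsort()[-4:]]
    print(f"Topic {i}: {top}")
```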

Latent Semantic Analysis(LSA)

LSA assumes that words that are close in meaning will occur in similar pieces of text. A matrix containing word counts per paragraph (rows represent unique words and columns represent each paragraph) is constructed from a large piece of text and a mathematical technique called singular value decomposition (SVD) is used to reduce the number of rows while preserving the similarity structure among columns. Paragraphs are then compared by taking the cosine of the angle between the two vectors (or the dot product between the normalizations of the two vectors) formed by any two columns. Values close to 1 represent very similar paragraphs while values close to 0 represent very dissimilar paragraphs.
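Here is a minimal, hedged sketch of the same idea with scikit-learn: build a count matrix, reduce it with truncated SVD, and compare documents by cosine similarity (the three sample paragraphs are made up):

```python
# Minimal LSA: term-document counts -> truncated SVD -> cosine similarity.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

paragraphs = [
    "the cat sat on the mat",
    "a cat was sitting on a mat",
    "stock markets fell sharply today",
]

X = CountVectorizer().fit_transform(paragraphs)   # word counts per paragraph
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(X)                    # low-dimensional "concept" space

print(cosine_similarity(reduced))  # values near 1 = similar paragraphs
```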

Language Modeling

Language Modeling is the first crucial step for most of the advanced NLP tasks like Text summarization, Machine translation, Chatbots etc. It involves learning to predict the probability of a sequence of words. Now, this is the same technique that Google uses when it gives you search suggestions.
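As a toy illustration of the idea (not what Google actually runs), here is a minimal bigram language model built from word counts; the tiny corpus is made up:

```python
# Minimal bigram language model: P(next word | previous word) from counts.
from collections import Counter, defaultdict

corpus = "the sky is the limit . the sky is blue .".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_word_prob(prev, nxt):
    counts = bigrams[prev]
    return counts[nxt] / sum(counts.values()) if counts else 0.0

print(next_word_prob("sky", "is"))   # 1.0 in this toy corpus
print(next_word_prob("the", "sky"))  # 2/3: "the" is followed by "sky" twice, "limit" once
```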

Sequence Modeling

Sequence Modeling is a technique of deep learning that is used to work with sequence data like music lyrics, sentence translation, understanding reviews or building chatbots. This technique is used a lot in NLP because natural language or text is essentially an example of sequence-based data.
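As a rough sketch only, here is what a small sequence model might look like in Keras, assuming TensorFlow is installed; the layer sizes and the binary sentiment task are illustrative choices, not a prescribed architecture:

```python
# Minimal sequence model sketch: embedding -> simple RNN -> binary prediction.
import tensorflow as tf

vocab_size, embed_dim = 10000, 64

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),       # token ids -> dense vectors
    tf.keras.layers.SimpleRNN(32),                           # reads the sequence step by step
    tf.keras.layers.Dense(1, activation="sigmoid"),          # e.g. positive vs. negative review
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.build(input_shape=(None, 100))  # batches of 100-token sequences, for illustration
model.summary()
```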

Certain preprocessing techniques for texts in NLP

1. Cleaning

Here are some of the most frequent types of noise that are present in text data:

a. HTML tags

When working with documents that have been downloaded/ scraped from the internet, you will encounter situations where you want to extract the text present among the HTML tags.

This can be handled in a lot of ways, including but not limited to using regular expressions that strip out the HTML tags.
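For instance, a minimal regular-expression sketch (for messy real-world HTML, a dedicated parser such as BeautifulSoup is usually more robust):

```python
# Minimal HTML tag stripping with a regular expression.
import re

html = "<p>Wayne <b>Rooney</b> is a footballer.</p>"
clean = re.sub(r"<[^>]+>", " ", html)       # drop anything between < and >
clean = re.sub(r"\s+", " ", clean).strip()  # collapse the leftover whitespace
print(clean)  # "Wayne Rooney is a footballer."
```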

b. Unicode and other symbols

This form of noise is common when working with international or scientific documents: many symbols that are not part of regular English or the ASCII system get mixed in with the text data, and it is hard to make sense of these symbols in NLP systems.

These symbols belong to the Unicode system and can be dealt with using both regular expressions and the Unicode-related functions available for strings in Python.
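A minimal sketch using Python's standard unicodedata module; note that dropping non-ASCII characters is a lossy strategy and only one of several options:

```python
# Normalize Unicode text and drop symbols that fall outside ASCII.
import unicodedata

text = "Café au lait ≈ 5 € ☕"
normalized = unicodedata.normalize("NFKD", text)
ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
print(ascii_only)  # roughly "Cafe au lait  5": accents kept as base letters, symbols dropped
```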

c. Removing numbers

Numbers aren’t always useful: in many NLP systems, numbers do not add much meaning to a given piece of text and so are usually filtered out.
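A one-line sketch with a regular expression:

```python
# Remove digit sequences from a piece of text.
import re

print(re.sub(r"\d+", "", "Rooney scored 2 goals in 90 minutes"))
# "Rooney scored  goals in  minutes"
```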

d. Links

Links can be of many forms and most of them consist of strange symbols or short-codes that can be present in your document if you have scraped it from the internet.
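A minimal sketch, again with a regular expression (the pattern is a rough heuristic, not an exhaustive URL matcher):

```python
# Remove web links from scraped text.
import re

text = "Read more at https://example.com/article?id=42 or www.example.org"
print(re.sub(r"(https?://\S+|www\.\S+)", "", text).strip())
# roughly "Read more at  or"
```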

2. Tokenization

Tokenization is the process of segmenting running text into sentences and words. In essence, it is the task of cutting a text into pieces called tokens, and at the same time optionally throwing away certain characters, such as punctuation. A small example is shown below.

It can also be used to remove punctuation marks, blanks, etc.
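A minimal sketch with NLTK's word tokenizer (assuming nltk is installed and its "punkt" data is downloaded):

```python
# Minimal word tokenization with NLTK.
import nltk
nltk.download("punkt", quiet=True)
from nltk.tokenize import word_tokenize

tokens = word_tokenize("Can machines decipher and understand our language?")
print(tokens)
# ['Can', 'machines', 'decipher', 'and', 'understand', 'our', 'language', '?']

# Punctuation tokens can then be filtered out if they are not needed.
print([t for t in tokens if t.isalnum()])
```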

3. Normalization

An important type of textual noise is the multiple representations exhibited by a single word. For example, “play”, “player”, “played”, “plays” and “playing” are different variations of the word “play”; though they mean different things, contextually they are all similar. Normalization is the step that converts these variants of a word into a single normalized form (also known as the lemma). Normalization is a pivotal step for feature engineering with text, as it converts high-dimensional features (N different features) into a low-dimensional space (1 feature), which is ideal for any ML model.

Stop Words Removal

Includes getting rid of common language articles, pronouns and prepositions such as “and”, “the” or “to” in English. In this process some very common words that appear to provide little or no value to the NLP objective are filtered and excluded from the text to be processed, hence removing widespread and frequent terms that are not informative about the corresponding text.

Note: These can be defined beforehand based on prior knowledge of the case at hand.
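A minimal sketch using NLTK's built-in English stop-word list (assuming nltk is installed and the "stopwords" corpus is downloaded):

```python
# Minimal stop-word removal with NLTK.
import nltk
nltk.download("stopwords", quiet=True)
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))
tokens = ["the", "sky", "is", "the", "limit"]
print([t for t in tokens if t not in stop_words])
# ['sky', 'limit']
```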

Stemming:

Refers to the process of slicing the end or the beginning of words with the intention of removing affixes (lexical additions to the root of the word).

For example: removing ‘ing’ from running, ‘s’ from fruits, ‘ly’ from slowly, etc.
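A minimal sketch with NLTK's Porter stemmer:

```python
# Minimal stemming with NLTK (assumes nltk is installed).
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
print([stemmer.stem(w) for w in ["running", "fruits", "slowly", "played"]])
# ['run', 'fruit', 'slowli', 'play']: note the crude chop, 'slowly' becomes 'slowli'
```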

Lemmatization:

Has the objective of reducing a word to its base form and grouping together different forms of the same word. For example, verbs in past tense are changed into present (e.g. “went” is changed to “go”) and synonyms are unified (e.g. “best” is changed to “good”), hence standardizing words with similar meaning to their root.
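A minimal sketch with NLTK's WordNet lemmatizer (assuming nltk and its "wordnet" corpus are available); note that the part-of-speech hint matters for getting the right base form:

```python
# Minimal lemmatization with NLTK's WordNet lemmatizer.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("went", pos="v"))   # go  (verb hint)
print(lemmatizer.lemmatize("best", pos="a"))   # good (adjective hint, via WordNet's exception list)
print(lemmatizer.lemmatize("plays", pos="v"))  # play
```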

Note: In the next post we will cover word embeddings.

Please find the link below for part 2:

Please find the link below for part 3
