An Introduction to NLP — explanation and examples.


Tiago Duque
Analytics Vidhya
8 min read · Jan 10, 2020


So you’re into Natural Language Processing — NLP (not to be confused with Neuro Linguistic Programming, whatever that is, it keeps appearing in my search results…).

It’s 1950, in the journal Mind. A prominent scientist (a mathematician, actually) named Alan Turing, after a long discussion on theoretical ways to make a machine learn, wrote the following words:

it is best to provide the machine with the best sense organs that money can buy, and then teach it to understand and speak English.[1]

And thus NLP was born. Okay, not so abruptly, but the idea was there. With the advent of computing and machine intelligence, the idea of NLP was already around.

Following that premise, Natural Language Processing could be summarized as the ability to process human (usually generalized as natural) language, be it written, spoken, pictorial, etc.

To process (from the Latin processus — progression, course) is to change something into another thing. In this case, take human language and create computer representations of it.

For example, these words you’re reading are written in a Natural Language (English with Latin letters) and stored in a computer language (binary, represented as series of 0’s and 1’s).

However, NLP is not just translating alphabet to bits. It is more related to making computers able to automatically act/react (do some action or generate something in language terms) based on how human languages are represented and organized.

NLP deals with things like:

  • Which of the words in the phrase is a verb?
  • Which noun in the phrase plays the role of the phrase root?
  • To which other word is this modified (inflected) word related?
  • What is the sentiment carried by this phrase? (brushing the borders of NLU)
  • What does this word mean in this context? (trespassing into NLU area)

And many other tasks…
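
To make this concrete, here is a minimal sketch of the first and third questions above, using spaCy (assuming the library and its small English model are installed; the sentence is just a toy example):

```python
# pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The quick brown fox jumps over the lazy dog.")

for token in doc:
    # token.pos_ answers "which word is a verb?";
    # token.dep_ and token.head answer "to which other word is it related?"
    print(token.text, token.pos_, token.dep_, token.head.text)
```

With the small English model, “jumps” should come out tagged as the VERB and root of the sentence, while “fox” attaches to it as the subject.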

It may be hard to explain what NLP is because most of us process language naturally: we don’t stop to think about which word comes next, or what role a certain word plays in a phrase, and so on. We just know the pattern and fill it with the desired variations.

Even harder is grasping the idea of idea (urr!?, concepts?) representation. How do we represent ideas? With words? Why? Do we think in terms of words? Well, that’s a good discussion if you are a philosopher / neuroscientist / linguist, but not for an NLP overview series (want to read about a neat idea in this area? Head over to [2] and check the discussions related to consciousness and language).

Now, a good introduction to NLP (one of the best that I’ve read) is the one presented in [3] by Jurafsky and Martin. These authors propose 6 generations in NLP history, which I’ll try to summarize (be aware that this historical part was removed from the draft of the book’s third edition — I linked the second edition in the References section; the summary is on pages 9–13).

  1. Foundation 1940s and 1950s: This is where NLP was born, impelled by Turing’s ideas, McCulloch’s and Pitts’ artificial neuron theories and Chomsky’s Formal Language Theory — it was the start of new possibilities, an era where something other than humans could process text automatically. However, most works were theory or expert systems. Some probabilistic algorithms came out here.
  2. The Two Camps 1957–1970: In this era, a division arose in NLP: while symbolic/model based approaches (rules to replicate how we do language) were the norm, with the first parsing algorithms and development of the first online Corpus, the stochastic/statistical paradigm (this means that some language structures were modeled using statistical approaches) took hold with the application of some Bayesian methods.
  3. Four Paradigms 1970–1983: This period started with high hopes and prospects, with groups defending four distinct paradigms: stochastic (statistics), logic-based, Natural Language Understanding (NLU and its representations) and discourse modeling. While this seems a fertile ground, it is also the period of the so-called “AI Winter”, where the many unproductive AI attempts caused the investment on the area to drop severely. Nowadays, many of these ideas are being resurrected to be applied with the aid of more powerful computers and algorithms (such as the network-based semantics proposed by the NLU paradigm).
  4. Empiricism and Finite-State Models 1983–1993: After a cold era, this new era started to offer warmer, clearer results. This was when probability took off as the favorite tool for NLP (among computer scientists, of course). With better computers and data, the first data-driven part-of-speech taggers were made. Empiricism (practical, reproducible approaches) became the norm.
  5. The Merging of Probability and Model-Based Approaches 1994–1999: In this era, classical model-based algorithms for areas such as reference resolution, information retrieval and discourse processing started to merge with the empirical approaches — this meant less handcrafting in favor of more data-driven inference.
  6. The Rise of Machine Learning 2000–2008 (this was when the second edition of the book was published): Lots of data, powerful computers, proven Machine Learning algorithms and the Internet! All this resulted in the (early) golden era of NLP. Treebanks allowed better lemmatization and POS tagging. Machine Translation became accessible to the masses and many language classification tasks improved with Machine Learning.
  7. Now 2009– (this era is not in Jurafsky and Martin’s book): We now live in an extension of the 6th era. We could point to an era where Deep Learning took NLP by storm, chatter bots employ many NLP tasks in a casual manner, and there are libraries available in many languages for complex NLP activities. Not to mention the development of RNNs, BERT and Transformer models, which successively brought NLP to its pinnacle. However, even with such breakthroughs, NLP is not solved (not even close), especially when we talk about NLU specializations (such as dealing with ambiguity and really automating thought).

Now that we know a little history and some basic meaning, let us see some examples of NLP applications.

Text Classification:

Probably one of the simplest applications to explain. It can be specialized into many other well-known activities, such as Sentiment Analysis (which is nothing more than classifying a text somewhere on the scale between good and bad).

In text classification, words (and, more richly, their relations, positions and contextual meanings) are used as features for an algorithm that decides whether the text belongs to class x or y or z. Since classification is a classic Machine Learning task, ML is usually the approach of choice (but you can define a model or a manual set of rules as well).

Google employs text classification algorithms to sort incoming mail into spam or inbox.
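
For a feel of how this works in practice, here is a minimal sketch with scikit-learn and a tiny made-up spam/inbox dataset (the texts and labels are hypothetical, purely for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data: word counts become the features, Naive Bayes does the classifying.
texts = [
    "win a free prize now",
    "meeting rescheduled to friday",
    "cheap pills, click here",
    "quarterly project report attached",
]
labels = ["spam", "inbox", "spam", "inbox"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["free prize, click now"]))  # expected: ['spam']
```

Real spam filters are, of course, far richer in features and training data, but the shape of the problem is the same.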

(Automated) Question Answering:

A little more complex than Text Classification is Question Answering. Not only does the question text have to be considered, but also the text of the many possible target documents.

There are many ways to do Question Answering: using Deep Learning with models like Seq2Seq (in late 2019, Kaggle even launched a competition on it); using Knowledge Graphs (Google uses them for its assistant’s quick answers); and many other techniques. If you want to read more about it, check my Master’s Thesis [4]; there’s a good chapter on question answering there.

Google Knowledge Graph presentation. This is used for searches, which are largely a matter of Information Retrieval, but it is also related to Question Answering.
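
As an illustration (not Google’s actual pipeline), the Hugging Face transformers library exposes an extractive question-answering pipeline. A minimal sketch, assuming the library is installed and its default model is downloaded on first use:

```python
from transformers import pipeline

qa = pipeline("question-answering")
result = qa(
    question="Who suggested teaching a machine to understand and speak English?",
    context=(
        "In 1950, Alan Turing suggested providing the machine with the best sense "
        "organs that money can buy, and then teaching it to understand and speak English."
    ),
)
print(result["answer"])  # likely: 'Alan Turing'
```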

Chatter Bots:

These are in fact a combination of the previous techniques, with an added layer of end-user application. Many new techniques are being employed to bring the areas of NLP together, allowing bots to understand user intent, sentiment and even irony (watch out, Sheldon Cooper)!

If you want to mix history with examples, talk to Dr. ELIZA, a robot Rogerian psychotherapist considered the first chatter bot ever created.
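
To get the flavor of how ELIZA-style bots work, here is a toy, rule-based responder (the patterns are my own, not the original ELIZA script): it matches simple regular expressions and reflects the user’s words back, Rogerian style.

```python
import re

# (pattern, response template) pairs; {0} is filled with the captured text.
RULES = [
    (r"i am (.*)", "Why do you say you are {0}?"),
    (r"i feel (.*)", "How long have you felt {0}?"),
    (r".*\bmother\b.*", "Tell me more about your family."),
]

def respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = re.match(pattern, utterance, re.IGNORECASE)
        if match:
            return template.format(*match.groups())
    return "Please, go on."  # default Rogerian nudge

print(respond("I am tired of writing this article"))
# -> Why do you say you are tired of writing this article?
```

Modern chatter bots replace the hand-written rules with learned intent and entity models, but the conversational loop is recognizably the same.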

Machine Translation:

You guessed it! It allows translating text from one language to another. This one is easy to understand because we use it very commonly (at least those of us who are not native English speakers, or who care to avoid sending too-clumsy messages in other languages).

For some, it may look like a solved problem. But if you fiddle with it a little, you’ll find that many details of Machine Translation are not yet perfect (for example, try getting a sarcastic, context-specific phrase translated into any major language — now try it between languages other than English).

Google Translate joins the imperial vs. metric battle!
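
If you want to fiddle with translation programmatically, the transformers library also ships a translation pipeline. A minimal sketch (English to French here, using whatever default model the library picks; other language pairs need a suitable model):

```python
from transformers import pipeline

translator = pipeline("translation_en_to_fr")
result = translator("Machine translation is not a solved problem yet.")
print(result[0]["translation_text"])
```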

Natural Language Generation (NLG):

This is the “opposite” of Natural Language Processing. Instead of consuming textual data to extract inferences, the machine generates text from previous inferences and stimuli. It could be argued that Machine Translation is, in a way, NLG. But that’s at the conceptual level.

We already have a lot of newsbots and similar things around, but recently, the Transformer architecture achieved unprecedented results in the area. Look how funny it is:

Nonsense text written with the aid of Transformers. It is nonsense because I made it so, but it works! Bold letters are machine-generated text; the rest is mine =)
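
If you want to try this at home, here is a minimal sketch of Transformer-based generation with the transformers library and GPT-2 (sampling is enabled, so every run differs unless you fix the seed):

```python
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="gpt2")
set_seed(42)  # for repeatability

result = generator(
    "Natural Language Processing is",
    max_length=30,
    num_return_sequences=1,
    do_sample=True,  # sample rather than always picking the most likely next word
)
print(result[0]["generated_text"])
```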

Text Summarization:

Imagine if you could take all this text that I wrote and simply read the “most important parts”. That’s the idea behind Text Summarization. It aims at checking each part of the text and deciding whether it is important or not. It can also be “reduced” to Topic Modeling, which attempts to retrieve the main topics (rather than a summary) of a text.

Courtesy of: http://textsummarization.net/text-summarizer
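
Under the hood, a very simple extractive summarizer fits in a few lines. Here is a toy sketch of the frequency-based idea (my own illustration, not the tool linked above): score each sentence by how frequent its words are in the whole text and keep the top-scoring ones, in their original order.

```python
import re
from collections import Counter

def summarize(text: str, n_sentences: int = 2) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))  # word frequencies over the whole text
    # rank sentences by the total frequency of their words
    scored = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"\w+", s.lower())),
        reverse=True,
    )
    top = set(scored[:n_sentences])
    return " ".join(s for s in sentences if s in top)  # keep original order

print(summarize("NLP is broad. NLP is powerful. Cats are cute.", 2))
# -> NLP is broad. NLP is powerful.
```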

Now, I know you love to see applications. Here’s a Summary of the Text so Far:

And thus NLP was born.

Following that premise, Natural Language Processing could be summarized as the ability to process human (usually generalized as natural) language, be it written, spoken, pictorial, etc.

So you’re into Natural Language Processing — NLP (not to be confused with Neuro Linguistic Programming, whatever that is, it keeps appearing in my search results…).

To process (from the Latin processus — progression, course) is to change something into another thing.

Google employs text classification algorithms to sort incoming mail into spam or inbox.

Good, isn’t it? Okay, not so much (after all this work I had to write this article), but you get the gist.

Conclusion:

NLP is broad and powerful. Data Scientists need it, Machine Learning Engineers need it, YOU need it. Now, it is time to get our hands dirty with some pract… Oooookay. Not so fast. First, let us learn some principles of preprocessing.

Bibliography and References:

[1] Turing, A. M. (1950). I. — Computing Machinery And Intelligence. Mind, LIX(236), 433–460. doi: 10.1093/mind/lix.236.433

[2] Haladjian, H. H. (2016). Consciousness and Language. Retrieved January 9, 2020, from https://www.psychologytoday.com/us/blog/theory-consciousness/201608/consciousness-and-language.

[3] Jurafsky, D., & Martin, J. H. (2014). Speech and Language Processing. Upper Saddle River, NJ: Prentice Hall, Pearson Education International.

[4] Duque, T. F. (2019). Graph based approach for question answering — improving efficiency in natural language processing for small corpora. Juiz de Fora. Retrieved from https://repositorio.ufjf.br/jspui/bitstream/ufjf/10735/1/tiagofaceroliduque.pdf

