What is Natural Language Processing?

Sheel Saket
Published in Artificial Coder
Apr 9, 2019

As per Wikipedia:

Natural language processing (NLP) is a subfield of computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.

We all know that computers are good with numbers. With today's processing power, computers can perform billions of mathematical calculations at lightning speed, and with the emergence of big data and statistical tools, it has become much easier to identify trends in a data set and predict upcoming values or the category of a given case. This is called predictive analytics and is the backbone of Data Science (AI).

But along with a huge amount of numerical data, we are also generating a tremendous amount of text data through social media platforms, chats, call transcripts, online articles, etc. So, what do we do with all this text? Can we analyse it? Can we predict whether a comment on your Facebook profile picture is inappropriate or not? Can we predict whether a mail in your inbox is spam or not?

It seems that computers are not at all good at analyzing our natural language, which in our case is English. Well, that’s a bummer!

But what if we can represent a text in the form of numbers? What if we can represent a word in the form of a numerical value and then do the same analysis on the data set?

This is where NLP comes in.

Let's take this example:

Text: “Natural Language Processing is a subfield of AI”. Tag: NLP

Text: “Computer Vision is a subfield of AI”. Tag: CV

In the two examples above, each text comes with a tag. This is a very simple case of NLP: you are given a data set of tagged texts and have to use it to predict the tag of a new text.

The above two texts can be converted into a table of word counts, with one row per text and one column for each distinct word.

Now this looks much more familiar to the eyes of an average Data Scientist. In this case we have converted the texts into a data frame of count values and the features are the individual words from the data set.
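To make the conversion concrete, here is a minimal sketch in Python that turns the two example texts into count values. The lowercase, whitespace-based tokenizer and the helper names (`tokenize`, `build_vocabulary`, `count_vector`) are assumptions made for illustration only; real projects usually rely on a library for this step.

```python
from collections import Counter

# The two example texts from the article.
texts = [
    "Natural Language Processing is a subfield of AI",
    "Computer Vision is a subfield of AI",
]

def tokenize(text):
    """Lowercase a text and split it on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def build_vocabulary(texts):
    """Collect every distinct word across all texts, in sorted order."""
    return sorted({word for text in texts for word in tokenize(text)})

def count_vector(text, vocabulary):
    """Return how many times each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

vocabulary = build_vocabulary(texts)
count_table = [count_vector(text, vocabulary) for text in texts]

print(vocabulary)
for row in count_table:
    print(row)
```

Each row of `count_table` is one text and each column is one word of the vocabulary, so the features are the individual words, as described above.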

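This transformation from words to numbers is, in a nutshell, what techniques such as Word Embedding are for. The simple count-based version used here is usually called a bag-of-words representation, while word embeddings map each word to a dense vector of numbers.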

One can now convert a new text into the above format and predict which tag it belongs to. We can run the same kind of machine learning models that we use on numerical data on this transformed data and get predictions.
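As a rough illustration of the prediction step, the sketch below converts a new text into word counts and assigns it the tag of the most similar training text. The choice of a nearest-neighbour rule with cosine similarity is an assumption made to keep the example small, not the only or the standard model; the helper names and the two test sentences are also hypothetical.

```python
import math
from collections import Counter

# Tagged training texts from the example above.
training_data = [
    ("Natural Language Processing is a subfield of AI", "NLP"),
    ("Computer Vision is a subfield of AI", "CV"),
]

def tokenize(text):
    """Lowercase a text and split it on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

# The features are the distinct words seen in the training texts.
vocabulary = sorted({word for text, _ in training_data for word in tokenize(text)})

def count_vector(text):
    """Count vocabulary words in the text; words outside the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def predict_tag(text):
    """Return the tag of the training text whose count vector is most similar."""
    new_vector = count_vector(text)
    _, best_tag = max(
        training_data,
        key=lambda item: cosine_similarity(new_vector, count_vector(item[0])),
    )
    return best_tag

print(predict_tag("I am learning language processing"))
print(predict_tag("Vision models work on images"))
```

With only two training texts this is a toy: ties are resolved in favour of the first training text, and shared words such as "is" and "of" count as much as informative ones. A real model would need far more tagged data.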

This is just an introduction to how NLP works. I will be sharing various other techniques that are generally used in the industry to make our analysis easier and simpler.

Today, almost every big company is investing heavily in Natural Language Processing, which lets them extract meaningful insights from customer behavior. According to marketsandmarket.com, the NLP market is expected to be worth $16 Bn by 2021. This makes it really exciting to become an NLP Data Scientist, a role with huge expected growth.

I hope you liked this article. Please share your views in the comments and let me know if I can improve on anything.


Data Scientist. NLP expert. Follow me on twitter @ArtificialCoder