Natural Language Processing

“All that we are is the result of what we have thought.

The mind is everything. What we think we become.”

~ The Buddha

NLP is a branch of artificial intelligence that deals with analyzing, understanding, and deriving information from text data in a smart and efficient manner.

NLP is used to analyze text, allowing machines to understand how humans speak. This human-computer interaction enables real-world applications such as automatic text summarization, sentiment analysis, topic extraction, named entity recognition, part-of-speech tagging, relationship extraction, stemming, and more. NLP is also commonly used for text mining, machine translation, and automated question answering.

Some basic terms in NLP are:

  1. Tokenization: the process of converting a text into tokens
  2. Tokens: the words or entities present in the text
  3. Text object: a sentence, a phrase, a word, or an article
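A minimal sketch of tokenization in Python, assuming a token is simply a run of letters, digits, or apostrophes and that punctuation can be thrown away. The `tokenize` helper is hypothetical and only for illustration; libraries such as NLTK (`nltk.word_tokenize`) handle punctuation, contractions, and other edge cases far more carefully.

```python
import re

def tokenize(text):
    """Split a text object into word tokens.

    Assumption: a token is a run of letters, digits, or apostrophes;
    everything else (spaces, punctuation) acts as a separator.
    """
    return re.findall(r"[A-Za-z0-9']+", text)

sentence = "John likes to watch movies. Mary likes movies too."
tokens = tokenize(sentence)
print(tokens)
# ['John', 'likes', 'to', 'watch', 'movies', 'Mary', 'likes', 'movies', 'too']
```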

By using NLP and its components, one can organize massive chunks of text data, perform numerous automated tasks, and solve a wide range of problems such as automatic summarization, machine translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation.

How do we represent a sentence for processing?

There are multiple ways to build a sentence vector representation:

1. Bag-of-words: Each word in the vocabulary V is a dimension, so a sentence is a vector with |V| dimensions. The value in each word's dimension is the number of times that word occurs in the sentence.

2. TF-IDF based: Like bag-of-words, but each dimension holds the word's tf–idf weight (term frequency times inverse document frequency) instead of its raw count in the sentence.

3. Word embeddings: The sentence vector is built from the embeddings of its words, for example by combining them with a recursive or recurrent neural network, or by learning it directly with an algorithm such as doc2vec. The resulting sentence vector is generally a dense vector of similar size to the word embeddings.
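To make the difference between raw counts and tf–idf concrete, here is a small sketch that computes tf–idf vectors for the two example documents used in the bag-of-words example of this article. It assumes raw counts for tf and the unsmoothed idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. The helper names are hypothetical, and libraries such as scikit-learn use smoothed and normalized variants, so their numbers will differ.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumption: simple regex tokenizer (runs of letters, digits, apostrophes)
    return re.findall(r"[A-Za-z0-9']+", text)

def tfidf_vectors(docs):
    """Return (vocabulary, tf-idf vectors) for a list of documents.

    tf(t, d) = raw count of t in d
    idf(t)   = log(N / df(t))   (unsmoothed variant, an assumption)
    """
    tokenized = [tokenize(d) for d in docs]
    # Vocabulary ordered by first appearance across all documents
    vocab = list(dict.fromkeys(t for tokens in tokenized for t in tokens))
    n_docs = len(docs)
    # Document frequency: count each term at most once per document
    df = Counter(t for tokens in tokenized for t in set(tokens))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[t] * idf[t] for t in vocab])
    return vocab, vectors

docs = [
    "John likes to watch movies. Mary likes movies too.",
    "John also likes to watch football games.",
]
vocab, vectors = tfidf_vectors(docs)
for v in vectors:
    print([round(x, 3) for x in v])
# [0.0, 0.0, 0.0, 0.0, 1.386, 0.693, 0.693, 0.0, 0.0, 0.0]
# [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.693, 0.693, 0.693]
```

Words that occur in every document ("John", "likes", "to", "watch") get weight 0 under this idf, while words that distinguish one document from the other get a positive weight.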

The following example models text documents using bag-of-words.

Here are two simple text documents:

(1) John likes to watch movies. Mary likes movies too.

(2) John also likes to watch football games.

Based on these two text documents, a list of the distinct words (the vocabulary) is constructed as follows:

["John","likes","to","watch","movies","Mary","too","also","football","games"]

In practice, the bag-of-words model is mainly used as a tool for feature generation. After transforming the text into a “bag of words”, we can calculate various measures to characterize the text. The most common feature calculated from the bag-of-words model is term frequency, namely, the number of times a term appears in the text. For the example above, we can construct the following two lists to record the term frequencies of all the distinct words:

(1) [1, 2, 1, 1, 2, 1, 1, 0, 0, 0]

(2) [1, 1, 1, 1, 0, 0, 0, 1, 1, 1]
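The term-frequency lists above can be reproduced with a short sketch. It assumes a simple regex tokenizer (a hypothetical helper), keeps case as-is, and orders the vocabulary by first appearance so that it matches the word list given earlier; the function names are illustrative, not from any library.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: a token is a run of letters, digits, or apostrophes
    return re.findall(r"[A-Za-z0-9']+", text)

def build_vocabulary(docs):
    """Distinct words across all documents, in order of first appearance."""
    return list(dict.fromkeys(t for doc in docs for t in tokenize(doc)))

def bag_of_words(doc, vocab):
    """Term-frequency vector: how often each vocabulary word occurs in doc."""
    counts = Counter(tokenize(doc))
    return [counts[word] for word in vocab]

docs = [
    "John likes to watch movies. Mary likes movies too.",
    "John also likes to watch football games.",
]
vocab = build_vocabulary(docs)
print(vocab)
for doc in docs:
    print(bag_of_words(doc, vocab))
# [1, 2, 1, 1, 2, 1, 1, 0, 0, 0]
# [1, 1, 1, 1, 0, 0, 0, 1, 1, 1]
```

Word order inside each document is discarded; only the counts survive, which is exactly what makes it a "bag" of words.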

Conceptually, we can view the bag-of-words model as a special case of the n-gram model, with n = 1.
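The n-gram view can be sketched as follows: counting n-grams with n = 1 gives exactly the bag-of-words counts, while n = 2 starts to retain some local word order. The `ngrams` and `tokenize` helpers are hypothetical; note that this tokenizer ignores sentence boundaries, so a bigram such as ("movies", "Mary") spans two sentences.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: simple regex tokenizer, no sentence splitting
    return re.findall(r"[A-Za-z0-9']+", text)

def ngrams(tokens, n):
    """All contiguous n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = tokenize("John likes to watch movies. Mary likes movies too.")
unigrams = Counter(ngrams(tokens, 1))  # n = 1: same counts as bag-of-words
bigrams = Counter(ngrams(tokens, 2))   # n = 2: pairs of adjacent words

print(unigrams[("likes",)])       # 2
print(bigrams[("likes", "to")])   # 1
```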

Connect with ArIES to know more.

See the following links for implementations of various NLP tasks:

https://www.kaggle.com/c/word2vec-nlp-tutorial

https://www.analyticsvidhya.com/blog/2017/01/ultimate-guide-to-understand-implement-natural-language-processing-codes-in-python/

Important NLP Libraries for Python

  • Scikit-learn: Machine learning in Python
  • Natural Language Toolkit (NLTK): A comprehensive toolkit for common NLP techniques
  • spaCy: Industrial-strength NLP with Python and Cython
  • Gensim: Topic Modelling for Humans
  • Stanford CoreNLP: NLP tools and services from the Stanford NLP Group (written in Java, with Python interfaces available)


Artificial Intelligence And Electronics Society

Written by the Artificial Intelligence and Electronics Society (ArIES), a campus group of IIT Roorkee with a mission to solve impactful problems of society.
