Journey Through NLP Research — Tools and Techniques

Jul 28, 2017

In the previous article, I shared the basics of NLP, including:

Linguistic analysis (syntax, semantics, and pragmatics)
Lexical and compositional semantic properties of language
Syntactic and semantic analysis in NLP
Language ambiguity

In this article, I will share the different tools and techniques used for syntactic and semantic analysis in NLP.

Syntactic Analysis Tools

Syntactic analysis focuses on the syntactic structure of language: its grammar (nouns, verbs, noun phrases, verb phrases, etc.) and the dependencies between words (syntactic or dependency parsing). It is the task of recognizing a sentence and assigning a syntactic structure to it. This structure can be used for grammar checking, and it gives some insight into the semantic analysis of the language. Syntactic analysis also helps to extract features from text, which can be fed to different ML algorithms to build advanced NLP systems. Every syntactic analysis system has the following two main components:

A declarative representation (grammar) of the syntactic facts about the language
A procedure (parser) that compares the grammar against input sentences to produce parsed structures

POS (part-of-speech) tagging is a module in a syntactic parsing system that tags the words in a sentence with their categories (noun, verb, and so on), from which noun phrases and verb phrases can then be built.

Ambiguity in POS tagging and in how phrases attach to each other can produce several different parse trees for the same sentence, which makes the output of an NLP system ambiguous.
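To make the grammar/parser split and the ambiguity problem concrete, here is a minimal sketch in plain Python. The toy grammar and lexicon are hypothetical (written by hand for this example), and the parser is a small CYK-style chart that only counts parse trees. It is an illustration under those assumptions, not how a production parser is built.

```python
from collections import defaultdict

# Declarative part: a hypothetical toy grammar in Chomsky normal form.
# (left child, right child) -> set of parent symbols
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("VP", "PP"): {"VP"},   # PP attaches to the verb phrase
    ("NP", "PP"): {"NP"},   # PP attaches to the noun phrase
    ("Det", "N"): {"NP"},
    ("P", "NP"): {"PP"},
}

# Hypothetical lexicon: word -> possible tags (the POS tagging step).
LEXICON = {
    "i": {"NP"},
    "saw": {"V"},
    "the": {"Det"},
    "man": {"N"},
    "telescope": {"N"},
    "with": {"P"},
}


def count_parses(words, start="S"):
    """Procedural part: count the parse trees the grammar allows for `words`."""
    n = len(words)
    # chart[i][j][symbol] = number of trees for `symbol` spanning words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag in LEXICON.get(word.lower(), ()):
            chart[i][i + 1][tag] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for left, left_count in chart[i][k].items():
                    for right, right_count in chart[k][j].items():
                        for parent in BINARY_RULES.get((left, right), ()):
                            chart[i][j][parent] += left_count * right_count
    return chart[0][n].get(start, 0)


print(count_parses("I saw the man".split()))                       # 1
print(count_parses("I saw the man with the telescope".split()))    # 2
```

The second sentence gets two trees because "with the telescope" can attach either to "saw" or to "the man". The grammar accepts both, so something beyond syntax has to choose between them.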

The following tools can be used for dependency parsing and POS tagging. The tools are ordered in terms of accuracy and performance

Semantic Analysis Tools

Semantic analysis is the process of analyzing the meaning of linguistic input, or of constructing meaning representations for it. From a machine learning (ML) perspective, it is the process of producing common-sense knowledge about the world (extracting data and constructing models of the world). The focus of current NLP research is shifting from symbolic representational models to semantic representational models. As discussed in the previous article, semantic analysis can be done in the following ways:

Lexical Semantics (meaning of component words, word sense disambiguation)
Compositional Semantics (how words combine to form larger meanings)

Approaches to Semantic Analysis

1. Predicate Logic

The sentence “A restaurant that serves Chinese food near TUT” corresponds to the meaning representation “restaurant(x) ∧ Serves(x, ChineseFood) ∧ Near(LocationOf(x), LocationOf(TUT))”. Predicate logic is about creating such meaning representations from linguistic input. These logical propositions enable inference. The predicate logic approach does not cope well with a large vocabulary or an unrestricted domain, and it has scalability problems.
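As a rough illustration of how such a meaning representation enables inference, the sketch below stores a few facts and evaluates the three predicates as a conjunction. All restaurant and location names are hypothetical, and treating the predicates as set lookups is a simplification I chose for the example.

```python
# Hypothetical knowledge base; every name here is made up for illustration.
RESTAURANT = {"GoldenWok", "Napoli", "DragonInn"}
SERVES = {
    ("GoldenWok", "ChineseFood"),
    ("Napoli", "ItalianFood"),
    ("DragonInn", "ChineseFood"),
}
LOCATION_OF = {
    "GoldenWok": "CampusSquare",
    "Napoli": "CampusSquare",
    "DragonInn": "Harbour",
    "TUT": "CampusSquare",
}
# Pairs of locations considered near each other.
NEAR = {("CampusSquare", "CampusSquare")}


def near(loc_a, loc_b):
    """Near is symmetric, so check both orderings."""
    return (loc_a, loc_b) in NEAR or (loc_b, loc_a) in NEAR


def restaurants_serving_near(food, landmark):
    """All x with restaurant(x) AND Serves(x, food) AND Near(LocationOf(x), LocationOf(landmark))."""
    return sorted(
        x
        for x in RESTAURANT
        if (x, food) in SERVES
        and near(LOCATION_OF[x], LOCATION_OF[landmark])
    )


print(restaurants_serving_near("ChineseFood", "TUT"))  # ['GoldenWok']
```

Every fact has to be entered by hand, which is exactly why this approach struggles with large vocabularies and open domains.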

2. Statistical Approach

This approach borrows from statistical machine translation: we need a method to model the language (words and phrases) by mapping it to another representation. The process involves the following steps:

Preprocessing of the text data, such as tokenization, stemming, and lemmatization.
Feature extraction, that is, converting text or linguistic data to feature vectors (e.g., tokenization, TF-IDF vectorization, word embeddings). This usually means converting text to numbers, so that the statistical models can do the number crunching.
Converting the feature vectors to a model or another statistical representation.
Using that model or statistical representation to project the input data into its representation space or vector space for further mathematical processing (e.g., model.transform operations in major machine learning libraries like scikit-learn).
Finally, classifying, clustering, or predicting the data based on the model.
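The steps above can be sketched end to end in plain Python. This is a minimal sketch with a tiny hypothetical training set. I picked multinomial Naive Bayes with add-one smoothing as the statistical model purely for illustration, and stemming and lemmatization are left out to keep it short.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Preprocessing: lowercase and split into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_docs):
    """Feature extraction (word counts) and model fitting in one pass."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter()              # label -> number of documents
    vocab = set()
    for text, label in labeled_docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def predict(model, text):
    """Score each label with log P(label) + sum of log P(word | label)."""
    tokens = [t for t in tokenize(text) if t in model["vocab"]]
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    scores = {}
    for label, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy corpus.
TRAINING = [
    ("the match ended with a late goal", "sports"),
    ("the team won the league title", "sports"),
    ("the election results were announced", "politics"),
    ("parliament passed the new budget", "politics"),
]

model = train(TRAINING)
print(predict(model, "a late goal won the match"))        # sports
print(predict(model, "parliament announced the budget"))  # politics
```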

3. Information Retrieval

The information retrieval approach can be used for semantic analysis by applying statistical modelling for information retrieval (such as LSI, Latent Semantic Indexing, or LDA, Latent Dirichlet Allocation) in the initial step, and then using page ranking algorithms to improve the semantic analysis of the model. For example, Google uses statistical methods for information retrieval from a huge database and uses page ranking to improve the semantic prediction model.

4. Domain knowledge driven analysis

This approach expects certain “slots” of information to be filled in (e.g., booking a flight). Restricting the analysis to a certain domain allows the use of specific patterns, rules, and expectations (e.g., a customer at a restaurant, or buying train tickets). This approach also makes use of pragmatics, such as the socially probable set of “moves” in a given context.
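A minimal sketch of slot filling for the flight-booking case might look like this. The slot names and regular expressions are hypothetical and deliberately naive: I assume capitalized single-word city names and ISO dates. A real system would need far richer patterns.

```python
import re

# Hypothetical slot patterns for a flight-booking domain.
SLOT_PATTERNS = {
    "origin": re.compile(r"\bfrom\s+([A-Z][a-z]+)"),
    "destination": re.compile(r"\bto\s+([A-Z][a-z]+)"),
    "date": re.compile(r"\bon\s+(\d{4}-\d{2}-\d{2})"),
}


def fill_slots(utterance):
    """Return a dict of slot name -> extracted value (None when not found)."""
    slots = {}
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(utterance)
        slots[name] = match.group(1) if match else None
    return slots


def missing_slots(slots):
    """Slots the system still has to ask the user about."""
    return [name for name, value in slots.items() if value is None]


filled = fill_slots("I want to fly from Helsinki to Berlin on 2017-08-15")
print(filled)
# {'origin': 'Helsinki', 'destination': 'Berlin', 'date': '2017-08-15'}

partial = fill_slots("Book a flight to Paris")
print(missing_slots(partial))  # ['origin', 'date']
```

The list of missing slots is what drives the next "move" in the dialogue: the system asks for whatever the domain expects but the user has not yet said.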

Representations Used in NLP

Semantic analysis uses different representations, which can be used for text analysis in semantic and concept space, information retrieval, and so on. The following are the most common representations used in NLP:

Statistical modelling based representations
Graph or Network based representations
Neural Network based representations

Statistical modelling based representations

Statistical modelling is the most commonly used representation for storing language models for semantic analysis. In this method, the stored model is an n-dimensional array or a high-dimensional vector space, into which input text can be transformed or projected for semantic analysis or for comparison, by calculating a distance (such as cosine distance) between the vector data points. The following are the general operations (training and prediction) involved in model generation. Steps 1 and 2 are the training steps, and steps 3, 4, and 5 are the prediction steps.

1. Transform the training data to vectors (vectorization or feature extraction, e.g., a TF-IDF vectorizer or a word-count vectorizer).
2. Optimize the vectorized data into another representation by fitting the vector data to an ML model (e.g., SVD, multinomial ML algorithms for classification, SVM).
3. Convert the input text/data to be processed into vectors (extract features from the input).
4. Transform or project the vectors/features of the input into the high-dimensional space created during training.
5. Predict the class of the data, or find the similarity distance (e.g., cosine distance) between the vectors.
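The training and prediction steps can be illustrated with a hand-rolled TF-IDF model. This is a minimal sketch in plain Python: the function names fit and transform are mine (they only mimic the usual fit/transform split), the smoothed IDF formula is one common variant, and no dimensionality reduction such as SVD is applied.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def fit(train_docs):
    """Training: learn the vocabulary and a smoothed IDF weight per term."""
    n_docs = len(train_docs)
    doc_freq = Counter()
    for doc in train_docs:
        doc_freq.update(set(tokenize(doc)))
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def transform(idf, text):
    """Prediction: project a text into the learned TF-IDF space (sparse dict)."""
    term_freq = Counter(t for t in tokenize(text) if t in idf)
    return {term: count * idf[term] for term, count in term_freq.items()}


def cosine_similarity(u, v):
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus.
DOCS = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stock markets fell sharply today",
]

idf = fit(DOCS)                                   # training steps
doc_vectors = [transform(idf, d) for d in DOCS]
query = transform(idf, "a cat on a mat")          # prediction steps
similarities = [cosine_similarity(query, dv) for dv in doc_vectors]
best = max(range(len(DOCS)), key=lambda i: similarities[i])
print(DOCS[best])  # the cat sat on the mat
```

Words the model never saw during training ("a" in the query) are simply dropped, which is one reason the training corpus matters so much for this kind of representation.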

The following are the different techniques/algorithms used in statistical modelling.

Word-content matrix of counts (data), generalized (dimensionality reduction) using SVD (singular value decomposition)
LSA (Latent Semantic Analysis, with dimensionality reduction using SVD)
One-hot vector model
Continuous bag-of-words model (CBOW), generalized using SGD (stochastic gradient descent) optimization
Skip-gram model, generalized using SGD (stochastic gradient descent) optimization
Multinomial models like HMM (hidden Markov model) and LDA (Latent Dirichlet Allocation) for text or document classification
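Two of the simplest items in this list, the one-hot vector and the matrix of counts, can be shown in a few lines. This is a minimal sketch on a hypothetical two-sentence corpus. I assume the count matrix is a word-by-neighbouring-word co-occurrence matrix with a window of one, and the SVD step that would normally follow is left out.

```python
def build_vocab(sentences):
    """Sorted list of distinct words, so indices are deterministic."""
    return sorted({word for sentence in sentences for word in sentence})


def one_hot(word, vocab):
    """A vector with a single 1 at the word's index in the vocabulary."""
    return [1 if w == word else 0 for w in vocab]


def cooccurrence_matrix(sentences, vocab, window=1):
    """matrix[i][j] = how often vocab[j] appears within `window` of vocab[i]."""
    index = {word: i for i, word in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for sentence in sentences:
        for i, word in enumerate(sentence):
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    matrix[index[word]][index[sentence[j]]] += 1
    return matrix


# Hypothetical toy corpus, already tokenized.
SENTENCES = [["i", "like", "nlp"], ["i", "like", "parsing"]]

vocab = build_vocab(SENTENCES)
print(vocab)                   # ['i', 'like', 'nlp', 'parsing']
print(one_hot("like", vocab))  # [0, 1, 0, 0]
for word, row in zip(vocab, cooccurrence_matrix(SENTENCES, vocab)):
    print(word, row)
# i [0, 2, 0, 0]
# like [2, 0, 1, 1]
# nlp [0, 1, 0, 0]
# parsing [0, 1, 0, 0]
```

The one-hot vectors for "nlp" and "parsing" share nothing, but their count rows are identical because they occur in the same context. That is the signal that dimensionality reduction and the embedding models exploit.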

Graph or Network based representations

Graph or network based representations are useful when we think of words in a lexical resource, concepts in a knowledge network, or even words within a sentence that are connected to each other through what is formalized as syntactic relations. Since the early days of artificial intelligence, semantic networks have been proposed for storing language units and the relations that interconnect them. These semantic networks allow for a variety of inference and reasoning processes, simulating some of the functionalities of the human mind. Graphs are a powerful representation formalism. In language, this is probably most apparent in graph-based representations of words’ meanings through their relations with other words (Quillian 1968), which resulted in WordNet (Fellbaum 1998), a semantic network that after more than 20 years is still heavily used for a variety of tasks such as:

word sense disambiguation
semantic similarity
question answering
syntactic parsing
prepositional attachment
co-reference resolution
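To give a feel for how a semantic network supports a task like semantic similarity, here is a minimal sketch. The graph is a hypothetical hand-made "is-a" network, not WordNet data, and the score 1 / (1 + shortest path length) is just one simple path-based measure.

```python
from collections import deque

# Hypothetical miniature "is-a" network (word, more general concept).
EDGES = [
    ("dog", "canine"),
    ("canine", "mammal"),
    ("cat", "feline"),
    ("feline", "mammal"),
    ("mammal", "animal"),
    ("car", "vehicle"),
]


def build_graph(edges):
    """Undirected adjacency sets, so paths can go up and down the hierarchy."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns None when no path exists."""
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(graph, a, b):
    """1.0 for identical nodes, smaller for distant ones, None if unconnected."""
    dist = shortest_path_length(graph, a, b)
    return None if dist is None else 1.0 / (1.0 + dist)


graph = build_graph(EDGES)
print(path_similarity(graph, "dog", "canine"))  # 0.5
print(path_similarity(graph, "dog", "cat"))     # 0.2
print(path_similarity(graph, "dog", "car"))     # None
```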

Neural Network based Representations

For a long time, NLP techniques were dominated by machine learning approaches that used linear models such as support vector machines or logistic regression, trained over very high-dimensional and sparse feature vectors. Recently, the field has seen some success in switching from linear models over sparse inputs to non-linear neural network models over dense inputs. The two basic neural network architectures are:

feed-forward networks
recurrent/recursive networks

Feed-forward networks

multi-layer perceptron (networks with fully connected layers)
networks with convolutional and pooling layers.

Both of the above mentioned networks act as classifiers, but each with different strengths.

Fully connected feed-forward neural networks, or multi-layer perceptrons, are non-linear learners that can, for the most part, be used as a drop-in replacement wherever a linear learner is used, including binary and multiclass classification problems as well as more complex structured prediction problems. The non-linearity of the network, together with the ability to easily integrate pre-trained word embeddings, often leads to superior classification accuracy. Applications of feed-forward networks include:

Classifier replacement in a CCG (combinatory categorial grammar) parser, used for CCG supertagging
Dialog state tracking
Pre-ordering for statistical machine translation
Language modeling
Sentiment classification and factoid question answering, where multi-layer feed-forward networks can provide competitive results
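What the non-linearity buys can be seen on XOR, a classic function that no single linear model can represent. The sketch below is a two-layer perceptron with weights I set by hand for illustration (nothing is trained here), written in plain Python.

```python
def relu(x):
    return max(0.0, x)


def identity(x):
    return x


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W . inputs + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-picked (not learned) weights that make the network compute XOR.
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_BIASES = [0.0, -1.0]
OUTPUT_WEIGHTS = [[1.0, -2.0]]
OUTPUT_BIASES = [0.0]


def mlp_xor(x1, x2):
    hidden = dense([x1, x2], HIDDEN_WEIGHTS, HIDDEN_BIASES, relu)
    output = dense(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES, identity)
    return output[0]


for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, mlp_xor(a, b))
# 0.0 0.0 0.0
# 0.0 1.0 1.0
# 1.0 0.0 1.0
# 1.0 1.0 0.0
```

Remove the ReLU from the hidden layer and the two layers collapse into a single linear map, which cannot produce this output pattern.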

Networks with convolutional and pooling layers are useful for classification tasks in which we expect to find strong local clues regarding class membership, but these clues can appear in different places in the input. For example, in a document classification task, a single key phrase (or an n-gram) can help in determining the topic of the document. We would like to learn that certain sequences of words are good indicators of the topic, and we do not necessarily care where they appear in the document. Convolutional and pooling layers allow the model to learn to find such local indicators, regardless of their position. Convolutional and pooling architectures show promising results on many tasks, including:

Document classification
short-text categorization
sentiment classification
relation-type classification between entities
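The position independence of convolution plus pooling can be demonstrated with a single hand-built filter. This is a minimal sketch: tokens are one-hot vectors over a small hypothetical vocabulary, the filter is constructed by hand to respond to the bigram "not good" (a trained network would learn its filters), and pooling is max-over-time.

```python
# Hypothetical vocabulary for one-hot token vectors.
VOCAB = ["the", "movie", "was", "not", "good", "at", "all", "i", "think"]


def one_hot(token):
    return [1.0 if token == word else 0.0 for word in VOCAB]


def make_bigram_filter(first, second):
    """A width-2 filter that scores 2.0 exactly on the bigram (first, second)."""
    return one_hot(first) + one_hot(second)


def convolve(tokens, filt, width=2):
    """Slide the filter over every window of `width` tokens."""
    scores = []
    for i in range(len(tokens) - width + 1):
        window = []
        for token in tokens[i:i + width]:
            window.extend(one_hot(token))
        scores.append(sum(f * x for f, x in zip(filt, window)))
    return scores


def max_pool(scores):
    """Max-over-time pooling: keep the strongest response, drop its position."""
    return max(scores) if scores else 0.0


not_good = make_bigram_filter("not", "good")

late = "the movie was not good".split()
early = "not good at all i think".split()
absent = "the movie was good".split()

print(convolve(late, not_good))            # [0.0, 0.0, 0.0, 2.0]
print(convolve(early, not_good))           # [2.0, 0.0, 0.0, 0.0]
print(max_pool(convolve(late, not_good)))    # 2.0
print(max_pool(convolve(early, not_good)))   # 2.0
print(max_pool(convolve(absent, not_good)))  # 1.0
```

The convolution output still records where the phrase occurred. After pooling, both sentences yield the same feature value, and that is the invariance the paragraph above describes.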

Recurrent or Recursive Networks

In natural language we often work with structured data of arbitrary sizes, such as sequences and trees. We would like to be able to capture regularities in such structures, or to model similarities between them. In many cases, this means encoding the structure as a fixed-width vector, which we can then pass on to another statistical learner for further processing. While convolutional and pooling architectures allow us to encode arbitrarily large items as fixed-size vectors capturing their most salient features, they do so by sacrificing most of the structural information. Recurrent and recursive architectures allow us to work with sequences and trees while preserving a lot of the structural information. Recurrent networks are designed to model sequences, while recursive networks are generalizations of recurrent networks that can handle trees.
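The idea of encoding a sequence of any length into a fixed-width vector while keeping order information can be sketched with a tiny Elman-style recurrent cell. The embeddings and weight matrices below are hypothetical, hand-picked numbers (nothing is trained). The order-insensitive sum is included only as a contrast.

```python
import math

# Hypothetical 2-dimensional embeddings and hand-picked weights.
EMBEDDINGS = {
    "dog": [1.0, 0.0],
    "bites": [0.0, 1.0],
    "man": [-1.0, 0.5],
}
W_INPUT = [[0.5, -0.3], [0.2, 0.8]]       # input -> hidden
W_RECURRENT = [[0.6, -0.4], [0.3, 0.5]]   # previous hidden -> hidden


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_encode(tokens):
    """h_t = tanh(W_INPUT . x_t + W_RECURRENT . h_{t-1}); returns the final state."""
    hidden = [0.0, 0.0]
    for token in tokens:
        from_input = matvec(W_INPUT, EMBEDDINGS[token])
        from_state = matvec(W_RECURRENT, hidden)
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
    return hidden


def sum_encode(tokens):
    """Order-insensitive baseline: add up the embeddings."""
    total = [0.0, 0.0]
    for token in tokens:
        total = [t + e for t, e in zip(total, EMBEDDINGS[token])]
    return total


a = ["dog", "bites", "man"]
b = ["man", "bites", "dog"]

print(len(rnn_encode(["dog"])), len(rnn_encode(a)))  # 2 2
print(sum_encode(a) == sum_encode(b))                # True
print(rnn_encode(a) == rnn_encode(b))                # False
```

Both encoders return a vector of the same size for any input length, but only the recurrent one distinguishes "dog bites man" from "man bites dog".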

Recurrent models have been shown to produce very strong results in:

Machine translation
Dependency parsing
Sentiment analysis
Noisy text normalization
Response Generation

The following feature representations can be used in neural network based modelling:

One-Hot vector representation
Continuous bag-of-words model (CBOW)
Skip-Gram model

A main component of the neural network approach is the use of embeddings, called word embeddings: each feature is represented as a vector in a low-dimensional space. word2vec by Google is a good software package for converting text to word embeddings.
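To show what the CBOW and skip-gram models are actually trained on, the sketch below generates their training examples from a token sequence. It covers only the data preparation step, with a window size I chose for illustration. The embedding vectors themselves would be learned from these examples with SGD, which is not shown here.

```python
def context_indices(position, length, window):
    """Indices within `window` of `position`, excluding the position itself."""
    lo = max(0, position - window)
    hi = min(length, position + window + 1)
    return [j for j in range(lo, hi) if j != position]


def skipgram_pairs(tokens, window=1):
    """Skip-gram: predict each context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in context_indices(i, len(tokens), window):
            pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=1):
    """CBOW: predict the center word from its bag of context words."""
    examples = []
    for i, target in enumerate(tokens):
        context = [tokens[j] for j in context_indices(i, len(tokens), window)]
        if context:
            examples.append((context, target))
    return examples


tokens = ["the", "quick", "fox"]
print(skipgram_pairs(tokens))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'fox'), ('fox', 'quick')]
print(cbow_examples(tokens))
# [(['quick'], 'the'), (['the', 'fox'], 'quick'), (['quick'], 'fox')]
```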
