Journey Through NLP Research — Tools and Techniques

deepu kr
7 min read · Jul 28, 2017

In the previous article, I shared the basics of NLP, including:

  • Linguistic analysis (syntax, semantics and pragmatics)
  • Lexical and compositional semantic properties of language
  • Syntactic and semantic analysis in NLP
  • Language ambiguity

In this article I will share the different tools and techniques used for syntactic and semantic analysis in NLP.

Syntactic Analysis Tools

Syntactic analysis focuses on the syntactic structure of language: grammar (noun phrases, verb phrases, nouns, verbs etc.) and dependencies between words (syntactic or dependency parsing). It is the task of recognizing a sentence and assigning a syntactic structure to it. This structure can be used for grammar checking and gives some insight into the semantic analysis of the language. Syntactic analysis also helps to extract features from text, which can be used by different ML algorithms to create advanced NLP systems. Every syntactic analysis system has the following two main components, usually together with a POS tagging module:

  • A declarative representation (grammar) of the syntactic facts about the language
  • A procedure (parser) that compares the grammar against input sentences to produce parsed structures
  • POS (part-of-speech) tagging is a module in a syntactic parsing system which tags the words in sentences as nouns, noun phrases, verbs, verb phrases etc.
[Figure: syntactic analysis]

Ambiguity in POS tagging and phrase attachment can lead to several different parse trees for the same sentence, which makes the output of an NLP system ambiguous.
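To make the grammar, parser and ambiguity ideas concrete, here is a minimal sketch of a CKY-style chart parser that only counts parse trees. The toy grammar, the lexicon and the function name count_parses are my own assumptions for illustration, not taken from any particular tool.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("PP", "P", "NP"),
]

# Hypothetical lexicon: word -> possible categories (a stand-in for a POS tagger).
LEXICON = {
    "i": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "with": ["P"], "telescope": ["N"],
}

def count_parses(words, start="S"):
    """Count the distinct parse trees the grammar assigns to the word list."""
    n = len(words)
    # chart[(i, j)][symbol] = number of ways symbol can span words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICON.get(word.lower(), []):
            chart[(i, i + 1)][tag] += 1
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):          # split point
                for parent, left, right in BINARY_RULES:
                    ways = chart[(i, k)].get(left, 0) * chart[(k, j)].get(right, 0)
                    if ways:
                        chart[(i, j)][parent] += ways
    return chart[(0, n)].get(start, 0)

print(count_parses("I saw the man with the telescope".split()))
```

With this grammar the sentence gets two trees, because "with the telescope" can attach either to the verb phrase or to the noun phrase. That is the kind of ambiguity a real parser has to resolve.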

The following tools can be used for dependency parsing and POS tagging. The tools are ordered in terms of accuracy and performance

Semantic Analysis Tools

Semantic analysis is the process of analyzing the meaning of linguistic input or constructing meaning representations from it. From a Machine Learning (ML) perspective, it is the process of producing common-sense knowledge about the world (extracting data and constructing models of the world). The current trend and focus of research in NLP is shifting from symbolic representational models to semantic representational models. As discussed in the previous article, semantic analysis can be done in the following ways:

  • Lexical Semantics (meaning of component words, word sense disambiguation)
  • Compositional Semantics (how words combine to form larger meanings)

Approaches to Semantic Analysis

1. Predicate Logic

The sentence “A restaurant that serves Chinese food near TUT” corresponds to the meaning representation “Restaurant(x) ∧ Serves(x, ChineseFood) ∧ Near(LocationOf(x), LocationOf(TUT))”. Predicate logic is about creating such meaning representations from linguistic input. These logical propositions enable inference. The predicate logic approach doesn’t support a large vocabulary or an unrestricted domain, and it has scalability problems.
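A minimal sketch of how such a meaning representation enables inference, reading the three conditions as a conjunction (logical AND). The tiny knowledge base, the restaurant names and the locations are all made up for illustration, and "near" is simplified to "same location".

```python
# Hypothetical facts; every name below is invented for the example.
RESTAURANTS = {"GoldenWok", "PastaHouse", "DragonInn"}
SERVES = {
    ("GoldenWok", "ChineseFood"),
    ("PastaHouse", "ItalianFood"),
    ("DragonInn", "ChineseFood"),
}
LOCATION_OF = {
    "GoldenWok": "campus",
    "PastaHouse": "campus",
    "DragonInn": "downtown",
    "TUT": "campus",
}

def restaurant(x):
    return x in RESTAURANTS

def serves(x, food):
    return (x, food) in SERVES

def near(loc_a, loc_b):
    # Simplifying assumption: two things are near if they share a location.
    return loc_a == loc_b

def answer_query():
    """All x with Restaurant(x) AND Serves(x, ChineseFood)
    AND Near(LocationOf(x), LocationOf(TUT))."""
    return sorted(
        x for x in LOCATION_OF
        if restaurant(x)
        and serves(x, "ChineseFood")
        and near(LOCATION_OF[x], LOCATION_OF["TUT"])
    )

print(answer_query())
```

Every fact has to be entered by hand, which hints at why this approach struggles with a large vocabulary or an unrestricted domain.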

2. Statistical Approach

This approach uses statistical methods, as in statistical machine translation. Here we need to find a method to model or represent the language (words and phrases) in another form. The following steps are involved in the statistical approach:

  • Preprocessing of the text data, like tokenization, stemming and lemmatization.
  • Feature extraction, or converting text or linguistic data to feature vectors (e.g. tokenization, TF-IDF vectorization, word embeddings etc.). This step usually converts the text data to numbers, and the statistical models then do the number crunching.
  • Converting the feature vectors to models or other statistical representations.
  • Using the model or statistical representation to project the input data into that representation space or vector space for further mathematical processing (e.g. the model.transform operations in major machine learning libraries like scikit-learn).
  • Finally, classifying, clustering or predicting the data based on the model.
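The steps above can be sketched end to end with a small multinomial Naive Bayes text classifier in plain Python. This is only an illustration under my own assumptions: the training sentences, the labels and the class name NaiveBayes are invented, and stemming and lemmatization are left out to keep it short.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Preprocessing: lowercase and split into word tokens.
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    def fit(self, texts, labels):
        # Feature extraction + modelling: per-class word counts.
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Classification: pick the class with the highest log-probability.
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(n_docs / total_docs)
            for token in tokenize(text):
                if token in self.vocab:  # unseen words are ignored
                    score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

TEXTS = [
    "book a flight to paris",
    "reserve a flight ticket",
    "order chinese food",
    "find a restaurant that serves food",
]
LABELS = ["travel", "travel", "food", "food"]

model = NaiveBayes().fit(TEXTS, LABELS)
print(model.predict("a cheap flight"), model.predict("chinese restaurant"))
```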

3. Information Retrieval

The information retrieval approach can be used for semantic analysis by applying statistical modelling for information retrieval (like LSI, Latent Semantic Indexing, or LDA, Latent Dirichlet Allocation) in the initial step and then using page ranking algorithms to improve the semantic analysis in the model. E.g. Google uses statistical methods for information retrieval from a huge database and uses page ranking to improve the semantic prediction model.
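As an illustration of the page ranking part, here is a simplified sketch of PageRank-style power iteration over a tiny link graph. The graph, the damping value and the iteration count are assumptions chosen for the example. This is not a description of how any production search engine works.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: pages a, b and d all link to c; c links back to a.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(LINKS)
print(max(ranks, key=ranks.get))
```

The page with the most incoming links ends up with the highest score, which can then be combined with a text-based relevance score from the retrieval step.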

4. Domain knowledge driven analysis

This approach expects certain “slots” of information to be filled in (e.g. booking a flight). Restricting the analysis to a certain domain allows the use of specific patterns, rules and expectations (e.g. a customer at a restaurant, buying train tickets). This approach also utilizes pragmatics, like the socially probable set of “moves” in a certain context.
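A minimal sketch of slot filling for the flight booking case, using hand-written patterns. The slot names, the regular expressions and the example utterances are assumptions for illustration. A real system would need far more robust patterns.

```python
import re

# Hypothetical slots for a flight-booking domain, each with one simple pattern.
SLOT_PATTERNS = {
    "origin": re.compile(r"\bfrom\s+([A-Z][a-z]+)"),
    "destination": re.compile(r"\bto\s+([A-Z][a-z]+)"),
    "date": re.compile(r"\bon\s+(\d{4}-\d{2}-\d{2})"),
}

def fill_slots(utterance):
    """Return a dict with one entry per slot (None if not found)."""
    slots = {}
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(utterance)
        slots[name] = match.group(1) if match else None
    return slots

def missing_slots(slots):
    """Slots the system still has to ask the user about."""
    return [name for name, value in slots.items() if value is None]

slots = fill_slots("Book a flight to Paris")
print(slots, missing_slots(slots))
```

The list of missing slots tells the system which question to ask next, which is where the expected “moves” of the dialogue come in.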

Representations Used in NLP

Semantic analysis uses different representations, which can be used for text data analysis in semantic and concept space, information retrieval etc. The following are the most common representations used in NLP.

  1. Statistical modelling based representations
  2. Graph or Network based representations
  3. Neural Network based representations

Statistical modelling based representations

Statistical modelling is the most commonly used representation for storing language models for semantic analysis. In this method the stored model is an n-dimensional array or a high dimensional space model, which can be used to transform or project the input data/text for semantic analysis or comparison by calculating the distance (like cosine distance) between the vector data points. The following are the general operations (training and prediction) involved in model generation. Steps 1 & 2 are the training steps and steps 3, 4 & 5 are the prediction steps.

  1. Transform the training data to vectors / vectorizing / feature extraction (e.g. TF-IDF vectorizer, word count vectorizer)
  2. Optimize the vectorized data to another representation / modelling / fit the vector data to an ML model (e.g. SVD, multinomial ML algorithms for classification, SVM etc.)
  3. Convert the input texts/data to vectors / extract features from the input text/data
  4. Transform / project the vectors/features from the input texts/data to the created high dimensional space
  5. Predict the class of the data or find the similarity distance (e.g. cosine distance) between the vectors
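The training and prediction steps can be sketched with a small TF-IDF model in plain Python. This is a simplified illustration, not a library implementation: the class name TfidfModel, the smoothed IDF formula and the documents are my assumptions, and the optional second modelling step (e.g. SVD) is skipped.

```python
import math
from collections import Counter

class TfidfModel:
    def fit(self, docs):
        # Training: learn the vocabulary and IDF weights from the documents.
        tokenized = [doc.lower().split() for doc in docs]
        self.vocab = sorted({tok for doc in tokenized for tok in doc})
        n = len(tokenized)
        doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
        # Smoothed IDF (one common variant, assumed here).
        self.idf = {t: math.log((1 + n) / (1 + doc_freq[t])) + 1.0 for t in self.vocab}
        return self

    def transform(self, text):
        # Prediction: project any text into the learned vector space.
        counts = Counter(text.lower().split())
        return [counts[t] * self.idf[t] for t in self.vocab]

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

DOCS = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock markets fell sharply today",
]

model = TfidfModel().fit(DOCS)                        # steps 1 and 2 (training)
query = model.transform("a cat on a mat")             # steps 3 and 4
sims = [cosine_similarity(query, model.transform(d)) for d in DOCS]  # step 5
print(sims.index(max(sims)))
```

Words that were not seen during training are simply dropped by transform, and "cat" and "cats" count as different words because no stemming is applied.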

The following are the different techniques/algorithms used in statistical modelling.

  1. Word-content matrix of counts (data), generalized (dimensionality reduction) using SVD (singular value decomposition)
  2. LSA (Latent Semantic Analysis), with dimensionality reduction using SVD
  3. One-hot vector model
  4. Continuous bag-of-words model (CBOW), generalized using SGD (stochastic gradient descent) optimization
  5. Skip-gram model, generalized using SGD (stochastic gradient descent) optimization
  6. Multinomial models like HMM (Hidden Markov Model) and LDA (Latent Dirichlet Allocation) for text or document classification
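To show what the one-hot, CBOW and skip-gram models actually see as data, here is a small sketch that builds a one-hot vector and the training examples for both models. Only the data preparation is shown. The SGD training itself is left out, and the function names and the window size are my own choices.

```python
def one_hot(word, vocab):
    """Vector with a single 1 at the position of the word in the vocabulary."""
    return [1 if w == word else 0 for w in vocab]

def context_of(tokens, i, window):
    """Words within `window` positions of tokens[i], excluding tokens[i]."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return [tokens[j] for j in range(lo, hi) if j != i]

def skipgram_pairs(tokens, window=1):
    """Skip-gram: predict each context word from the center word."""
    return [(center, ctx)
            for i, center in enumerate(tokens)
            for ctx in context_of(tokens, i, window)]

def cbow_examples(tokens, window=1):
    """CBOW: predict the center word from all of its context words."""
    return [(context_of(tokens, i, window), center)
            for i, center in enumerate(tokens)]

tokens = ["the", "cat", "sat"]
vocab = sorted(set(tokens))
print(one_hot("sat", vocab))
print(skipgram_pairs(tokens))
print(cbow_examples(tokens))
```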

References

  1. word embeddings

Graph or Network based representations

Graph or network based representations fit naturally when we think of words in a lexical resource, concepts in a knowledge network, or even words within a sentence that are connected to each other through syntactic relations. Since the early ages of artificial intelligence, semantic networks have been proposed for storage of language units and the relations that interconnect them. These semantic networks allow for a variety of inference and reasoning processes, simulating some of the functionalities of the human mind. Graphs are a powerful representation formalism. In language, this is probably most apparent in graph-based representations of words’ meanings through their relations with other words (Quillian 1968), which has resulted in WordNet (Fellbaum 1998), a semantic network that after more than 20 years is still heavily used for a variety of tasks like:

  • word sense disambiguation
  • semantic similarity
  • question answering
  • syntactic parsing
  • prepositional attachment
  • co-reference resolution
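A minimal sketch of the semantic similarity task on a graph: words are nodes, relations are edges, and similarity is derived from the shortest path between two words. The tiny network below is invented for illustration and is not taken from WordNet. The 1 / (1 + distance) score is one simple choice among many.

```python
from collections import deque

# Hypothetical "is-a" relations between words.
EDGES = [
    ("dog", "canine"), ("canine", "mammal"),
    ("cat", "feline"), ("feline", "mammal"),
    ("mammal", "animal"), ("car", "vehicle"),
]

def build_graph(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)   # treat relations as undirected
    return graph

def shortest_path_length(graph, source, target):
    """Breadth-first search; returns None if the words are not connected."""
    if source not in graph or target not in graph:
        return None
    queue, seen = deque([(source, 0)]), {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

def path_similarity(graph, a, b):
    dist = shortest_path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)

graph = build_graph(EDGES)
print(path_similarity(graph, "dog", "cat"), path_similarity(graph, "dog", "car"))
```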

References

  1. Survey of graphs in NLP
  2. Networks and NLP

Implementations

  1. WordNet
  2. FrameNet
  3. ConceptNet
  4. openmind-net
  5. Linked Data

Neural Network based Representations

Until recently, most NLP techniques were dominated by machine learning approaches that used linear models like support vector machines or logistic regression, trained over very high dimensional and sparse feature vectors. Recently the field has seen some success in switching from linear models over sparse inputs to non-linear neural network models over dense inputs. The two basic neural network architectures are

  • feed-forward networks
  • recurrent/recursive networks

Feed-forward networks

  1. multi-layer perceptron (networks with fully connected layers)
  2. networks with convolutional and pooling layers.

Both of the above mentioned networks act as classifiers, but each with different strengths.

Fully connected feed-forward neural networks, or multi-layer perceptrons, are non-linear learners that can, for the most part, be used as a drop-in replacement wherever a linear learner is used, such as in binary and multiclass classification problems and more complex structured prediction problems. The non-linearity of the network, as well as the ability to easily integrate pre-trained word embeddings, often leads to superior classification accuracy. Applications of a feed-forward network are

  • classifier replacement in a CCG (combinatory categorial grammar) parser, used for CCG supertagging
  • dialog state tracking
  • pre-ordering for statistical machine translation
  • language modeling
  • sentiment classification and factoid question answering, where multilayer feed-forward networks can provide competitive results
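To illustrate why the non-linearity matters, here is a small sketch of a multi-layer perceptron with one hidden layer that computes XOR, a function no single linear classifier can represent. The weights are set by hand for the example, so nothing is trained here. A step activation is used for readability, while trainable networks use differentiable activations.

```python
def step(x):
    return 1 if x > 0 else 0

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W * inputs + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hand-picked weights (an assumption for this sketch, not learned values).
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]   # unit 0 acts like OR, unit 1 like AND
HIDDEN_B = [-0.5, -1.5]
OUTPUT_W = [[1.0, -1.0]]              # fires for "OR but not AND"
OUTPUT_B = [-0.5]

def mlp_xor(x1, x2):
    hidden = dense([x1, x2], HIDDEN_W, HIDDEN_B, step)
    return dense(hidden, OUTPUT_W, OUTPUT_B, step)[0]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, mlp_xor(a, b))
```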

Networks with convolutional and pooling layers are useful for classification tasks in which we expect to find strong local clues regarding class membership, but these clues can appear in different places in the input. For example, in a document classification task, a single key phrase (or an n-gram) can help in determining the topic of the document. We would like to learn that certain sequences of words are good indicators of the topic, and do not necessarily care where they appear in the document. Convolutional and pooling layers allow the model to learn to find such local indicators, regardless of their position. Convolutional and pooling architectures show promising results on many tasks, such as:

  • Document classification
  • short-text categorization
  • sentiment classification
  • relation-type classification between entities
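A minimal sketch of the position-independence idea: one convolution filter slides over word windows, and max pooling keeps only the strongest response. The two-dimensional toy embeddings and the filter weights are hand-made assumptions so that the filter responds to the bigram "not good". In a real network both would be learned.

```python
# Hypothetical toy embeddings; unknown words map to the zero vector.
EMBED = {"not": [1.0, 0.0], "good": [0.0, 1.0]}

def embed(token):
    return EMBED.get(token, [0.0, 0.0])

def conv1d(tokens, filt, width=2):
    """Apply one filter to every window of `width` consecutive words."""
    vectors = [embed(t) for t in tokens]
    outputs = []
    for i in range(len(vectors) - width + 1):
        window = [x for vec in vectors[i:i + width] for x in vec]  # concatenate
        outputs.append(sum(w * x for w, x in zip(filt, window)))
    return outputs

def max_pool(values):
    """Keep the strongest filter response, wherever it occurred."""
    return max(values, default=0.0)

# Filter that responds most strongly to "not" followed by "good".
NOT_GOOD_FILTER = [1.0, 0.0, 0.0, 1.0]

for sentence in ("the movie was not good", "not good at all honestly", "the movie was good"):
    print(sentence, "->", max_pool(conv1d(sentence.split(), NOT_GOOD_FILTER)))
```

The first two sentences give the same pooled value although the phrase appears at different positions, while the third one scores lower.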

Recurrent or Recursive Networks

In natural language we often work with structured data of arbitrary sizes, such as sequences and trees. We would like to be able to capture regularities in such structures, or to model similarities between such structures. In many cases, this means encoding the structure as a fixed width vector, which we can then pass on to another statistical learner for further processing. While convolutional and pooling architectures allow us to encode arbitrarily large items as fixed size vectors capturing their most salient features, they do so by sacrificing most of the structural information. Recurrent and recursive architectures allow us to work with sequences and trees while preserving a lot of the structural information. Recurrent networks are designed to model sequences, while recursive networks are generalizations of recurrent networks that can handle trees.

Recurrent models have been shown to produce very strong results in:
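A minimal sketch of a simple recurrent network that encodes a sequence of any length into a fixed-size vector. The weights and the two input vectors are arbitrary values picked for the example (nothing is trained), and the function name rnn_encode is my own.

```python
import math

# Arbitrary fixed weights for a hidden state of size 2 (assumed, not learned).
W_IN = [[0.5, -0.3], [0.2, 0.8]]    # input -> hidden
W_REC = [[0.1, 0.4], [-0.6, 0.3]]   # previous hidden -> hidden

def rnn_encode(inputs, w_in=W_IN, w_rec=W_REC):
    """Run h_t = tanh(W_in x_t + W_rec h_{t-1}) and return the last state."""
    size = len(w_in)
    hidden = [0.0] * size
    for x in inputs:
        hidden = [
            math.tanh(
                sum(w_in[i][k] * x[k] for k in range(len(x)))
                + sum(w_rec[i][j] * hidden[j] for j in range(size))
            )
            for i in range(size)
        ]
    return hidden

A, B = [1.0, 0.0], [0.0, 1.0]   # two toy "word" vectors

print(rnn_encode([A, B]))
print(rnn_encode([B, A]))
print(rnn_encode([A, B, A, B, A]))
```

The output always has the same size, but the same words in a different order give a different vector, which is the structural information that pooling throws away.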

  • Machine translation
  • Dependency parsing
  • Sentiment analysis
  • Noisy text normalization
  • Response Generation

The following feature representations can be used in neural network based modelling:

  • One-Hot vector representation
  • Continuous bag-of-words model (CBOW)
  • Skip-Gram model

A main component of the neural network approach is the use of embeddings (representing each feature as a vector in a low dimensional space); for words these are called word embeddings. word2vec by Google is a good software package for converting text to word embeddings.
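A small sketch of what an embedding is in practice: a table with one dense vector per word. Multiplying a one-hot vector with that table selects the same row as a direct lookup. The vocabulary and the two-dimensional vectors are made up for illustration and are not output of word2vec.

```python
import math

VOCAB = ["cat", "dog", "car"]
# Hypothetical embedding table: one dense row per vocabulary word.
EMBEDDINGS = [
    [0.9, 0.1],   # cat
    [0.8, 0.2],   # dog
    [0.1, 0.9],   # car
]

def one_hot(word):
    return [1.0 if w == word else 0.0 for w in VOCAB]

def embed_by_matmul(word):
    """One-hot row vector times the embedding matrix."""
    hot = one_hot(word)
    dims = len(EMBEDDINGS[0])
    return [sum(hot[i] * EMBEDDINGS[i][d] for i in range(len(VOCAB))) for d in range(dims)]

def embed_by_lookup(word):
    """Equivalent and cheaper: just pick the row."""
    return EMBEDDINGS[VOCAB.index(word)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

print(embed_by_matmul("dog"), embed_by_lookup("dog"))
print(cosine(embed_by_lookup("cat"), embed_by_lookup("dog")))
print(cosine(embed_by_lookup("cat"), embed_by_lookup("car")))
```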
