Journey through NLP Research — Basics

deepu kr
4 min read · Jul 2, 2017

--

Little Bit of History

SHRDLU was one of the first computer programs to accept natural language as input for moving toy blocks in a virtual world. It accepts commands like “put the red pyramid on top of the green square” and translates them into physical actions inside the virtual world.

What is NLP ?

NLP (Natural Language Processing) is the set of techniques used for Natural Language Understanding (NLU), which plays a major role in Artificial Intelligence (AI). NLP helps AI systems process data for knowledge representation, reasoning and machine learning. The objective of NLP is to make machines as capable as human beings at understanding language. NLP bridges the gap between human communication (natural language) and what the computer understands (machine-readable representations): it maps input given in natural language to useful representations.

Some NLP use cases and tasks

  • Chatbots (Q&A, cognitive conversational agents)
  • Voice-activated technologies (digital personal assistants)
  • Semantic search (text classification, data/information extraction such as named entity recognition or NER, structuring text data to create domain ontologies or RDF data structures)
  • Text summarization
  • Sentiment analysis
  • Text matching (Levenshtein distance, phonetic matching etc.)
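
The Levenshtein distance mentioned above is straightforward to compute with dynamic programming. A minimal sketch in Python (the two-row formulation, keeping only the previous row of the edit-distance table):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits (insert, delete,
    substitute) needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,        # delete ca
                            curr[j - 1] + 1,    # insert cb
                            prev[j - 1] + cost  # substitute (or match)
                            ))
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```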

Statistical Revolution and NLP

The availability of large computational resources and data drove a statistical revolution in computer applications like speech recognition, probabilistic modelling, data science and machine learning. Different statistical methods are used in NLP applications, for example:

  • CRFs (conditional random fields) for POS tagging or dependency parsing
  • LDA (latent Dirichlet allocation) for topic modelling
  • LSI (latent semantic indexing) for information retrieval
  • HMMs (hidden Markov models) for NER (named entity recognition)
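
As a rough illustration of the HMM idea, here is a minimal Viterbi decoder over an invented two-tag model (all states, probabilities and words below are toy values chosen for the example, not trained parameters):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    # V[t][s] = (prob of best path ending in state s at step t, that path)
    V = [{s: (start_p[s] * emit_p[s].get(obs[0], 0.0), [s]) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, path = max(
                (V[t - 1][ps][0] * trans_p[ps][s] * emit_p[s].get(obs[t], 0.0),
                 V[t - 1][ps][1] + [s])
                for ps in states)
            V[t][s] = (prob, path)
    return max(V[-1].values())[1]

# Toy tagset: N = noun, V = verb
states = ["N", "V"]
start_p = {"N": 0.7, "V": 0.3}
trans_p = {"N": {"N": 0.4, "V": 0.6}, "V": {"N": 0.7, "V": 0.3}}
emit_p = {"N": {"dogs": 0.6, "bark": 0.1}, "V": {"dogs": 0.1, "bark": 0.7}}

print(viterbi(["dogs", "bark"], states, start_p, trans_p, emit_p))  # ['N', 'V']
```

Real taggers estimate these probability tables from annotated corpora; the decoding step stays the same.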

Linguistic Analysis

The following components of natural language are considered for linguistic analysis:

  • phonetics and phonology (classification of and relationships among speech sounds)
  • morphology (word formation and the relationships between words)
  • lexicon (the inventory of lexemes that forms the vocabulary of a language)
  • syntax (what is grammatical? or, in a programmer’s view, no compiler errors)
  • semantics (what does it mean? or, in a programmer’s view, no implementation bugs)
  • pragmatics (what does it do? or, in a programmer’s view, the right algorithm was implemented)

Of the linguistic properties mentioned above, phonetics and the lexicon are mostly used in speech-to-text systems. Properties like syntax, semantics and pragmatics are used for natural language processing, and they create opportunities for the transfer of ideas between Machine Learning (ML) and Natural Language Processing (NLP).

Language Properties

The following properties of natural language should be considered when developing NLP systems that use lexical semantics:

  • Meaning units beyond a word (multi-word expressions, e.g. “light bulb”)
  • Meaning units within a word (e.g. light, lighten, lightening)
  • One word with multiple meanings (polysemy, e.g. “The lamp lights up the room” vs. “The load is not light”). Word sense vectors can be a good solution in a machine learning approach.
  • Multiple words with the same meaning (synonymy, e.g. “confusing, unclear, perplexing”). Semantic distance metrics such as cosine distance, LSA, LDA etc. can be considered for an ML approach.
  • Textual entailment: recognizing when one sentence is logically entailed by another, e.g. “you are reading the article” entails “you can read”. Hyponymy (the is-a relation, e.g. “a cat is a mammal”) and meronymy (the has-a relation, e.g. “a cat has a tail”) can be used to perform textual entailment.
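
The hyponymy/meronymy idea can be sketched with a toy knowledge base (the ISA/HASA entries below are invented for illustration): an is-a chain is followed transitively, and has-a facts are inherited up that chain.

```python
# Toy knowledge base: is-a (hyponymy) and has-a (meronymy) relations
ISA = {"cat": "mammal", "mammal": "animal", "sparrow": "bird", "bird": "animal"}
HASA = {"cat": {"tail", "whiskers"}, "bird": {"wings"}}

def is_a(x: str, y: str) -> bool:
    """Transitive hyponymy: does 'x is-a y' hold?"""
    while x in ISA:
        x = ISA[x]
        if x == y:
            return True
    return False

def has_a(x: str, part: str) -> bool:
    """Meronymy, inherited up the is-a chain (a sparrow has wings
    because a sparrow is-a bird and a bird has wings)."""
    while True:
        if part in HASA.get(x, set()):
            return True
        if x not in ISA:
            return False
        x = ISA[x]

print(is_a("cat", "animal"), has_a("sparrow", "wings"))  # True True
```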

Apart from the above lexical semantic properties, natural language also involves compositional semantics, i.e. model theory and compositionality. The following properties should be considered when developing NLP systems that use compositional semantics:

  • beliefs (e.g. “Lois believes Superman is a hero” is not the same as “Lois believes Clark Kent is a hero”)
  • conversational implicatures (e.g. “what on earth has happened to the beef roast” can be logically connected to “the dog is looking very happy”; the implicature in this case is “the dog ate the beef roast”)
  • presupposition, the background assumption that holds independent of the truth of the sentence

The underlying principle of language is a cooperative game between speaker and listener. Implicatures and presuppositions depend on people and context, which makes them a case for soft inference and machine learning.

Aspects of Syntactic Analysis in NLP

The following aspects are used for syntax- or grammar-based approaches in NLP:

  • Context Free Grammar (CFG)
  • Top-Down Parser
  • Transition network parser
  • Chart Parser
  • Neural-network-based parsers (Google SyntaxNet)

The parsing algorithms mentioned above can generate parse trees from sentences. These parse trees contain NPs (noun phrases), VPs (verb phrases), nouns, verbs, adjectives and many other grammatical units. Syntax-based NLP systems can be used for POS (part-of-speech) tagging, dependency tagging etc.
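
As a sketch of how a top-down parser derives a parse tree from a CFG, here is a minimal recursive-descent parser over an invented toy grammar and lexicon (real parsers use chart techniques to avoid re-deriving the same subtrees):

```python
# Toy CFG: S -> NP VP; NP -> Det N | N; VP -> V NP | V
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"], ["N"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {"Det": {"the", "a"}, "N": {"dog", "cat"}, "V": {"chased", "sleeps"}}

def parse(symbol, words, i):
    """Yield (tree, next_index) for every way `symbol` can span words[i:]."""
    if symbol in LEXICON:  # pre-terminal: consume one matching word
        if i < len(words) and words[i] in LEXICON[symbol]:
            yield (symbol, words[i]), i + 1
        return
    for production in GRAMMAR[symbol]:  # non-terminal: try each production
        for children, j in expand(production, words, i):
            yield (symbol, *children), j

def expand(symbols, words, i):
    """Yield (child_trees, next_index) for a sequence of symbols."""
    if not symbols:
        yield [], i
        return
    for tree, j in parse(symbols[0], words, i):
        for rest, k in expand(symbols[1:], words, j):
            yield [tree] + rest, k

def parse_sentence(sentence):
    words = sentence.split()
    # keep only parses that consume the whole sentence
    return [t for t, j in parse("S", words, 0) if j == len(words)]
```

For example, `parse_sentence("the dog chased a cat")` yields exactly one tree rooted at S, with NP and VP subtrees.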

Aspects of Semantic Analysis in NLP

  • Distributional
  • Frame based
  • Model Theoretical
  • Interactive Learning

Distributional

  • Uses the statistical tactics of ML and deep learning
  • Typically turns content into word vectors for mathematical analysis
  • Performs well on POS tagging and dependency parsing
  • Doesn’t understand the meaning of words; relies instead on the relationships between words
  • Lacks true understanding of real-world semantics and pragmatics
  • Comparing words to words, words to sentences, or sentences to sentences yields different outcomes
  • Can achieve breadth but cannot handle depth
  • E.g. LSA, dimensionality reduction using SVD (singular value decomposition), Tf-Idf, CBOW etc.
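
The word-vector idea behind distributional methods can be sketched with raw count vectors and cosine similarity (real systems use Tf-Idf weighting or learned embeddings rather than raw counts):

```python
from collections import Counter
import math

def vectorize(text: str) -> Counter:
    """Bag-of-words count vector, keyed by word."""
    return Counter(text.lower().split())

def cosine(v1: Counter, v2: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v1[w] * v2[w] for w in v1)  # Counter returns 0 for missing keys
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

a = vectorize("the cat sat on the mat")
b = vectorize("the cat slept on the mat")
c = vectorize("stock prices fell sharply")
print(cosine(a, b) > cosine(a, c))  # True: similar sentences score higher
```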

Frame Based

  • A frame is a data structure for representing a stereotyped situation.
  • Consider a stereotyped situation like a commercial transaction, which has a buyer, a seller, goods and a price. Parsing such a sentence yields the identified frame and its parameters: buyer, seller, goods and price.
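
A minimal sketch of frame filling, assuming sentences that match an invented surface pattern (real frame-semantic parsers use syntactic and semantic analysis, not a regular expression):

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class CommercialTransaction:
    """The frame's slots: who bought what from whom, and for how much."""
    buyer: str
    seller: str
    goods: str
    price: str

# Hypothetical pattern for sentences shaped like "X bought Y from Z for $N"
PATTERN = re.compile(
    r"(?P<buyer>\w+) bought (?P<goods>[\w ]+?) from (?P<seller>\w+) for (?P<price>\$\d+)")

def extract_frame(sentence: str) -> Optional[CommercialTransaction]:
    m = PATTERN.search(sentence)
    return CommercialTransaction(**m.groupdict()) if m else None

print(extract_frame("Alice bought a used bike from Bob for $40"))
```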

Model Theoretical Approach

  • This approach depends on model theory and compositionality
  • Model theory means that sentences describe the world, as in grounded language
  • Compositionality means that the meanings of the parts of a sentence can be combined to deduce the meaning of the whole
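
These two ideas can be sketched with a toy model world: each predicate denotes a set of entities, and a simple entity–predicate sentence is true iff the entity belongs to the predicate’s denotation (the world and predicates below are invented for illustration):

```python
# A tiny model: the "world" maps each entity to the predicates true of it.
WORLD = {
    "felix": {"cat", "sleeps"},
    "rex":   {"dog", "barks"},
}

def denotation(predicate: str) -> set:
    """[[predicate]] = the set of entities the predicate is true of."""
    return {e for e, props in WORLD.items() if predicate in props}

def evaluate(entity: str, predicate: str) -> bool:
    """Compositionally: "entity predicate" is true iff entity is in [[predicate]]."""
    return entity in denotation(predicate)

print(evaluate("felix", "cat"), evaluate("felix", "barks"))  # True False
```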

Interactive Learning

  • SHRDLU
  • According to Percy Liang, “Language is intrinsically interactive. How do we represent knowledge, context, memory? Maybe we shouldn’t be focused on creating better models, but rather better environments for interactive learning.”

Natural Language Ambiguity

NLP tries to resolve the following kinds of ambiguity in natural language:

  • Lexical ambiguity (a word has multiple meanings)
  • Syntactic ambiguity (a sentence has multiple parse trees)
  • Semantic ambiguity (a sentence has multiple meanings)
  • Anaphoric ambiguity (a phrase or word refers back to something previously mentioned, but the referent is unclear)
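
Lexical ambiguity resolution can be sketched with a simplified Lesk-style approach: pick the sense whose gloss words overlap the context most (the sense inventory for “light” below is invented for illustration):

```python
# Hypothetical mini sense inventory for the ambiguous word "light"
SENSES = {
    "illumination": {"lamp", "bulb", "bright", "shine", "room"},
    "not_heavy":    {"weight", "load", "carry", "heavy", "lift"},
}

def disambiguate(context: str) -> str:
    """Simplified Lesk: choose the sense with the largest overlap
    between its gloss words and the words of the context."""
    words = set(context.lower().split())
    return max(SENSES, key=lambda s: len(SENSES[s] & words))

print(disambiguate("the lamp gives a bright light in the room"))  # illumination
```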

In this article I discussed what NLP is, the linguistic analysis used in NLP (such as syntax and semantics), and the different syntactic and semantic aspects of NLP.

