POSPair: Representation for Natural Language Processing

Jim Mac
POSPair

--

The POSPair model is a simplifying representation for Natural Language Processing. It represents data based on parts of speech and the relations between them. Its basic units are word pairs, generated with reference to the context of the sentence in which the words appear. Beyond closeness or syntactic relatedness, the POSPair model takes into account the actual form of the relationship between words: which words are related and how they are related.

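Words are the smallest elements. Based on their use and function, words are categorized into several parts of speech. Each entry below gives the part of speech, its function, and (in parentheses) the relations it takes part in.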

  1. Noun — Used to name persons, things, animals, places, ideas, or events. (Noun)
  2. Pronoun — Functions as a replacement for Noun. (Pronoun)
  3. Adjective — Used to describe Noun or Pronoun. (Noun — Adjective OR Pronoun — Adjective)
  4. Verb — Shows action or state of being. (Noun — Verb OR Pronoun — Verb)
  5. Adverb — Describes Adjective, Verb or another Adverb. (Verb — Adverb, Adverb — Adverb, Adjective — Adverb)
  6. Preposition — Words that specify location or location in time. (Noun — Preposition OR Pronoun — Preposition)
  7. Conjunction — Joins words, phrases or clauses together. (Noun — Conjunction OR Pronoun — Conjunction)
  8. Interjection — Words that express emotion. (Interjection)

By the definition and semantics of each part of speech, only words of specific parts of speech relate to each other in a meaningful way.
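The relations in the list can be written down as a small lookup table. The following is a minimal sketch, not code from the POSPair package: the names `RELATED_POS` and `can_relate` are hypothetical, and the table only records which parts of speech appear together in the list, ignoring direction.

```python
# Hypothetical encoding of the relations listed above.
# These names are illustrative and are not taken from the POSPair package.
RELATED_POS = {
    frozenset({"Noun", "Adjective"}),
    frozenset({"Pronoun", "Adjective"}),
    frozenset({"Noun", "Verb"}),
    frozenset({"Pronoun", "Verb"}),
    frozenset({"Verb", "Adverb"}),
    frozenset({"Adverb"}),  # Adverb with another Adverb
    frozenset({"Adjective", "Adverb"}),
    frozenset({"Noun", "Preposition"}),
    frozenset({"Pronoun", "Preposition"}),
    frozenset({"Noun", "Conjunction"}),
    frozenset({"Pronoun", "Conjunction"}),
}


def can_relate(pos_a: str, pos_b: str) -> bool:
    """Return True if the two parts of speech appear together in the list."""
    return frozenset({pos_a, pos_b}) in RELATED_POS


print(can_relate("Adjective", "Noun"))  # True
print(can_relate("Noun", "Adverb"))     # False
```

A set of unordered pairs keeps the check symmetric, which is enough to filter out combinations the list never mentions.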

Words are related to each other through the relations given above, but in a distinct manner: every relation is one-sided.
E.g., an Adjective describes a Noun; a Noun does not describe an Adjective.
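One way to capture the one-sidedness is to store each relation as an ordered pair. This is only a sketch under assumptions: `WordPair`, `DIRECTED`, and `make_pair` are hypothetical names, and the table covers only the directions that the Adjective and Adverb definitions state explicitly.

```python
from typing import NamedTuple, Optional


class WordPair(NamedTuple):
    source: str  # the describing word
    target: str  # the word being described


# (source POS, target POS). Order matters: ("Adjective", "Noun") is present,
# ("Noun", "Adjective") is not. Hypothetical table, not POSPair's own data.
DIRECTED = {
    ("Adjective", "Noun"),
    ("Adjective", "Pronoun"),
    ("Adverb", "Verb"),
    ("Adverb", "Adverb"),
    ("Adverb", "Adjective"),
}


def make_pair(word_a: str, pos_a: str, word_b: str, pos_b: str) -> Optional[WordPair]:
    """Build a pair only if word_a is allowed to describe word_b."""
    if (pos_a, pos_b) in DIRECTED:
        return WordPair(word_a, word_b)
    return None


print(make_pair("red", "Adjective", "car", "Noun"))  # WordPair(source='red', target='car')
print(make_pair("car", "Noun", "red", "Adjective"))  # None
```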

Data is represented in the form of word pairs. A relation between parts of speech can be properly represented between only two words at a time, so word pairs are the simplest form of representation.

In the POSPair model, word pairs are generated with reference to the whole text. A word pair can be understood once the whole sentence is taken into account.
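To make the idea concrete, here is a rough sketch of generating word pairs from one hand-tagged sentence. It is not POSPair's actual procedure: the nearest-neighbour rule, the tag names, and the functions `nearest_target` and `word_pairs` are all assumptions made for illustration, and only the Adjective and Adverb relations are handled.

```python
from typing import List, Optional, Tuple

Tagged = Tuple[str, str]  # (word, part of speech)

# Parts of speech each describing word may attach to, following the
# Adjective and Adverb definitions. Hypothetical table for this sketch.
TARGETS = {
    "Adjective": {"Noun", "Pronoun"},
    "Adverb": {"Verb", "Adverb", "Adjective"},
}


def nearest_target(tokens: List[Tagged], i: int) -> Optional[int]:
    """Index of the closest token that word i may describe (right side wins ties)."""
    allowed = TARGETS[tokens[i][1]]
    for dist in range(1, len(tokens)):
        for j in (i + dist, i - dist):
            if 0 <= j < len(tokens) and tokens[j][1] in allowed:
                return j
    return None


def word_pairs(tokens: List[Tagged]) -> List[Tuple[str, str]]:
    """Pair every Adjective or Adverb with the nearest word it may describe."""
    pairs = []
    for i, (word, pos) in enumerate(tokens):
        if pos in TARGETS:
            j = nearest_target(tokens, i)
            if j is not None:
                pairs.append((word, tokens[j][0]))
    return pairs


# Tags are assigned by hand here; a real pipeline would use a POS tagger.
sentence = [
    ("She", "Pronoun"),
    ("quickly", "Adverb"),
    ("ate", "Verb"),
    ("fresh", "Adjective"),
    ("apples", "Noun"),
]
print(word_pairs(sentence))  # [('quickly', 'ate'), ('fresh', 'apples')]
```

Distance is only a stand-in for sentence context here. The point of the sketch is that pairs are formed by looking across the whole sentence and that each pair is limited to an allowed relation.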

Currently, most systems represent data based on closeness, word frequency, or syntactic relatedness, whereas the POSPair model provides more meaningful relations by correlating the right words in the right way. The POSPair model aims to keep only the possible relations and discard the wrong ones.

The POSPair model is available on GitHub. It is written in Python. You can install it via pip, or clone the repository and contribute to it.

Get in touch at pospair.contact@gmail.com

Check out word embeddings with POSPair.

All rights reserved © 2018 Jim Macwan
