For the past couple of months I have been working on a Question Answering System, and in my upcoming blog posts I would like to share some of the things I learned along the way. The system does not yet fetch answers with satisfactory accuracy, but it is a work in progress. Adam QAS on GitHub.
In this post, we are specifically going to focus on the Question Classification part. The goal is to classify a given input question into predefined categories. This classification will help us in Query Construction / Modelling phases.
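To make the goal concrete, here is a minimal sketch of mapping a question to a predefined category by looking at its leading question word. The category names and the `classify_question` helper are assumptions made for illustration. This is not the approach used in Adam QAS.

```python
# Hypothetical mapping from leading question words to coarse categories.
CATEGORY_BY_QUESTION_WORD = {
    "who": "PERSON",
    "where": "LOCATION",
    "when": "DATE",
    "how many": "NUMBER",
    "how much": "NUMBER",
    "what": "ENTITY",
    "why": "REASON",
}


def classify_question(question):
    """Return a coarse category for the question, or "OTHER" if no rule matches."""
    text = question.lower().strip()
    # Try longer prefixes first so "how many" is checked before any shorter key.
    for prefix in sorted(CATEGORY_BY_QUESTION_WORD, key=len, reverse=True):
        if text.startswith(prefix + " "):
            return CATEGORY_BY_QUESTION_WORD[prefix]
    return "OTHER"


print(classify_question("Who is the point guard for the LA Lakers?"))
print(classify_question("How many players are on the court?"))
```

A keyword table like this only covers the simplest cases. A trained classifier can use more of the sentence than its first word.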
Syntactic Parsing or Dependency Parsing is the task of recognizing a sentence and assigning a syntactic structure to it. The most widely used syntactic structure is the parse tree, which can be generated using parsing algorithms. Parse trees are useful in applications such as grammar checking and, more importantly, they play a critical role in the semantic analysis stage. For example, to answer the question “Who is the point guard for the LA Lakers in the next game?” …
A word in any language is made of a root or stem word and an affix. Affixes are usually attached according to orthographic rules, which define the spelling rules for composing a word in the Morphological Parsing phase. A lexicon is a list of such stem words and affixes and is a vital requirement for constructing a Morphological Parser. Morphological parsing involves building up or breaking down a structured representation of component morphemes to form a meaningful word or a stem word. …
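As a rough illustration of breaking a word down into a stem and an affix, the sketch below uses a tiny hand-written lexicon. The `STEMS` and `SUFFIXES` tables and the `parse_word` helper are hypothetical, and the sketch ignores most orthographic rules (for example, consonant doubling as in "stopped").

```python
# Hypothetical toy lexicon: a few stems and suffixes with a label for each suffix.
STEMS = {"walk", "talk", "play", "cat", "dog", "fox"}
SUFFIXES = {
    "s": "PLURAL_OR_3SG",
    "es": "PLURAL_OR_3SG",
    "ed": "PAST",
    "ing": "PROGRESSIVE",
}


def parse_word(word):
    """Split a word into (stem, suffix_label); return None if it is not in the lexicon."""
    word = word.lower()
    if word in STEMS:
        return (word, None)
    # Try longer suffixes first so "es" is checked before "s".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if stem in STEMS:
                return (stem, SUFFIXES[suffix])
    return None


for example in ["foxes", "walked", "playing", "cat", "blorp"]:
    print(example, "->", parse_word(example))
```

Real morphological parsers usually encode the lexicon and the orthographic rules as finite-state transducers rather than a suffix loop like this one.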
In this blog post, I will be discussing all the tools of Natural Language Processing pertaining to the Linux environment, although most of them would also apply to Windows and Mac. So, let’s get started with some prerequisites.
We will use pip, Python’s package installer, to install various Python modules.
$ sudo apt install python-pip
$ pip install -U pip
$ pip install --upgrade pip
So I am going to talk about three NLP tools in Python that I have worked with so far.
The Naive Bayes Classifier is probably the most widely used text classifier, and it is a supervised learning algorithm. It can be used to classify blog posts or news articles into categories such as sports, entertainment and so forth, and it can be used to detect spam emails. Most importantly, it is widely used in sentiment analysis. So first of all, what is supervised learning? It means that we are given a labeled training dataset in which each input is paired with its expected output. From this training dataset, our algorithm learns to infer the output for a new input.
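Here is a minimal sketch of that idea: a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, written in plain Python. The tiny spam/ham training set and the `train` and `predict` helpers are made up for illustration, and a real project would more likely rely on an existing library.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log posterior for the given text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, doc_count in label_counts.items():
        # Log prior: the fraction of training documents carrying this label.
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps unseen (word, label) pairs from zeroing the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up labeled training data: each input text comes with its expected output.
training_data = [
    ("win cash prize now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project status and agenda", "ham"),
]
model = train(training_data)
print(predict(model, "free cash prize"))
print(predict(model, "monday project meeting"))
```

The "naive" part is the assumption that words are independent given the label, which is why the per-word log probabilities can simply be added.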