Workshop 8 on NLP by AIDevNepal

Raisha Shrestha
Published in AIDevNepal
2 min read · Mar 22, 2018

In Workshop 8 we learned how to implement Natural Language Processing (NLP). We use NLP to encode natural language so that a computer can understand it and carry out tasks such as sentiment analysis. The basic idea is that a sentence is composed of words, which can be treated as tokens. These tokens are converted into numeric values and used as features, which can then be fed into models or stored in data sets. Each feature is unique, i.e., a repeated word does not produce a new feature.
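To make the token-to-feature idea concrete, here is a minimal sketch in plain Python. It is not the code used in the workshop; the function names (`tokenize`, `build_vocabulary`, `to_feature_vector`) and the sample sentences are made up for illustration, and the word-count encoding is just one simple way of turning tokens into numbers.

```python
import re

def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())

def build_vocabulary(sentences):
    """Give each unique token an integer index, in order of first appearance."""
    vocabulary = {}
    for sentence in sentences:
        for token in tokenize(sentence):
            if token not in vocabulary:
                vocabulary[token] = len(vocabulary)
    return vocabulary

def to_feature_vector(sentence, vocabulary):
    """Count how often each vocabulary token occurs in the sentence."""
    vector = [0] * len(vocabulary)
    for token in tokenize(sentence):
        if token in vocabulary:
            vector[vocabulary[token]] += 1
    return vector

# Hypothetical sample sentences, not taken from the workshop.
sentences = ["I love NLP", "NLP is fun and I love fun"]
vocabulary = build_vocabulary(sentences)
vectors = [to_feature_vector(s, vocabulary) for s in sentences]

print(vocabulary)
print(vectors)
```

Note that "fun" appears twice in the second sentence but still gets only one entry in the vocabulary: the repetition shows up as a higher count, not as a new feature.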

Before features are calculated, stemming or lemmatization is carried out so that different forms of a word map to a single feature. Both techniques reduce words to their root or base form, which makes the text simpler for the computer. For example, the sentence “I am loving Artificial Intelligence” is converted to “I am love Artificial Intelligence”. Many words share a meaning but differ in tense or in some other way: “read” and “reading” have the same base meaning, as do “study” and “studying”. When any of these forms is encountered, stemming or lemmatization converts it to its base form. Stop words such as a, I, all, do, is, in, to and so on are then eliminated. In this way only meaningful words are left for the machine to process.
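The preprocessing steps above can be sketched in a few lines of plain Python. This is a toy illustration and not the workshop code: `crude_stem` is a deliberately simple suffix rule, `LEMMAS` is a hand-written lookup table standing in for a real lemmatizer, and the stop-word set holds only the words listed above plus "am" (added here as an assumption so the sample sentence reduces cleanly). Real libraries use far more complete rules and word lists.

```python
import re

# Stop words from the examples in the text; "am" is an addition for this sketch.
STOP_WORDS = {"a", "i", "all", "do", "is", "in", "to", "am"}

# Toy lookup table standing in for a dictionary-based lemmatizer.
LEMMAS = {"loving": "love", "reading": "read", "studying": "study"}

def crude_stem(word):
    """Strip a trailing 'ing' from longer words (a toy rule, not a real stemmer)."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    return word

def lemmatize(word):
    """Look the word up in the toy table; unknown words are returned unchanged."""
    return LEMMAS.get(word, word)

def preprocess(sentence, normalize):
    """Tokenize, reduce each token with `normalize`, then drop stop words."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    normalized = [normalize(token) for token in tokens]
    return [token for token in normalized if token not in STOP_WORDS]

sentence = "I am loving Artificial Intelligence"
stemmed = preprocess(sentence, crude_stem)
lemmatized = preprocess(sentence, lemmatize)

print(stemmed)
print(lemmatized)
```

The two results differ on "loving": the crude stemmer simply chops off the suffix, so its output need not be a dictionary word, while the lookup-based lemmatizer returns the base word "love".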

We carried out stemming, lemmatization and stop-word elimination hands-on in Workshop 8. The image below shows the implementation of stemming and lemmatization carried out in the workshop.

Figure 1: Example of stemming and lemmatization in a text.

We also learned the concept of Naive Bayes classification. In sentiment analysis, Naive Bayes classification is used to identify whether the words used in a sentence convey a positive or a negative meaning. We start with a list of words labeled as positive or negative, and the probabilities of positivity and negativity are calculated from it. On the basis of this list, the probabilities for the test data are calculated and, ultimately, the data is identified as positive or negative. We did not calculate accuracy for the Naive Bayes classifier, because sentiment analysis can be a critical task in which a high accuracy figure hides serious mistakes. Take the example of identifying terrorists in a room full of people, say 100. If there are 2 terrorists in the room and only 1 is successfully identified, the prediction is right for everyone except a single terrorist. The error is only 1%, but it is still very dangerous. For this reason accuracy was not calculated for Naive Bayes classification.
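A minimal sketch of the idea, assuming a tiny hand-made training set and add-one (Laplace) smoothing, might look like the following. The training sentences, the `train` and `predict` helpers, and the room size of 100 people are all assumptions made for illustration; recall is a measure not mentioned in the workshop and is included only to show what the accuracy figure hides.

```python
import math
from collections import Counter

def train(labeled_sentences):
    """Count words per label and sentences per label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in labeled_sentences:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def predict(text, word_counts, label_counts):
    """Return the label with the highest log posterior, using add-one smoothing."""
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    total_sentences = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Log prior: how common this label is in the training data.
        score = math.log(label_counts[label] / total_sentences)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Log likelihood of each word given the label; unseen words count as 0.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data, not from the workshop.
training_data = [
    ("good great love", "positive"),
    ("happy good fun", "positive"),
    ("bad awful hate", "negative"),
    ("sad bad boring", "negative"),
]
word_counts, label_counts = train(training_data)
print(predict("good fun movie", word_counts, label_counts))
print(predict("bad boring day", word_counts, label_counts))

# Room example from the text, assuming 100 people: 2 terrorists, 1 identified.
true_positives, false_negatives = 1, 1
true_negatives, false_positives = 98, 0
total = true_positives + false_negatives + true_negatives + false_positives
accuracy = (true_positives + true_negatives) / total
recall = true_positives / (true_positives + false_negatives)
print(accuracy, recall)
```

Under these assumptions the accuracy looks excellent while only half of the terrorists are caught, which is the danger the room example points to.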

We learned the implementation steps mentioned above for Natural Language Processing in Workshop 8, conducted by AIDevNepal. The workshop was very informative and practically sound. We are all glad to learn so much from this community. Cheers to AIDevNepal. Cheers to the growing Artificial Intelligence enthusiasm among Nepalese tech leaders.

#AIDevNepal #Writeup #Experience

Raisha Shrestha
AIDevNepal

Researcher interested in applying image processing and AI techniques to biomedical images.