NLP: Text Processing Via Stemming And Lemmatisation In Data Science Projects

How We Can Normalise And Reduce The Number Of Common Words Into A Single Word For Text Analytics

Farhad Malik
Jun 26 · 3 min read

Stemming and lemmatisation are two text-normalisation techniques that data science projects can use in their text analytics work.

What Is Stemming?

Many related words can be reduced to a single word. Stemming removes characters from the end of a word until a common form remains that can represent a whole group of related words. This often collapses many words into one. The resulting form is known as the stem, and it is not guaranteed to be a dictionary word.

Stemming is about stripping suffixes

As an example, blogging, blogged and blogs can be reduced to the single word “blog”.

from nltk.stem import SnowballStemmer

# Apply stemming to a list of words
stemmer = SnowballStemmer("english")  # the language argument is required
for word in ['blogging', 'blogged', 'blogs']:
    print(stemmer.stem(word))
# This prints blog, blog, blog
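
To make the idea of suffix stripping concrete, here is a minimal sketch of a stemmer written from scratch. It is a deliberately naive illustration and not the Snowball algorithm: the function name naive_stem, the suffix list and the minimum stem length are all assumptions made for this example.

```python
# Toy suffix stripper (hypothetical helper, NOT the Snowball algorithm).
SUFFIXES = ("ing", "ed", "s")  # checked in this order


def naive_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip one common English suffix, then collapse a doubled final consonant."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            stem = word[: -len(suffix)]
            # "blogg" -> "blog": undo the consonant doubling added before -ing / -ed
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word  # no suffix matched, or the word is too short to strip


for word in ["blogging", "blogged", "blogs", "studies"]:
    print(word, "->", naive_stem(word))
```

The first three words all reduce to blog, while studies becomes studie, which is not a real word. That is the trade-off with stemming: it is fast and needs no dictionary, but a stem is only a shared prefix. A rule this crude also makes mistakes that a real stemmer avoids, for instance turning missing into mis.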

What Is Lemmatisation?

Lemmatisation reduces the inflected forms of a word to a single dictionary base form, known as the lemma. Rather than stripping characters, it looks the word up in a vocabulary, so the result is typically a real word rather than a truncated stem.

import nltk
from nltk.stem import WordNetLemmatizer

nltk.download('wordnet')  # one-time download of the WordNet corpus the lemmatiser relies on
lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("blogs"))
# Returns blog
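
The dictionary lookup behind lemmatisation can be sketched without any library. The example below is only an illustration under stated assumptions: LEMMA_TABLE is a tiny hand-written mapping standing in for a real lexical database such as WordNet, and toy_lemmatize is a hypothetical helper name.

```python
# Hypothetical mini-dictionary standing in for a real lexical database.
LEMMA_TABLE = {
    "blogs": "blog",
    "blogging": "blog",
    "blogged": "blog",
    "mice": "mouse",
    "was": "be",
}


def toy_lemmatize(word: str) -> str:
    """Return the dictionary base form if the word is known, else the word unchanged."""
    return LEMMA_TABLE.get(word.lower(), word)


for word in ["blogs", "blogged", "mice", "was", "finance"]:
    print(word, "->", toy_lemmatize(word))
```

Irregular forms such as mice and was show what a lookup can do that suffix stripping cannot. NLTK's WordNetLemmatizer plays the same role with a far larger vocabulary. It treats every word as a noun unless a part of speech is passed, for example lemmatizer.lemmatize("blogging", pos="v").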

Summary

Reducing related words to a single common form is a useful text analytics technique. This article explained two key techniques for doing so: stemming and lemmatisation. They are widely used in SEO, tagging, search engines and indexing systems.
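
As a rough illustration of why indexing systems care about this, the sketch below builds a tiny inverted index over normalised tokens. Everything in it is an assumption made for the example: normalise is a crude stand-in for a real stemmer or lemmatiser, and build_index and the sample documents are hypothetical.

```python
from collections import defaultdict


def normalise(token: str) -> str:
    """Crude stand-in for a stemmer: lowercase and drop a trailing 's' from longer words."""
    token = token.lower()
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each normalised token to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalise(token)].add(doc_id)
    return index


docs = {1: "She writes blogs", 2: "My blog is new"}
index = build_index(docs)
print(sorted(index["blog"]))  # both documents match a query for "blog"
```

Because blogs and blog are stored under the same key, a search for either form finds both documents.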

FinTechExplained

This blog aims to bridge the gap between technologists, mathematicians and financial experts and helps them understand how fundamental concepts work within each field.
