Text Processing made easy with textslack

Ankit Raj
Published in Analytics Vidhya
4 min read · Oct 9, 2020

Whenever we think about natural language processing (NLP) tasks, what is the first thing that comes to mind? Yes, you guessed it right: text processing. And what exactly do we mean by the term text processing? In simple words, it is the process of transforming text into a form that machine learning algorithms can digest.

Well, we are not going to talk about the entire text processing pipeline here. Instead, we will discuss and use a pre-built library that handles some of the preliminary and most important steps in text processing, namely text cleaning and generating insights from text.

You might have noticed that whenever we deal with unstructured text data, we basically end up writing the same piece of code every time. Wouldn’t it be great if we had a library that could handle the mundane tasks of text cleaning and feature extraction for us?

“textslack” is one such library that I built. It cleans your text data and also includes additional functionality to extract useful insights from the text.

So, let’s just jump right into it, shall we?

First things first, where do we get this package from? Well, like most open-source Python packages, it’s just a pip install away.

So, here we go… run pip install textslack in your terminal.

Tada… it’s installed in no time.
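If you want to double-check that the install actually landed in the environment your notebook is using, a minimal sketch along these lines could help. It only uses the standard library; the helper name is_installed is my own for illustration and is not part of textslack.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    # find_spec returns None when no importable module has that name.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if is_installed("textslack"):
        print("textslack is available")
    else:
        print("textslack not found; try: pip install textslack")
```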

Now, let’s move on to the part where we import this library in our notebook or IDE, whatever you prefer. I usually prefer a notebook for initial analysis before utilizing the code for the main project.

And it’s done. You can ignore the future warnings, though.

Some of you, however, may run into a few NLTK download errors while importing the library.

Nothing to worry about, though. You just need to import nltk, download the missing resource, and import textslack again, in that order. I will, however, include these resources in the next version release.
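The import, download, re-import order can be sketched as a small helper. This is an assumption-laden sketch rather than anything shipped with textslack: the function name import_with_nltk_retry is hypothetical, it relies on NLTK raising LookupError when a resource is missing, and the resource names you pass in have to come from the error message you actually see.

```python
import importlib


def import_with_nltk_retry(module_name, resources):
    """Import a module; on an NLTK LookupError, download resources and retry once."""
    try:
        return importlib.import_module(module_name)
    except LookupError:
        # NLTK signals a missing corpus or model with LookupError.
        import nltk

        for resource in resources:
            nltk.download(resource)
        # Second attempt, now that the resources should be present.
        return importlib.import_module(module_name)


# Hypothetical usage; "stopwords" is only a guess at what might be missing:
# textslack = import_with_nltk_retry("textslack", ["stopwords"])
```

Any other import failure, such as the package not being installed at all, is deliberately left to propagate.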

Now that we have imported the library, let’s give it a try on some random text data.

For that, we just need to create an instance of the TextSlack class that we imported from textslack and call its transform method.

Easy, isn’t it? The text has been normalized and cleaned of all the stop words, punctuation, etc. in a single function call.
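To make it concrete what this kind of cleaning involves, here is a minimal standard-library sketch. It is not textslack's implementation: the clean_text function and the tiny STOP_WORDS set are assumptions made up for illustration, and a real stop-word list would be much longer.

```python
import string

# Tiny illustrative stop-word list (an assumption, not the one textslack uses).
STOP_WORDS = {"a", "an", "the", "i", "it", "is", "was", "and", "but", "of", "to", "this", "that"}


def clean_text(text: str) -> str:
    """Lowercase the text, strip ASCII punctuation, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = [token for token in no_punct.split() if token not in STOP_WORDS]
    return " ".join(tokens)


print(clean_text("I hate this movie, but the acting was REALLY good!"))
# prints: hate movie acting really good
```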

But that’s just text cleaning. What about the other features that I talked about earlier? Don’t worry! We are gonna cover everything that textslack has to offer.

What if we were only interested in the nouns, verbs, or adjectives present in a text? Let’s try that idea out and strip everything from the text except the adverbs.

Hmm… seems like we only have one adverb in our text.

Anyway, we can similarly filter the text for nouns, verbs, and adjectives by calling the extract_nouns, extract_verbs, and extract_adjectives functions, respectively.
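The idea behind this kind of filtering is simple: tag every token with its part of speech, then keep only the tokens with the tag you want. The toy sketch below shows just that filtering step. The TOY_TAGS table and extract_by_tag function are hypothetical stand-ins; a real library would use a trained part-of-speech tagger instead of a hand-written lookup.

```python
import string

# Hypothetical hand-made tag table, for illustration only.
TOY_TAGS = {
    "movie": "NOUN", "acting": "NOUN",
    "hate": "VERB", "was": "VERB",
    "good": "ADJ", "boring": "ADJ",
    "really": "ADV", "very": "ADV",
}


def extract_by_tag(text: str, tag: str) -> str:
    """Keep only the tokens whose toy part-of-speech tag equals `tag`."""
    stripped = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(token for token in stripped.split() if TOY_TAGS.get(token) == tag)


sample = "I hate this movie, but the acting was really good!"
print(extract_by_tag(sample, "ADV"))   # prints: really
print(extract_by_tag(sample, "NOUN"))  # prints: movie acting
```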

So, now that we have seen everything textslack has to offer in terms of cleaning, let’s see what other insights we can extract.

What about the overall sentiment of the text? Well, let’s see.

We got an overall negative sentiment, which is fairly well justified by the word “hate” in the text, since it carries a strongly negative sentiment polarity.
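To see why a single strong word can tip the result, here is a minimal lexicon-based sketch. It is only an illustration of the general idea, not how textslack computes sentiment: the TOY_LEXICON scores and both function names are made up for this example.

```python
import re

# Hypothetical polarity scores in [-1, 1]; real lexicons are far larger.
TOY_LEXICON = {"hate": -0.8, "boring": -0.5, "good": 0.7, "love": 0.8}


def polarity(text: str) -> float:
    """Average the scores of the known words; 0.0 if none are known."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [TOY_LEXICON[token] for token in tokens if token in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def sentiment_label(text: str) -> str:
    """Map the polarity score to a coarse label."""
    score = polarity(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment_label("I hate this movie"))  # prints: negative
```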

Now, coming to the final feature of the library, let’s see how many times the word “movie” has been mentioned in the text.

To avoid any confusion: for now, it simply gives us the frequency of a word in the exact form we pass in, not the frequency of its base form. So, occurrences of “movies” in the text won’t be counted when we ask for “movie”. I know, it’s not exactly what we were looking for, but this will be taken care of in the next version release.
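The difference between exact-form and base-form counting is easy to see in a small sketch. This one uses only the standard library and is an illustration of the behaviour described above, not the library's own code; word_counts and count_word are names I picked for the example.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count every lowercased word exactly as it appears (no stemming)."""
    return Counter(re.findall(r"\w+", text.lower()))


def count_word(text: str, word: str) -> int:
    """Frequency of `word` in its exact form; missing words count as 0."""
    return word_counts(text)[word.lower()]


sample = "The movie was long. I liked the movie, not the other movies."
print(count_word(sample, "movie"))   # prints: 2
print(count_word(sample, "movies"))  # prints: 1
```

Counting the base form instead would mean lemmatizing or stemming each token before it goes into the Counter.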

So, yeah, that was all about the basic functionality of textslack. Keep checking for new features in future version updates.

Thank you for your attention. I hope you will all give it a try, and that this library will be helpful in dealing with your text analysis.

Library link:

https://pypi.org/project/textslack/

References:

https://towardsdatascience.com/text-cleaning-methods-for-natural-language-processing-f2fc1796e8c7
