Stemming With Sastrawi

Kal
Mar 20, 2022


Using Sastrawi with Python

Sastrawi Python is a simple Python library that reduces inflected words in the Indonesian language (Bahasa Indonesia) to their base form (stem). It is a Python port of the original Sastrawi project written in PHP (credit goes to the original author and contributors of Sastrawi PHP).

How to Install

We can install the Sastrawi package with pip by running this command in the terminal:

pip install PySastrawi

Stemming

Stemming is a natural language processing technique that reduces inflected words to their root forms, which helps in preprocessing text, words, and documents for text normalization.

According to Wikipedia, inflection is the process by which a word is modified to express different grammatical categories, including tense, case, voice, aspect, person, number, gender, and mood. A single word may therefore appear in several inflected forms, and having multiple inflected forms of it in the same text adds redundancy to NLP processing.

As a result, we use stemming to reduce words to their base form, or stem, which may or may not be a legitimate word in the language.

For instance, the stem of the three words connections, connected, and connects is “connect”. On the other hand, the stem of trouble, troubled, and troubles is “troubl”, which is not a recognized English word.
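The example above can be reproduced with a toy suffix-stripping function. This is a deliberately simplified sketch, not the Porter algorithm and not Sastrawi: the suffix list and the `toy_stem` name are hypothetical choices made only to show how a stemmer can map several words onto one stem that is not itself a dictionary word.

```python
# A toy suffix-stripping stemmer, for illustration only.
# NOT the Porter algorithm or Sastrawi: this hypothetical suffix list
# was chosen just to reproduce the English example above.
SUFFIXES = ["ions", "es", "ed", "s", "e"]  # checked longest first


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= min_stem:
            return lowered[: -len(suffix)]
    return lowered


for group in (["connections", "connected", "connects"],
              ["trouble", "troubled", "troubles"]):
    # Each group collapses onto a single stem.
    print({w: toy_stem(w) for w in group})
```

Checking the longest suffix first keeps *connections* from losing only its final *s*, and the `min_stem` guard stops very short words from being stripped down to almost nothing.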

Why Is Stemming Important?

The presence of word variants in a text corpus results in data redundancy when developing NLP or machine learning models, which can make those models less effective.

To build a robust model, it is essential to normalize text by removing repetition and transforming words to their base form through stemming.

Stemming using Sastrawi

Import the Stemmer Factory Class

Create Stemmer

Sample Sentence

Sample Output

Conclusion

Sastrawi is easy to use, and it is especially helpful when we work with unstructured (text) data in Bahasa Indonesia.

References

https://github.com/sastrawi/sastrawi
