22 new stopword languages - 54 in total

Espen Klem
Published in norch · Mar 20, 2020

Yay! We’re really happy to support stopword removal for 54 languages. We’ve added 22 from stopwords-json and feel the module is feature-complete enough to deserve a bump to version 1.0.0.

Parts of the webpack’ed browser version.

The languages supported from today

Existing languages

Before this release we already had Afrikaans, Modern Standard Arabic, Bengali, Danish, German, English, Spanish, Farsi, Finnish, French, Hausa, Hebrew, Hindi, Indonesian, Italian, Japanese, Lugbara, Dutch, Norwegian, Polish, Portuguese, Brazilian Portuguese, Punjabi Gurmukhi, Russian, Somali, Sotho, Swedish, Swahili, Vietnamese, Yoruba, Chinese Simplified and Zulu.

New languages

The new languages added are: Armenian, Basque, Breton, Bulgarian, Catalan, Croatian, Czech, Esperanto, Estonian, Galician, Greek, Hungarian, Irish, Korean, Latin, Latvian, Marathi, Romanian, Slovak (Slovakian), Slovenian, Thai and Turkish.

Nice to see it is used

Every week we see new packages include the stopword module among their dependencies: 744 in total on GitHub now, and hopefully many more to come. According to npmjs.com, it is installed a little under 7,000 times per week, growing steadily from 0 in 2015. It’s easy to use both in Node.js and in the browser.
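For readers new to the idea, here is a minimal sketch of what stopword removal does to a list of tokens. It is not the module's own code: the function name `filterStopwords` and the tiny `englishSample` list are made up for illustration, and the real pre-generated lists are much longer.

```javascript
// Tiny, hypothetical stopword list for illustration only.
const englishSample = ['a', 'and', 'is', 'of', 'the', 'to'];

// Hypothetical helper: keep only the tokens that are not in the stopword list.
// Matching is case-insensitive; the original casing of kept tokens is preserved.
function filterStopwords(tokens, stopwords) {
  const lookup = new Set(stopwords.map((word) => word.toLowerCase()));
  return tokens.filter((token) => !lookup.has(token.toLowerCase()));
}

const tokens = 'The quick brown fox is the king of the forest'.split(' ');
const filtered = filterStopwords(tokens, englishSample);

console.log(filtered.join(' ')); // prints "quick brown fox king forest"
```

Because the sketch uses only plain arrays and a `Set`, the same code runs unchanged in Node.js and in a browser console.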

More flexible future?

We’re looking into the possibility of adding a list of custom stopwords to whichever pre-generated stopword list you are using. Hopefully it will be backwards compatible, but more about that another time.
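Until something like that lands, one way to think about it is as a plain merge of two arrays before filtering. The sketch below assumes stopword lists are ordinary arrays of strings; `pregenerated`, `custom`, `extendStopwords` and `filterStopwords` are all hypothetical names, not part of the module's API.

```javascript
// Hypothetical stand-in for one of the pre-generated lists.
const pregenerated = ['a', 'and', 'the', 'of'];
// Hypothetical custom, domain-specific additions (note the overlap on 'the').
const custom = ['lorem', 'ipsum', 'the'];

// Merge two lists into a new array without mutating either input.
// The Set drops duplicates while keeping first-seen order.
function extendStopwords(base, extra) {
  return [...new Set([...base, ...extra])];
}

// Hypothetical helper: drop every token found in the stopword list.
function filterStopwords(tokens, stopwords) {
  const lookup = new Set(stopwords);
  return tokens.filter((token) => !lookup.has(token));
}

const combined = extendStopwords(pregenerated, custom);
const result = filterStopwords(
  ['the', 'lorem', 'of', 'search', 'and', 'ipsum', 'engine'],
  combined
);

console.log(combined); // [ 'a', 'and', 'the', 'of', 'lorem', 'ipsum' ]
console.log(result);   // [ 'search', 'engine' ]
```

Building a new array instead of pushing onto the pre-generated one keeps the original list intact, which is one way such a feature could stay backwards compatible.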

So for now: Happy stopword removal, and hope the new version suits you well. Shout out if you have any ideas or issues with the module.

Espen Klem, editor for norch: Designing - Creating - Dismantling - Socialising - Nerding. Interaction Designer at Knowit. Tinkering with search when I can.