Stopword lists for Hebrew and Swahili

Espen Klem
Mar 7, 2019

Stuff is moving along nicely with the stopword module. A few weeks ago, somebody created a Swahili stopword list with the stopword-trainer, and today we got a pull request containing a Hebrew stopword list. That's two more down in the "a stopword list for every language in the world" effort.

Meaningless words in Swahili - No more!

Next up is adding a lot more African stopword lists. After that, I'm open to suggestions.

Hebrew ready for stopword removal.

So now we have lists of stopwords for these languages:

  • ar - Modern Standard Arabic
  • bn - Bengali
  • br - Brazilian Portuguese
  • da - Danish
  • de - German
  • en - English
  • es - Spanish
  • fa - Farsi
  • fr - French
  • he - Hebrew
  • hi - Hindi
  • it - Italian
  • ja - Japanese
  • nl - Dutch
  • no - Norwegian
  • pl - Polish
  • pt - Portuguese
  • pa_in - Punjabi Gurmukhi
  • ru - Russian
  • sv - Swedish
  • sw - Swahili
  • zh - Chinese Simplified
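Each list above is keyed by a short language code, and removing stopwords boils down to picking the list for a code and dropping every token that appears in it. The sketch below illustrates that idea under stated assumptions: the `STOPWORDS` map and the `filterStopwords` function are hypothetical names, not the stopword module's actual API, and the word sets are a handful of common function words chosen for illustration, not the real English or Swahili lists.

```typescript
// Hypothetical, tiny stopword subsets keyed by language codes from the list above.
// These are NOT the stopword module's real lists, only a few common function words.
const STOPWORDS: Record<string, ReadonlySet<string>> = {
  en: new Set(["the", "a", "an", "is", "of", "and", "to", "in"]),
  sw: new Set(["na", "ya", "wa", "kwa", "ni"]),
};

// Hypothetical helper: drop every token found in the chosen language's list.
// Matching is case-insensitive, but kept tokens retain their original casing.
function filterStopwords(tokens: string[], lang: string): string[] {
  const list = STOPWORDS[lang];
  if (list === undefined) {
    throw new Error(`No stopword list for language code "${lang}"`);
  }
  return tokens.filter((token) => !list.has(token.toLowerCase()));
}

// Usage: split on spaces, then filter with the list for each language code.
const english = "The meaning of a stopword is in the eye of the beholder".split(" ");
const swahili = "Habari ya asubuhi".split(" ");
console.log(filterStopwords(english, "en"));
console.log(filterStopwords(swahili, "sw"));
```

Using a `Set` per language keeps each lookup constant-time, so adding more languages only grows the map, not the cost of filtering a given text.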

Fire off a pull request if you've created a list, or open an issue if you need help creating one.



Espen Klem

Designing - Creating - Dismantling - Socialising - Nerding. Interaction Designer at Knowit. Tinkering with search when I can.