[Week 5 — NLP Zemberek]

Sentiment Analyzer, bbm406f16
Jan 5, 2017

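Zemberek is an open-source natural language processing library for Turkish. It is written entirely in Java and offers spell checking, suggestions for misspelled words, hyphenation, deasciification, and correction of erroneously encoded characters. Deasciification restores Turkish letters such as ç, ğ, ı, ö, ş and ü to text that was typed with ASCII letters.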

After collecting reviews from yemeksepeti.com for our project, we used the Zemberek library to correct the spelling mistakes in these reviews and to find the roots of their words.

In a separate Java program that we wrote, we used Zemberek to correct most of the spelling mistakes in the reviews. To use the library, we downloaded the following two .jar files and added them to our program's classpath.

zemberek-cekirdek-2.1.1.jar: the core of the library ("çekirdek" is Turkish for "core"), containing its main functions. (http://www.java2s.com/Code/Jar/z/Downloadzemberekcekirdek211jar.htm)

zemberek-tr-2.1.1.jar: information and classes specific to Turkey Turkish (Türkiye Türkçesi). (http://www.java2s.com/Code/Jar/z/Downloadzemberektr211jar.htm)

To add the related imports and create a Zemberek object for Turkey Turkish:

To get the possible Turkish spellings of a string written only with ASCII letters (giris, "input"), returned as a string array:

String[] net.zemberek.erisim.Zemberek.asciidenTurkceye(String giris)
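The following minimal, self-contained sketch illustrates the idea behind asciidenTurkceye. It is not Zemberek's implementation. The hypothetical class DeasciifySketch works as follows:

- It expands every ASCII letter that may stand for a Turkish letter (c/ç, g/ğ, i/ı, o/ö, s/ş, u/ü) into both options.
- It keeps only the candidates found in a small toy dictionary.
- It handles lowercase input only.

Zemberek checks candidates against its own lexicon instead, so real results can differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Zemberek's algorithm. Handles lowercase input only.
// Save this file as UTF-8 (on JDK 17 and older, compile with -encoding UTF-8).
public class DeasciifySketch {

    // ASCII letters that may stand for a Turkish-specific letter.
    private static final Map<Character, char[]> OPTIONS = Map.of(
            'c', new char[] {'c', 'ç'},
            'g', new char[] {'g', 'ğ'},
            'i', new char[] {'i', 'ı'},
            'o', new char[] {'o', 'ö'},
            's', new char[] {'s', 'ş'},
            'u', new char[] {'u', 'ü'});

    // Returns the dictionary words that the ASCII input could be a spelling of.
    public static List<String> asciiToTurkish(String ascii, Set<String> dictionary) {
        List<String> results = new ArrayList<>();
        expand(ascii, 0, new StringBuilder(), dictionary, results);
        return results;
    }

    // Depth-first expansion: try every letter option at each position.
    private static void expand(String input, int pos, StringBuilder current,
                               Set<String> dictionary, List<String> results) {
        if (pos == input.length()) {
            String candidate = current.toString();
            if (dictionary.contains(candidate)) {
                results.add(candidate);
            }
            return;
        }
        char c = input.charAt(pos);
        for (char option : OPTIONS.getOrDefault(c, new char[] {c})) {
            current.append(option);
            expand(input, pos + 1, current, dictionary, results);
            current.setLength(current.length() - 1); // backtrack
        }
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("şeker", "çok", "güzel", "sıcak");
        System.out.println(asciiToTurkish("seker", dictionary)); // [şeker]
        System.out.println(asciiToTurkish("sicak", dictionary)); // [sıcak]
        System.out.println(asciiToTurkish("guzel", dictionary)); // [güzel]
    }
}
```

Each ambiguous letter doubles the number of candidates, so this brute-force expansion grows as 2^k for k ambiguous letters. That is fine for single words but not for long texts.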

To get suggestions for a misspelled word, use oner ("suggest"). It returns correctly spelled words that resemble the input, and it currently covers words with:

  • one missing letter
  • one extra letter
  • one wrong letter
  • two adjacent letters swapped

String[] net.zemberek.erisim.Zemberek.oner(String giris)
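The four cases listed above are the four edit operations counted by the Damerau-Levenshtein distance: insertion, deletion, substitution, and transposition of adjacent letters. Here is a minimal sketch of that idea, not Zemberek's code. The hypothetical SuggestSketch class generates every string at most one such edit away from the input, using the 29-letter Turkish alphabet. It then keeps the candidates found in a toy dictionary. Zemberek's own oner uses its full lexicon and may rank or filter suggestions differently.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch, not Zemberek's code. Handles lowercase input only.
// Save this file as UTF-8 (on JDK 17 and older, compile with -encoding UTF-8).
public class SuggestSketch {

    // The 29 lowercase letters of the Turkish alphabet.
    private static final String ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz";

    // Returns the dictionary words at most one edit away from the input.
    public static Set<String> suggest(String word, Set<String> dictionary) {
        Set<String> candidates = new LinkedHashSet<>();
        for (int i = 0; i <= word.length(); i++) {
            String left = word.substring(0, i);
            String right = word.substring(i);
            // One missing letter: insert a letter at position i.
            for (char c : ALPHABET.toCharArray()) {
                candidates.add(left + c + right);
            }
            if (!right.isEmpty()) {
                // One extra letter: delete the letter at position i.
                candidates.add(left + right.substring(1));
                // One wrong letter: replace the letter at position i.
                for (char c : ALPHABET.toCharArray()) {
                    candidates.add(left + c + right.substring(1));
                }
            }
            if (right.length() >= 2) {
                // Adjacent letters swapped: exchange positions i and i + 1.
                candidates.add(left + right.charAt(1) + right.charAt(0) + right.substring(2));
            }
        }
        candidates.retainAll(dictionary); // keep only real words
        return candidates;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("kitap", "yemek", "lezzetli");
        System.out.println(suggest("ktap", dictionary));     // [kitap]
        System.out.println(suggest("yemmek", dictionary));   // [yemek]
        System.out.println(suggest("lezzetlı", dictionary)); // [lezzetli]
        System.out.println(suggest("kiatp", dictionary));    // [kitap]
    }
}
```

For an n-letter word this builds at most 60n + 28 candidates before duplicates are removed, which is cheap for single words.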

To find the root of a word:

  • The kelimeCozumle ("analyze word") method returns a list of Kelime ("word") objects, one for each possible analysis of the input.
  • Calling the kok ("root") method on a Kelime returns the root of that analysis as a Kok object.
  • Calling the icerik ("content") method on a Kok returns the root as a String.
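Zemberek finds roots through full morphological analysis. For intuition only, the hypothetical RootSketch below uses a much cruder rule: it returns every root from a small toy list that is a prefix of the word, longest first. An ambiguous word therefore yields more than one candidate, much as kelimeCozumle can return several Kelime analyses. This is a simplification, not Zemberek's method. It does not check that the remaining suffixes are valid, and it cannot handle sound changes. For example, it fails on kitabım ("my book"), whose root kitap is not a prefix of the word.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical toy, not Zemberek: candidate roots are the roots from a given
// list that are prefixes of the word, ordered longest first. Real morphological
// analysis also validates the remaining suffixes and handles sound changes.
public class RootSketch {

    public static List<String> candidateRoots(String word, Set<String> roots) {
        List<String> matches = new ArrayList<>();
        for (String root : roots) {
            if (word.startsWith(root)) {
                matches.add(root);
            }
        }
        // Longest root first; ties broken alphabetically so output is stable
        // (Set.of does not guarantee an iteration order).
        matches.sort(Comparator.comparingInt(String::length).reversed()
                .thenComparing(Comparator.naturalOrder()));
        return matches;
    }

    public static void main(String[] args) {
        Set<String> roots = Set.of("ev", "ye", "yemek", "kitap");
        System.out.println(candidateRoots("yemekler", roots));     // [yemek, ye]
        System.out.println(candidateRoots("evlerimizden", roots)); // [ev]
    }
}
```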
