Using the Wikidata API to get the sense of words

Photo by Patrick Tomasso on Unsplash

In one of our current projects, we needed to get the main title of an entity. For example, given “US”, we need to know that it refers to “United States of America”.

Such a problem can be framed as an NLP (Natural Language Processing) problem. One possible solution was to train a Word2Vec model to analyze the semantics of those words and measure their semantic similarity.

However, that approach brings its own problems: collecting and preparing a dataset, finding the best model, training it, and comparing the accuracy of different approaches.

In the modern world, where everything moves very fast, especially the development process, you sometimes just don’t have enough time for all of that. It is an important skill for a data scientist, or any developer, to look for a simpler and faster way to solve the problem.

A free, constantly expanding knowledge base with 53,282,301 data items sounds like an easier and more stable solution. After all, no artificial neural network is better than the human brain.

Wikidata acts as the central storage for the structured data of its Wikimedia sister projects, including Wikipedia, Wikivoyage, Wikisource, and others.

Like any other Wikimedia project, it has an API that we can easily use. In particular, we are interested in the wbsearchentities action. That endpoint gives us a label and a description for the entity.

You can check the example below:
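A minimal Python sketch of such a lookup, using only the standard library. It assumes the public endpoint https://www.wikidata.org/w/api.php and the wbsearchentities parameters `search`, `language`, `limit` and `format`. The helper names (`build_search_url`, `extract_candidates`, `search_entities`) and the User-Agent string are hypothetical. The endpoint matches the search term against entity labels and aliases, so an abbreviation such as “US” can still lead to an entity whose label is the full name.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata endpoint for the MediaWiki action API.
API_URL = "https://www.wikidata.org/w/api.php"


def build_search_url(term, language="en", limit=5):
    """Build a wbsearchentities query URL for a search term."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_candidates(response):
    """Return (id, label, description) tuples from a decoded JSON response.

    Results are listed under the "search" key; missing fields become None.
    """
    return [
        (item.get("id"), item.get("label"), item.get("description"))
        for item in response.get("search", [])
    ]


def search_entities(term, language="en", limit=5):
    """Query Wikidata over HTTP and return candidate entities for `term`."""
    request = urllib.request.Request(
        build_search_url(term, language, limit),
        # Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
        headers={"User-Agent": "entity-title-lookup-example/0.1"},
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        # json.load accepts the binary response stream directly.
        return extract_candidates(json.load(resp))
```

Calling `search_entities("US")` returns a list of `(id, label, description)` tuples, and the label of the first candidate is the kind of main title we were after; which entity ranks first is decided by Wikidata's search. Keeping the JSON parsing in `extract_candidates` separate from the HTTP call makes it easy to test against a saved response.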


Thanks for reading! I hope you find it helpful.

German Gensetskiy, with the support of the Go Wombat Team.
Special thanks to Stas Dragun for helping me find Wikidata.