Synonyms and antonyms are very useful in the construction of Knowledge Graphs (KGs) and in general Natural Language Understanding/Processing (NLU/NLP). However, identifying accurate synonyms and antonyms programmatically, with friendly open source licenses, is more difficult than it should be. For that reason I released a mechanism for harvesting synonyms and antonyms from open source dictionaries.
Since no single tool in this domain is ever complete on its own, let’s look at another approach: harvesting synonyms and antonyms through WordNet. WordNet provides many methods for exploring relationships such as synonyms, antonyms, hypernyms/hyponyms, usage domains, and more. However, some elbow grease is required. In this article I will focus on synonyms and antonyms, and hopefully I will do most of the heavy lifting for you. Beware: WordNet data is not always complete and can sometimes contain profanity.
WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets (synset is short for synonym set). WordNet can be seen as a combination of a dictionary and a thesaurus. An example of a synset is amphibian.n.03. This is an entry for the word amphibian with the noun part of speech (n). The last part, 03, is a sense number that distinguishes this entry from the other noun senses of amphibian; the whole string amphibian.n.03 serves as the entry's id. For the purpose of this article, I will ignore part of speech for now. A synset can contain multiple entries, each referred to as a lemma. A lemma represents the canonical form of a word; for example, the word banks would have bank as its lemma.
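Because the three parts of a synset name are separated by dots, they are easy to pull apart. Below is a minimal sketch in plain Python (no WordNet needed) that assumes the word.pos.number layout just described; parse_synset_name and POS_NAMES are hypothetical names chosen for illustration. The single-letter codes are WordNet's standard part-of-speech tags.

```python
# WordNet part-of-speech codes as they appear in synset names
POS_NAMES = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "s": "adjective satellite",
    "r": "adverb",
}


def parse_synset_name(name):
    """Split a synset name of the form <word>.<pos>.<number> into its parts.

    rsplit with maxsplit=2 splits only on the last two dots, so any dots
    inside the word itself are left intact.
    """
    word, pos, number = name.rsplit(".", 2)
    return word, POS_NAMES.get(pos, pos), int(number)


print(parse_synset_name("amphibian.n.03"))   # ('amphibian', 'noun', 3)
print(parse_synset_name("amphibious.a.01"))  # ('amphibious', 'adjective', 1)
```

Splitting from the right rather than the left is a deliberate choice: the part of speech and sense number always occupy the last two slots, whatever the word looks like.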
Let’s look at some synsets and lemmas for the words amphibian and discourage, using NLTK's WordNet interface (imported as wn):
>>> wn.synsets("amphibian")
[Synset('amphibian.n.01'), Synset('amphibian.n.02'), Synset('amphibian.n.03'), Synset('amphibious.a.01')]
>>> wn.lemmas("amphibian")
[Lemma('amphibian.n.01.amphibian'), Lemma('amphibian.n.02.amphibian'), Lemma('amphibian.n.03.amphibian'), Lemma('amphibious.a.01.amphibian')]
>>> wn.synsets("discourage")
[Synset('deter.v.01'), Synset('discourage.v.02'), Synset('warn.v.02')]
>>> wn.lemmas("discourage")
[Lemma('deter.v.01.discourage'), Lemma('discourage.v.02.discourage'), Lemma('warn.v.02.discourage')]
Notice that for amphibian, every lemma is the word amphibian itself, and the synset names are either amphibian or its adjective form, amphibious. This is an indication that WordNet has no real synonyms for this word. On the other hand, the synsets for discourage are named after different words (deter, warn), which is a good indication that we will find synonyms, and possibly antonyms, for this word.
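This observation can be turned into a quick check that needs nothing but the synset names exactly as printed by wn.synsets(). A minimal sketch; candidate_heads is a hypothetical helper, and it assumes every name has the form <word>.<pos>.<number>:

```python
def candidate_heads(synset_names, word):
    """Return synset head words that differ from `word`, in first-seen order.

    Assumes each name looks like <word>.<pos>.<number>.
    """
    heads = []
    for name in synset_names:
        head = name.rsplit(".", 2)[0]  # drop the trailing .<pos>.<number>
        if head != word and head not in heads:
            heads.append(head)
    return heads


# Synset names copied from the wn.synsets() output shown earlier
amphibian_synsets = ["amphibian.n.01", "amphibian.n.02", "amphibian.n.03", "amphibious.a.01"]
discourage_synsets = ["deter.v.01", "discourage.v.02", "warn.v.02"]

print(candidate_heads(amphibian_synsets, "amphibian"))    # ['amphibious']
print(candidate_heads(discourage_synsets, "discourage"))  # ['deter', 'warn']
```

For amphibian the only candidate is amphibious, a different form of the same word rather than a true synonym, while discourage yields two distinct words.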
I created a function that does all the heavy lifting and returns a list of synonyms and a list of antonyms, if any are found. Without further ado, to the code (but not beyond):
from nltk.corpus import wordnet as wn

def get_word_synonyms_and_antonyms_from_wordnet(word, min_acceptable_reputation=1):
    def clean(name):
        # drop non-ASCII characters and turn multi-word lemmas such as go_on into "go on"
        return name.encode("ascii", "ignore").decode("ascii").replace("_", " ")

    synonyms = []
    antonyms = []
    word_synsets = wn.synsets(word)
    for syn in word_synsets:
        # keep track of whether this synset has an entry with the desired min reputation
        synset_has_reputable_lemmas = False
        for lemma in syn.lemmas():
            # check the reputation (frequency count) of this lemma; if not good enough, skip it
            if lemma.count() < min_acceptable_reputation:
                continue
            # if we got this far, then this lemma is reputable
            synset_has_reputable_lemmas = True
            # ensure the synonym entry (lemma) is different from the given word, otherwise it's useless
            if lemma.name() != word:
                synonyms.append(clean(lemma.name()))
            # antonyms() returns a list; however, I have never seen it have more than one element
            if lemma.antonyms():
                antonym = lemma.antonyms()[0].name()
                # if the antonym is the same as the word, then it is not of value
                if antonym != word:
                    antonyms.append(clean(antonym))
        # if the synset has at least one reputable lemma, then try to extract another synonym from it
        if synset_has_reputable_lemmas:
            # In some cases the name of the synset is different from the raw word itself;
            # in that case it is a good match as a synonym.
            # Only the first part is needed because the name has the form <word>.<pos>.<number>
            syn_name = syn.name().rsplit(".", 2)[0]
            if syn_name != word:
                synonyms.append(clean(syn_name))
    # remove duplicates
    synonyms = list(set(synonyms))
    antonyms = list(set(antonyms))
    return synonyms, antonyms
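The function finishes by deduplicating with list(set(...)). That removes repeats, but it also discards the order in which WordNet returned the candidates (WordNet lists senses roughly by frequency), and because Python randomizes string hashing, the order can even differ between runs. If order matters, an order-preserving alternative is dict.fromkeys. A minimal sketch of that post-processing step; normalize_candidates is a hypothetical helper, and the sample names are illustrative, borrowed from the proceed results:

```python
def normalize_candidates(names, word):
    """Clean raw WordNet names for display.

    Underscores become spaces, the query word itself is dropped, and
    duplicates are removed while keeping first-seen order (dicts preserve
    insertion order in Python 3.7+).
    """
    query = word.replace("_", " ")
    cleaned = (name.replace("_", " ") for name in names)
    return list(dict.fromkeys(n for n in cleaned if n != query))


raw = ["continue", "proceed", "go_on", "continue", "carry_on"]
print(normalize_candidates(raw, "proceed"))  # ['continue', 'go on', 'carry on']
```

Swapping this in for the two list(set(...)) lines would make the output stable from run to run without changing which words are returned.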
Let’s look at a few examples of running this function against some words. Remember that the first list holds synonyms and the second holds antonyms.
>>> get_word_synonyms_and_antonyms_from_wordnet("possible")
(['potential'], ['impossible', 'actual'])
>>> get_word_synonyms_and_antonyms_from_wordnet("pride")
(['congratulate'], ['humility'])
>>> get_word_synonyms_and_antonyms_from_wordnet("proceed")
(['go on', 'move', 'go along', 'keep', 'go', 'continue', 'go forward', 'carry on'], ['discontinue'])
>>> get_word_synonyms_and_antonyms_from_wordnet("arachnid")
([], [])
>>> get_word_synonyms_and_antonyms_from_wordnet("recommend")
(['commend', 'urge', 'advocate'], [])
>>> get_word_synonyms_and_antonyms_from_wordnet("hunger")
(['starve', 'crave', 'thirst'], ['be full'])
>>> get_word_synonyms_and_antonyms_from_wordnet("hungry")
(['athirst'], ['thirsty'])
A few observations about these results:
- Some words, like arachnid, naturally do not have antonyms. Other words, like recommend, just don’t have enough information in WordNet.
- You would think that hunger and hungry would produce closely related results, but the output above shows little overlap. I would have thought that hunger is the root/stem of hungry, but WordNet keeps them as separate entries: it reduces inflected forms (banks to bank) but not derived ones (hungry to hunger).
- Some of the entries are a little strange; for example, thirsty or thirst is neither a synonym nor an antonym of hunger or hungry. I can see how they are related, which I suppose makes them closer to synonyms, but they are definitely not antonyms.
And there you have it. You will get synonyms and antonyms that are limited but still quite useful. In future articles I will cover some interesting use cases for the generated synonyms and antonyms, as well as other vocabulary expansion techniques.
Connect with me on Twitter @tearoks and share your thoughts.