treetojson — Simple Python Utility Library to map tree structure to JSON

Available in Python Package Index

Saad Sahibjan
5 min read · Dec 30, 2017

treetojson is a utility library for converting a given tree structure into a valid JSON object. It is handy alongside a Part-of-Speech tagger, which reads text in some language and assigns a part of speech, such as noun, verb or adjective, to each word. The library is pure Python with a single dependency, NLTK, and it can also be used together with the NLTK RegexpParser.

The output of treetojson was taken, manipulated and rendered in the front end as shown below.

What is a Tree Structure in this context?

When a sentence is given as input along with its tags, it is represented in a structure that looks like a tree. The structure of the tree depends on the sentence and on the tag of each word in it. These trees are also called parse trees. In this context, the leaves of a parse tree are word tokens, and the node values are phrasal categories such as NP and VP.

A simple tree structure, along with its tags, looks like this:

Sentence: Everyone knows an Elephant is larger than a Dog
(S
  (NP Everyone/NN)
  (VERB knows/VBZ)
  (COMP
    an/DT
    (NP Elephant/NN)
    (VERB is/VBZ)
    (CP larger/JJR)
    (THAN than/IN)
    a/DT
    (NP Dog/NN)))

A few tags used for the English language (tags are not fixed):

S - Sentence
NN - Noun
VB - Verb
DT - Determiner
NP - Noun Phrase
VP - Verb Phrase

Why was this library developed?

A simple sentence along with its tags looks like this:

sentence = [('Everyone', 'NN'), ('knows', 'VBZ'), ('an', 'DT'), ('Elephant', 'NN'), ('is', 'VBZ'), ('larger', 'JJR'), ('than', 'IN'), ('a', 'DT'), ('Dog', 'NN')]

Because this relates to grammar, tags will always repeat: a sentence can contain multiple nouns ('NN'), verbs ('VB') and so on. If a JSON object is generated with the existing approaches, the following issues occur.

  • The order of the words changes.
  • Certain words are lost.

The order of the words changes when a single key holds all of its values. In the above sentence there are multiple values for NN, VBZ and DT, so the result looks like this:

{
  "NN": ["Everyone", "Elephant", "Dog"],
  "VBZ": ["knows", "is"],
  "DT": ["an", "a"],
  "JJR": ["larger"],
  "IN": ["than"]
}

If the sentence is represented as above, the order of the words changes, and without the original order the sentence cannot be interpreted correctly.
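To make the ordering problem concrete, here is a minimal sketch of the grouping approach using only the standard library. The variable names are chosen for this example and are not part of treetojson.

```python
from collections import defaultdict

sentence = [('Everyone', 'NN'), ('knows', 'VBZ'), ('an', 'DT'),
            ('Elephant', 'NN'), ('is', 'VBZ'), ('larger', 'JJR'),
            ('than', 'IN'), ('a', 'DT'), ('Dog', 'NN')]

# Collect every word under its tag: one key per tag, a list of words per key.
grouped = defaultdict(list)
for word, tag in sentence:
    grouped[tag].append(word)

# Reading the words back key by key no longer yields the original sentence.
flattened = [word for words in grouped.values() for word in words]
original = [word for word, _ in sentence]

print(dict(grouped))
print(' '.join(flattened))
print('order preserved:', flattened == original)
```

All nine words survive, but they come back grouped by tag rather than in sentence order.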

Words are lost when each key keeps only its last value. In the above sentence there are multiple values for NN, VBZ and DT, so the result looks like this:

{
  "NN": "Dog",
  "VBZ": "is",
  "DT": "a",
  "JJR": "larger",
  "IN": "than"
}

If the sentence is represented as above, words are lost and the sentence cannot be reconstructed.
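The loss of words can be reproduced with a plain dictionary, as in this small sketch (names are illustrative only):

```python
sentence = [('Everyone', 'NN'), ('knows', 'VBZ'), ('an', 'DT'),
            ('Elephant', 'NN'), ('is', 'VBZ'), ('larger', 'JJR'),
            ('than', 'IN'), ('a', 'DT'), ('Dog', 'NN')]

# Assigning each word directly to its tag overwrites any earlier word
# that carried the same tag, so only the last one per tag remains.
last_only = {}
for word, tag in sentence:
    last_only[tag] = word

print(last_only)
print(len(sentence), 'words in,', len(last_only), 'words out')
```

Nine words go in and only five come out, because a dictionary can hold one value per key.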

To overcome these two issues, the treetojson utility library can be used. The output obtained is as below.

{
  "SENTENCE": [
    { "NN": "Everyone" },
    { "VBZ": "knows" },
    { "DT": "an" },
    { "NN": "Elephant" },
    { "VBZ": "is" },
    { "JJR": "larger" },
    { "IN": "than" },
    { "DT": "a" },
    { "NN": "Dog" }
  ]
}

With treetojson, both issues are overcome: the word order does not change and no words are lost.
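As a rough illustration of why this shape keeps both the order and the words, the sketch below builds the same structure with the standard `json` module. `to_ordered_json` is a hypothetical helper written for this example. It is not treetojson's actual implementation.

```python
import json

sentence = [('Everyone', 'NN'), ('knows', 'VBZ'), ('an', 'DT'),
            ('Elephant', 'NN'), ('is', 'VBZ'), ('larger', 'JJR'),
            ('than', 'IN'), ('a', 'DT'), ('Dog', 'NN')]


def to_ordered_json(tagged_words, root='SENTENCE'):
    """Hypothetical helper: wrap each (word, tag) pair in its own object.

    A JSON array keeps its element order, and one object per word means
    repeated tags never collide on the same key.
    """
    return json.dumps({root: [{tag: word} for word, tag in tagged_words]})


payload = to_ordered_json(sentence)

# Reading the payload back recovers every word in its original position.
words = [next(iter(item.values())) for item in json.loads(payload)['SENTENCE']]

print(payload)
print(' '.join(words))
```

The array carries the ordering, and the single-key objects carry the tags, which is why neither of the two earlier problems appears.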

How to install treetojson?

This library can be installed using pip or easy_install:

pip install treetojson

or

easy_install treetojson

Basic Usage

1. When a list containing words and their appropriate tags is provided:

>>> import treetojson
>>> sentence = [('Everyone', 'NN'), ('knows', 'VBZ'), ('an', 'DT'), ('Elephant', 'NN'), ('is', 'VBZ'), ('larger', 'JJR'),
... ('than', 'IN'), ('a', 'DT'), ('Dog', 'NN')]
>>> print(treetojson.get_json(data=sentence))
{"SENTENCE":[{"NN":"Everyone"},{"VBZ":"knows"},{"DT":"an"},{"NN":"Elephant"},{"VBZ":"is"}, {"JJR":"larger"},
{"IN":"than"},{"DT":"a"},{"NN":"Dog"}]}

2. When a list containing words and their appropriate tags is provided along with a grammar:

>>> import treetojson
>>> sentence = [('Everyone', 'NN'), ('knows', 'VBZ'), ('an', 'DT'), ('Elephant', 'NN'), ('is', 'VBZ'), ('larger', 'JJR'),
... ('than', 'IN'), ('a', 'DT'), ('Dog', 'NN')]
>>> grammar = """
... NP: {<PRP>?<JJ.*>*<NN.*>+}
... CP: {<JJR|JJS>}
... VERB: {<VB.*>}
... THAN: {<IN>}
... COMP: {<DT>?<NP><RB>?<VERB><DT>?<CP><THAN><DT>?<NP>}
... """
>>> print(treetojson.get_json(data=sentence, grammar=grammar))
{"SENTENCE":[{"NP":[{"NN":"Everyone"}]},{"VERB":[{"VBZ":"knows"}]},{"COMP": [{"DT":"an"},{"NP":[{"NN":"Elephant"}]},
{"VERB":[{"VBZ":"is"}]},{"CP":[{"JJR":"larger"}]},{"THAN":[{"IN":"than"}]},{"DT":"a"},{"NP":[{"NN":"Dog"}]}]}]}

3. When words and their labels (tags) are provided separately:

>>> import treetojson
>>> words = ['Everyone', 'knows', 'an', 'Elephant', 'is', 'larger', 'than', 'a', 'Dog']
>>> labels = ['NN', 'VBZ', 'DT', 'NN', 'VBZ', 'JJR', 'IN', 'DT', 'NN']
>>> print(treetojson.get_json(words=words, label=labels))
{"SENTENCE":[{"NN":"Everyone"},{"VBZ":"knows"},{"DT":"an"},{"NN":"Elephant"},{"VBZ":"is"},{"JJR":"larger"},{"IN":"than"},
{"DT":"a"},{"NN":"Dog"}]}

4. When words and their labels (tags) are provided separately along with a grammar:

>>> import treetojson
>>> words = ['Everyone', 'knows', 'an', 'Elephant', 'is', 'larger', 'than', 'a', 'Dog']
>>> labels = ['NN', 'VBZ', 'DT', 'NN', 'VBZ', 'JJR', 'IN', 'DT', 'NN']
>>> grammar = """
... NP: {<PRP>?<JJ.*>*<NN.*>+}
... CP: {<JJR|JJS>}
... VERB: {<VB.*>}
... THAN: {<IN>}
... COMP: {<DT>?<NP><RB>?<VERB><DT>?<CP><THAN><DT>?<NP>}
... """
>>> print(treetojson.get_json(words=words, label=labels, grammar=grammar))
{"SENTENCE":[{"NP":[{"NN":"Everyone"}]},{"VERB":[{"VBZ":"knows"}]},{"COMP": [{"DT":"an"},{"NP":[{"NN":"Elephant"}]},
{"VERB":[{"VBZ":"is"}]},{"CP":[{"JJR":"larger"}]},{"THAN":[{"IN":"than"}]},{"DT":"a"},{"NP":[{"NN":"Dog"}]}]}]}
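
The nested output in cases 2 and 4 mirrors the chunked parse tree. The sketch below shows one way such a tree could be mapped to JSON recursively. It is an assumption about the general approach, not the library's code: the tree is a hand-written nested tuple standing in for an NLTK tree, `node_to_dict` and `tree_to_json` are hypothetical names, and the root is emitted under "SENTENCE" only to match the output shown above.

```python
import json

# Hypothetical stand-in for a chunked parse tree:
# a phrase node is (label, [children]); a leaf is (word, tag).
tree = ('S', [
    ('NP', [('Everyone', 'NN')]),
    ('VERB', [('knows', 'VBZ')]),
    ('COMP', [
        ('an', 'DT'),
        ('NP', [('Elephant', 'NN')]),
        ('VERB', [('is', 'VBZ')]),
        ('CP', [('larger', 'JJR')]),
        ('THAN', [('than', 'IN')]),
        ('a', 'DT'),
        ('NP', [('Dog', 'NN')]),
    ]),
])


def node_to_dict(node):
    """Turn one node into a single-key dict, recursing into phrase nodes."""
    first, second = node
    if isinstance(second, list):
        # Phrase node: the label maps to the ordered list of converted children.
        return {first: [node_to_dict(child) for child in second]}
    # Leaf: the tag maps to the word.
    return {second: first}


def tree_to_json(root):
    """Serialize the children of the root under a top-level SENTENCE key."""
    _, children = root
    body = {'SENTENCE': [node_to_dict(child) for child in children]}
    return json.dumps(body, separators=(',', ':'))


print(tree_to_json(tree))
```

Because every level is an array of single-key objects, the nesting of the grammar is kept without giving up the word order.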

Example with an API call

I created and exposed a simple API resource that returns the JSON produced by the treetojson library.

In this example, a simple local HTTP server is created with the CherryPy Python web framework and a GET API resource is exposed. The words, labels and grammar are provided to the treetojson library and the response is returned.
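
The general shape of such an endpoint can be sketched as follows. To stay dependency-free, this sketch swaps CherryPy for Python's built-in `http.server`, and `build_payload` is a hypothetical stand-in for the call to treetojson (it only produces the flat, grammar-free output). The port number and all names here are assumptions made for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

WORDS = ['Everyone', 'knows', 'an', 'Elephant', 'is', 'larger', 'than', 'a', 'Dog']
LABELS = ['NN', 'VBZ', 'DT', 'NN', 'VBZ', 'JJR', 'IN', 'DT', 'NN']


def build_payload(words, labels):
    """Hypothetical stand-in for the treetojson call described in the article."""
    return json.dumps({'SENTENCE': [{label: word} for word, label in zip(words, labels)]})


class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the single resource from the article is exposed.
        if self.path != '/v1/api':
            self.send_error(404)
            return
        body = build_payload(WORDS, LABELS).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet for this example


def make_server(port=8080):
    """Create (but do not start) a server bound to the loopback address."""
    return HTTPServer(('127.0.0.1', port), ApiHandler)
```

Calling `make_server().serve_forever()` starts the server, after which a GET request to `/v1/api` on the chosen port returns the JSON document.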

The API was accessed through Postman via http://localhost/v1/api

Response from the local server

The response to the request, shown in the image above, is valid JSON.

Summary

treetojson is a utility library that takes words and their tags as input and provides a valid JSON object as output. The library maintains the order of the words provided and doesn't lose any of them, so the output reads as the same sentence that was given to treetojson.

The output of treetojson can be returned as the response to an HTTP request, and it can be used, manipulated and displayed appropriately in the front end.
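
As an example of consuming that output, the sketch below parses the nested JSON shown earlier and walks it to recover the tagged words in sentence order. `iter_leaves` is a hypothetical helper written for this illustration. A front end would do the equivalent in its own language.

```python
import json

# The nested output from the grammar example, as a client would receive it.
response_body = (
    '{"SENTENCE":[{"NP":[{"NN":"Everyone"}]},{"VERB":[{"VBZ":"knows"}]},'
    '{"COMP":[{"DT":"an"},{"NP":[{"NN":"Elephant"}]},{"VERB":[{"VBZ":"is"}]},'
    '{"CP":[{"JJR":"larger"}]},{"THAN":[{"IN":"than"}]},{"DT":"a"},'
    '{"NP":[{"NN":"Dog"}]}]}]}'
)


def iter_leaves(items):
    """Yield (tag, word) pairs in document order from a list of single-key dicts."""
    for item in items:
        for key, value in item.items():
            if isinstance(value, list):
                # A phrase such as NP or COMP: descend into its children.
                yield from iter_leaves(value)
            else:
                yield key, value


leaves = list(iter_leaves(json.loads(response_body)['SENTENCE']))
print(' '.join(word for _, word in leaves))  # Everyone knows an Elephant is larger than a Dog
```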

Any issues encountered while using this library can be raised under GitHub issues.
