Documentation for python-hydra-agent

A GSoC 2018 story

Hello there! The first coding phase of GSoC 2018 is almost complete. I am working on the project “Implement Redis as a datastore” under the Python Hydra organisation. My main task for the first phase was to build a graph structure that represents the content of a data API and to store it in Redis with the help of the redisgraph Python client. Here is the source code for the graph implementation that I wrote in the first phase.

Explaining the code

Three modules (`hydra_graph`, `collections_endpoint`, `classes_objects`) are responsible for the graph structure and its storage in Redis. `hydra_graph` is the main file, and the other two are used as modules. First we set up the connection to Redis as `redis_con`, name the graph in Redis “apidoc” (the entrypoint documentation in Hydra is called “ApiDoc”), and use it in the file as `redis_graph`.

import redis
from redisgraph import Graph

if __name__ == "__main__":
    redis_con = redis.Redis(host='localhost', port=6379)
    redis_graph = Graph("apidoc", redis_con)

Next we pass the URL of a Hydra server, find the API documentation for that server, and load its data into the apidoc graph. Internally we get the entrypoint with the help of `hydrus.hydraspec.doc_maker`; we then create a node for the entrypoint in our graph and set properties such as “id”, “url”, and “supportedOperation” on the node as `entrypoint_properties`.

def get_endpoints(api_doc):
    """Create node for entrypoint"""
    ..
    ..
    entrypoint_node = Node(
        label="id",
        alias="Entrypoint",
        properties=entrypoint_properties)
    redis_graph.add_node(entrypoint_node)
    return get_apistructure(entrypoint_node, api_doc)

Now we move on to `get_apistructure`. Here we find the two types of endpoints, `collection_endpoints` and `class_endpoints`, with the help of `hydrus.hydraspec.doc_writer.EntryPointCollection` and `hydrus.hydraspec.doc_writer.EntryPointClass` respectively.

def get_apistructure(entrypoint_node, api_doc):
    """Break the endpoints into two parts: collections and classes."""
    ...
    ...
    ...
    class_endpoints[support_property.name] = support_property.id_
    ...
    collection_endpoints[support_property.name] = support_property.id_
    ...
    ...
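As a rough illustration of this split, here is a minimal sketch that sorts entrypoint properties into the two dictionaries. It is not the project's code: the `SupportedProperty` stand-in, its `kind` field, and the sample ids are assumptions made so the example runs without hydrus installed. The real function works on hydrus objects instead.

```python
from dataclasses import dataclass


@dataclass
class SupportedProperty:
    """Hypothetical stand-in for an entrypoint property (not a hydrus class)."""
    name: str
    id_: str
    kind: str  # assumed marker: "class" or "collection"


def split_endpoints(supported_properties):
    """Return (class_endpoints, collection_endpoints), each mapping name -> id."""
    class_endpoints = {}
    collection_endpoints = {}
    for support_property in supported_properties:
        if support_property.kind == "class":
            class_endpoints[support_property.name] = support_property.id_
        elif support_property.kind == "collection":
            collection_endpoints[support_property.name] = support_property.id_
    return class_endpoints, collection_endpoints


# Made-up sample data, for illustration only.
sample = [
    SupportedProperty("Location", "vocab:EntryPoint/Location", "class"),
    SupportedProperty(
        "DroneCollection", "vocab:EntryPoint/DroneCollection", "collection"),
]
class_endpoints, collection_endpoints = split_endpoints(sample)
```

Only the name and the id of each endpoint are kept at this stage; everything else is fetched later, when the nodes are built.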

`get_apistructure` splits the endpoints into the two groups, storing only each endpoint’s name and id. After that we hand each group to its own module: `class_endpoints` goes to the `classes_objects` module and `collection_endpoints` goes to the `collections_endpoint` module. We process the class endpoints first, by calling `classes_objects.ClassEndpoints.endpointclasses()`.

"""
classes_objects.ClassEndpoints
"""
def endpointclasses(
        self,
        class_endpoints,
        entrypoint_node,
        api_doc,
        base_url):
    """Node for every class which has an endpoint."""
    ...
    for endpoint in class_endpoints:
        ...
        new_url = base_url + \
            class_endpoints[endpoint].replace("vocab:EntryPoint", "")
        .....
        node_properties["operations"] = self.get_operation(
            api_doc, endpoint)
        node_properties["@id"] = str(new_file["@id"])
        node_properties["@type"] = str(new_file["@type"])
        node_properties["property_value"] = str(member)
        node_properties["properties"] = str(supported_properties_list)
        .....
    ...

In this code we create a node for every endpoint stored in `class_endpoints`; the node’s properties are id, type, operations, properties, and `property_value`. For every endpoint we build a URL, `new_url`, which lets us fetch the data from the server endpoint, and we store the fetched endpoint data in `new_file`.
For every endpoint, `node_properties["operations"]` contains all the operations that can be performed on that endpoint; the function `get_operation` returns them. `node_properties["properties"]` contains the list of every `supportedProperty` associated with the endpoint, as `supported_properties_list`, and `node_properties["property_value"]` contains the values of those properties that are not themselves an object or an endpoint. Yes, this can happen: any endpoint can have another endpoint, or a non-endpoint object, as one of its `supportedProperty` values.
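To make the URL construction and the property strings more concrete, here is a small sketch under stated assumptions. The helper names `build_endpoint_url` and `build_node_properties`, the base URL, and the sample payload are all hypothetical; in particular, reading the supported properties straight from the payload keys is a simplification, not necessarily how the project derives them.

```python
def build_endpoint_url(base_url, endpoint_id):
    """Turn an id such as 'vocab:EntryPoint/Area' into a fetchable URL."""
    return base_url + endpoint_id.replace("vocab:EntryPoint", "")


def build_node_properties(payload, operations, endpoint_names):
    """Assemble the string-valued properties stored on an endpoint node."""
    member = {}
    supported_properties_list = []
    for key, value in payload.items():
        if key.startswith("@"):
            continue  # '@id' and '@type' are stored under their own keys
        supported_properties_list.append(key)
        # Keep a value only when it is neither a nested object nor a
        # reference to another endpoint.
        if not isinstance(value, dict) and key not in endpoint_names:
            member[key] = value
    return {
        "@id": str(payload["@id"]),
        "@type": str(payload["@type"]),
        "operations": str(operations),
        "properties": str(supported_properties_list),
        "property_value": str(member),
    }


# Made-up values, for illustration only.
new_url = build_endpoint_url(
    "http://localhost:8080/api", "vocab:EntryPoint/Area")
new_file = {
    "@id": "/api/Area",
    "@type": "Area",
    "name": "north",
    "bounds": {"lat": 1},
    "Location": "/api/Location",
}
node_properties = build_node_properties(new_file, ["GET"], {"Location"})
```

Every value is converted with `str()` before it is stored, mirroring the snippets in this post, so lists and dictionaries end up as their string representation on the node.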

When one endpoint is referenced by a property of another, we need an edge between the two endpoints (taking care to check that the target endpoint already exists as a node). For this we keep a Python dictionary, `endpoint_property_list`, which tracks which endpoint holds which property; after creating all the endpoint nodes we can add the edges between them. You can see the code for this here.

...
if endpoint_property_list:
    for endpoint_property in endpoint_property_list:
        for src_node in self.redis_graph.nodes.values():
            if str(endpoint_property) == src_node.alias:
                for endpoints in endpoint_property_list[
                        endpoint_property]:
                    for nodes in self.redis_graph.nodes.values():
                        if endpoints == nodes.alias:
                            self.addEdge(
                                src_node,
                                "has_endpoint_property",
                                nodes)
                            break
                break
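The same linking step can also be sketched with a dictionary lookup by alias instead of nested loops over all nodes. This is an alternative formulation, not the project's code: the `Node` stand-in, the function name `link_endpoint_properties`, and the sample aliases are hypothetical, and the sketch assumes that node aliases are unique.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical minimal node; only the alias matters for this sketch."""
    alias: str


def link_endpoint_properties(nodes, endpoint_property_list):
    """Return (source, relation, target) triples for endpoint-valued properties.

    nodes: iterable of Node objects that already exist in the graph.
    endpoint_property_list: dict mapping a source alias to target aliases.
    """
    by_alias = {node.alias: node for node in nodes}
    edges = []
    for source_alias, target_aliases in endpoint_property_list.items():
        source = by_alias.get(str(source_alias))
        if source is None:
            continue  # the source endpoint has no node, nothing to link
        for target_alias in target_aliases:
            target = by_alias.get(target_alias)
            if target is not None:  # only link to nodes that already exist
                edges.append((source, "has_endpoint_property", target))
    return edges


# Made-up aliases: "Missing" has no node, so no edge is created for it.
graph_nodes = [Node("Drone"), Node("State")]
edges = link_endpoint_properties(graph_nodes, {"Drone": ["State", "Missing"]})
```

Building the alias index once keeps the work proportional to the number of nodes plus the number of property references, and the "does the target exist" check becomes a single lookup.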

If the object of the relation is not an endpoint, we create a node for every such object, with its property and `property_value`, and connect it to the endpoint. All of this is handled by a function called `objects_property`.

def objects_property(
        self,
        objects_node,
        new_list,
        no_endpoint_property,
        entrypoint_node,
        api_doc):
    """Nodes for every property which is itself an object"""
    ...
    ...

In this function we create a node for every non-endpoint object property and connect it to its parent endpoint. The function is recursive, because an object can itself have another object as a property. The nodes it creates can therefore be seen as the terminal (leaf) nodes of an endpoint.
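The recursion can be illustrated with a small sketch that walks a nested dictionary. It is only a sketch under assumptions: the function name `object_nodes`, the generated `id` values, the relation name `has_object_property`, and the sample data are hypothetical, and plain dictionaries and tuples stand in for graph nodes and edges.

```python
def object_nodes(parent_id, obj, nodes=None, edges=None):
    """Recursively create one node per nested object found under parent_id."""
    nodes = [] if nodes is None else nodes
    edges = [] if edges is None else edges
    for prop, value in obj.items():
        if not isinstance(value, dict):
            continue  # plain values stay on the parent, no node needed
        node_id = f"{parent_id}/{prop}"  # hypothetical id scheme
        plain_values = {
            key: val for key, val in value.items()
            if not isinstance(val, dict)
        }
        nodes.append({
            "id": node_id,
            "parent_id": parent_id,
            "property": prop,
            "property_value": str(plain_values),
        })
        edges.append((parent_id, "has_object_property", node_id))
        # An object may itself contain objects, so descend into it.
        object_nodes(node_id, value, nodes, edges)
    return nodes, edges


# Made-up nested data, for illustration only.
drone = {
    "name": "d1",
    "state": {"speed": 5, "position": {"lat": 1.0, "lon": 2.0}},
}
nodes, edges = object_nodes("/api/Drone/1", drone)
```

The recursion stops on its own once an object has only plain values, which is what makes the last nodes of each branch leaves.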

Now we process the collection endpoints by calling `collections_endpoint.CollectionEndpoints.endpointCollection()`.

def endpointCollection(
        self,
        collection_endpoint,
        entrypoint_node,
        api_doc,
        url):
    """It makes a node for every collection endpoint."""
    ...
    for endpoint in collection_endpoint:
        ...
        new_url = url + \
            collection_endpoint[endpoint].replace("vocab:EntryPoint", "")
        ...
        node_properties["@id"] = str(collection_endpoint[endpoint])
        node_properties["operations"] = str(endpoint_method)
        node_properties["members"] = str(new_file["members"])
        ...
    ...

As the code above shows, for every endpoint stored in `collection_endpoint` we create a node with properties such as id, operations, and members. The members are the endpoints contained in the collection. A class endpoint is a single endpoint, whereas every collection endpoint has a collection of members. We get the members of each collection endpoint by fetching its data from the server.
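As a small illustration, the sketch below builds the properties of one collection node from an already fetched payload and lists the ids of its members. The helper name `collection_node_properties` and the sample payload are assumptions; the real code fetches the payload from the server before this step.

```python
def collection_node_properties(endpoint_id, endpoint_method, new_file):
    """Build the string-valued properties of a collection endpoint node."""
    return {
        "@id": str(endpoint_id),
        "operations": str(endpoint_method),
        "members": str(new_file["members"]),
    }


# Made-up payload standing in for the data fetched from the server.
new_file = {
    "@id": "/api/DroneCollection",
    "@type": "DroneCollection",
    "members": [
        {"@id": "/api/DroneCollection/1", "@type": "Drone"},
        {"@id": "/api/DroneCollection/2", "@type": "Drone"},
    ],
}
node_properties = collection_node_properties(
    "vocab:EntryPoint/DroneCollection", ["GET", "PUT"], new_file)

# Each member later becomes a node of its own.
member_ids = [member["@id"] for member in new_file["members"]]
```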

Now we have to create a node for every member of the collection; for this we have defined a function, `collectionobjects`.

def collectionobjects(
        self,
        endpoint_collection_node,
        endpoint_list,
        new_url,
        entrypoint_node,
        api_doc,
        url):
    """Creating nodes for all objects stored in collection."""
    ...
    for endpoint in endpoint_list:
        ...
        new_file1 = self.fetch_data(new_url1)
        ...
        node_properties["operations"] = str(endpoint_method)
        ...
        node_properties["@id"] = str(endpoint["@id"])
        node_properties["@type"] = str(endpoint["@type"])
        node_properties["property_value"] = str(member)
        node_properties["properties"] = str(supported_property_list)
        ...
    ...

For every endpoint in the collection, we fetch the data from the server endpoint and load it.
At this point we have the data for the endpoint and use it for the node’s properties. Every endpoint node has properties for its operations, id, type, properties, and `property_value`. These are the same as the properties of a class endpoint node, because a member of a collection endpoint is a single endpoint object, just like a class endpoint.
We store the member’s properties the same way as for class endpoints, with the benefit that the nodes for all class endpoints have already been created. If a property refers to an endpoint, we can connect the member to that endpoint’s node directly, without further checks. That is why we process `class_endpoint` before `collection_endpoint`. The code for this is shown below:

if endpoint_property_list:
    for endpoint_property in endpoint_property_list:
        for nodes in self.redis_graph.nodes.values():
            if endpoint_property == nodes.alias:
                clas.addEdge(
                    collection_object_node,
                    "hasendpointproperty",
                    nodes)

Again, if the endpoint contains a non-endpoint object property, it is passed to the `objects_property()` function described above.

This way we can store all the data from the server in Redis as a graph. For a better understanding, you should go through the source code.

Recap

We store different types of nodes in Redis:

1- For `entrypoint_node`:

id // vocab:EntryPoint
operations // GET
url // http://......../entrypoint

2- For `collection_endpoint_node`:

id // id of the collection_endpoint
operations // GET, PUT, POST, ...
members // endpoints collected in the collection_endpoint

3- For each endpoint that is a member of a `collection_endpoint`:

id // same as id
operations // GET, PUT, ...
type // type or name
properties // supported property titles
property_value // supported property titles with their values

4- For `class_endpoint`: same as for 3

5- For `object_node` :

parent_id // id of the parent endpoint
operations // GET, PUT, ...
property // supported property title
property_value // supported property title with value

Example demo of the graph structure:

Take the server at http://35.224.198.158:8081/api; the graph generated for it with the help of graphviz is:

graph structure

Here are some other graphs, also generated by the code, for different data.

Thanks! Have a nice day.