Grakn Python Driver & How to roll your own

Published in Vaticle, Aug 14, 2018

Updated 8th February 2019

Grakn 1.3 was released in August 2018 with a slew of new features, bug fixes, and performance enhancements. Included in this release were new gRPC-based drivers for Java, NodeJS and Python. This blog post will walk you through the Python driver, and provide guidelines on how you can write your own for your language of choice.

Overview

The main reason for rewriting our drivers was Grakn's move from REST to gRPC. This change has cleaned up our API and should provide performance benefits. Further, all of our available drivers (Java, Node, and Python) now expose the same objects and methods to users, subject to language naming conventions and available types. To maintain this uniformity across the stack, new language drivers should provide the same interface. Note that you will need both gRPC and protobuf support to create a functioning driver, so double-check a) that compilers for your language exist, and b) that your target language version is compatible with those compilers.

Driver Architecture

We can divide our drivers into 5 user-facing components:

Grakn: the driver entry point, instantiated with a URI (and optionally credentials) for the Grakn instance, from which we create Sessions.
Session: a connection to a keyspace within the instance, from which we create Transactions.
Transaction: a single database transaction, which may be used to query, commit, close, etc.
Concept: an object representing any concept in the database, schema or data (the hierarchy of subtypes is diagrammed below)

                            Concept
                           /        \
                         /            \
                       /                \
                     /                    \
        SchemaConcept                      Thing
         /      |    \                    /  |   \
        /       |     \                 /    |     \
       /        |      \              /      |       \
    Rule      Type       Role     Entity   Attribute   Relationship
            /  |   \
           /   |     \
          /    |       \
         /     |         \
EntityType  AttributeType  RelationshipType

Answer: the result of query strings submitted to the server (Answer itself has several subtypes)
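The Concept hierarchy above maps naturally onto a Python class tree. The sketch below is a minimal illustration, not the driver's Concept.py: the class names follow the diagram, while the lowercase base-type strings, the `BASE_TYPE_TO_CLASS` table, and `concept_from_message` are hypothetical stand-ins for the BASE_TYPE enum in Concept.proto and for the job that BaseTypeMapping.py and ConceptFactory.py perform.

```python
class Concept:
    """Root of the hierarchy: every concept is identified by a server-side id."""

    def __init__(self, concept_id: str):
        self.id = concept_id

    def __repr__(self) -> str:
        return f"{type(self).__name__}(id={self.id!r})"


# Schema side of the tree
class SchemaConcept(Concept): pass
class Rule(SchemaConcept): pass
class Role(SchemaConcept): pass
class Type(SchemaConcept): pass
class EntityType(Type): pass
class AttributeType(Type): pass
class RelationshipType(Type): pass

# Data side of the tree
class Thing(Concept): pass
class Entity(Thing): pass
class Attribute(Thing): pass
class Relationship(Thing): pass

# Hypothetical base-type names; the real driver keys on the BASE_TYPE enum.
BASE_TYPE_TO_CLASS = {
    "rule": Rule,
    "role": Role,
    "entity_type": EntityType,
    "attribute_type": AttributeType,
    "relationship_type": RelationshipType,
    "entity": Entity,
    "attribute": Attribute,
    "relationship": Relationship,
}


def concept_from_message(message: dict) -> Concept:
    """Build the most specific local class for an unpacked concept message."""
    cls = BASE_TYPE_TO_CLASS[message["baseType"]]
    return cls(message["id"])
```

Keeping the base-type mapping in one table means the unpacking code needs no chain of `if` statements, and a new concept type needs only one new entry.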

Everything else can be regarded as machinery to make this interface functional.

├── ...
├── grakn
│   ├── __init__.py  -----  # Entry - Grakn, Session, Transaction
│   ├── exception
│   │   └── GraknError.py
│   └── service
│       ├── Keyspace
│       │   ├── KeyspaceService.py
│       │   └── autogenerated -----  # Autogenerated gRPC files
│       │       ├── Keyspace_pb2.py
│       │       └── Keyspace_pb2_grpc.py
│       └── Session
│           ├── TransactionService.py
│           ├── Concept
│           │   ├── BaseTypeMapping.py
│           │   ├── Concept.py  -----  # Concept object hierarchy
│           │   └── ConceptFactory.py
│           ├── autogenerated -----  # Autogenerated gRPC files
│           │   ├── Answer_pb2.py
│           │   ├── Answer_pb2_grpc.py
│           │   ├── Concept_pb2.py
│           │   ├── Concept_pb2_grpc.py
│           │   ├── Session_pb2.py
│           │   └── Session_pb2_grpc.py
│           └── util
│               ├── RequestBuilder.py  ----- # Build gRPC requests
│               ├── ResponseReader.py  ----- # Unpack gRPC responses
│               └── enums.py
└── tests
    └── integration
        ├── test_concept.py
        ├── test_grakn.py
        └── test_keyspace.py

The structure above is taken from https://github.com/graknlabs/client-python. Roughly, Grakn, Session, and Transaction are exposed in the top-level package's __init__.py, while the gRPC-specific implementation lives in the service sub-package. A TransactionService uses the RequestBuilder (which creates the required gRPC messages), the Communicator (which wraps a bidirectional gRPC stream, exposing a one-in, one-out connection to the server), and the ResponseReader (which converts received gRPC messages into local Python objects). Received objects may be a subtype of Concept or a subtype of Answer. I recommend reading the README and glancing at the code to see what each of these objects exposes.
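The Communicator is the piece that depends most on your gRPC library. Here is a minimal sketch of the one-in, one-out idea, assuming (as in the Python gRPC API) that a bidirectional-streaming stub method takes an iterator of requests and returns an iterator of responses: requests are fed through a thread-safe queue, and each `send()` blocks until the next response arrives. The class and method names are illustrative rather than the driver's exact code, and `fake_transaction_stream` is a hypothetical stand-in for the server so the sketch runs without Grakn.

```python
import queue
from typing import Callable, Iterator


class Communicator:
    """Turn a bidirectional stream into blocking one-in, one-out calls.

    `stream_call` stands in for a stub method such as the generated
    `transaction` call: it takes an iterator of requests and returns an
    iterator of responses.
    """

    _CLOSE = object()  # sentinel that ends the request stream

    def __init__(self, stream_call: Callable[[Iterator], Iterator]):
        self._requests: "queue.Queue" = queue.Queue()
        self._responses = stream_call(self._request_iterator())

    def _request_iterator(self):
        # Lazily yield requests as send() enqueues them.
        while True:
            request = self._requests.get()
            if request is self._CLOSE:
                return
            yield request

    def send(self, request):
        """Send one request and block until its response arrives."""
        self._requests.put(request)
        return next(self._responses)

    def close(self):
        """End the request stream, letting the server finish the call."""
        self._requests.put(self._CLOSE)


def fake_transaction_stream(requests):
    """Hypothetical server stand-in: answers each request in order."""
    for request in requests:
        yield {"echo": request}
```

With a real channel you would pass something like `SessionServiceStub(channel).transaction`. gRPC consumes the request iterator on a background thread, which is why a `queue.Queue` rather than a plain list feeds it.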

gRPC Summary

The key takeaways about gRPC are that it is based on HTTP/2, supports bidirectional streaming, defines services and messages using protocol buffers, and compiles those definitions into stubs for a variety of languages, which are then used on both the server and the client side.

We don't use much of gRPC's more advanced functionality, such as channel multiplexing, and instead focus on core RPC functionality and complex, strongly typed messages.

For a slightly longer gRPC introduction, I recommend this.

Our gRPC protocol is defined at https://github.com/graknlabs/grakn/tree/master/protocol. In the sub-directories, we have four .proto files: the gRPC entry point for all Transaction operations is in Session.proto, Keyspace operations are in Keyspace.proto, etc.

Understanding our gRPC Protocol

The key to implementing a Grakn driver successfully is understanding how to create and unpack the correct gRPC messages. Many of the methods exposed to users on Concepts (e.g. on an AttributeType from the hierarchy above) are really RPC calls to the Grakn server. To pick a simple example, when calling attribute_type.create(), we send a gRPC request to the Grakn instance, which creates a new attribute of that type and returns it via another gRPC message. The returned message is unpacked and presented to the user as an instance of the Attribute class.

To become familiar with our RPC message formats, we can look at the protobuf definition files in the protocol directory linked above.

Here’s an excerpt from Session.proto:

service SessionService {
    rpc transaction (stream Transaction.Req) returns (stream Transaction.Res);
}

message Transaction {
    message Req {
        oneof req {
            Open.Req open_req = 1;
            Commit.Req commit_req = 2;
            Query.Req query_req = 3;
            Iter.Req iterate_req = 4;
            GetSchemaConcept.Req getSchemaConcept_req = 5;
            GetConcept.Req getConcept_req = 6;
            GetAttributes.Req getAttributes_req = 7;
            ...
        }
    }
    message Res {
        oneof res {
            Open.Res open_res = 1;
            Commit.Res commit_res = 2;
            Query.Iter query_iter = 3;
            Iter.Res iterate_res = 4;
            GetSchemaConcept.Res getSchemaConcept_res = 5;
            GetConcept.Res getConcept_res = 6;
            GetAttributes.Iter getAttributes_iter = 7;
            ...
        }
    }
}

message Iter {
    message Req {
        int32 id = 1;
    }
    message Res {
        oneof res {
            bool done = 1;
            Query.Iter.Res query_iter_res = 2;
            GetAttributes.Iter.Res getAttributes_iter_res = 3;
            Method.Iter.Res conceptMethod_iter_res = 4;
        }
    }
}
...
message GetAttributes {
    message Req {
        ValueObject value = 1;
    }
    message Iter {
        int32 id = 1;
        message Res {
            Concept attribute = 1;
        }
    }
}

Our main RPC endpoint is the single RPC call named transaction. It is declared as a bidirectional stream: over one call, the client sends a stream of Transaction.Req messages and the server answers with a stream of Transaction.Res messages. Because the protobuf messages are typed, we can walk through the protobuf definitions to see how to build the messages we need. To make this concrete, I'll walk through a more advanced example.

Get Attributes By Value

I’m going to break down the messages sent by the following piece of Python code:

# make sure you've run `pip3 install grakn` and have Grakn running
import grakn

client = grakn.Grakn(uri="localhost:48555")
with client.session(keyspace="test") as session:
    with session.transaction(grakn.TxType.READ) as tx:
        # returns an iterator over the matching Attribute concepts
        attribute_iterator = tx.get_attributes_by_value("John", grakn.DataType.STRING)

Here, we want to retrieve all the attributes whose string value is "John". The first gRPC message created is a Transaction.Req from Session.proto, which needs its getAttributes_req field populated. That field has the type GetAttributes.Req, which has a single field called value. This in turn is a ValueObject, which is defined in the Concept.proto file (excerpt below):

message Concept {
    string id = 1;
    BASE_TYPE baseType = 2;

    enum BASE_TYPE {
        ...
        ATTRIBUTE_TYPE = 3;
        ...
    }
    ...
}

message ValueObject {
    oneof value {
        string string = 1;
        bool boolean = 2;
        int32 integer = 3;
        int64 long = 4;
        float float = 5;
        double double = 6;
        int64 date = 7; // time since epoch in milliseconds
    }
}

In this case, the ValueObject needs its string field populated with "John".

Phew! In Python, printing the final message should give something that looks roughly like this:

{                            # type Transaction.Req
  getAttributes_req {        # type GetAttributes.Req
    value {                  # type ValueObject (from Concept.proto)
      string : "John"
    }
  }
}

Protobuf implementations differ in how these nested messages are actually composed: in Python, for instance, each compound message needs to be instantiated and then embedded into its parent using CopyFrom or MergeFrom (Python Protobuf docs).
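Setting the protobuf-specific API aside, the logical content of this request is small: the only real decision is which `oneof` field of ValueObject to set for the user's data type. Here is a minimal sketch using nested dicts in place of generated message classes. The field names come from the excerpts above, but this `DataType` enum (a stand-in for the driver's `grakn.DataType`) and `build_get_attributes_req` are hypothetical.

```python
from enum import Enum


class DataType(Enum):
    # Each member's value is the ValueObject oneof field it populates.
    STRING = "string"
    BOOLEAN = "boolean"
    INTEGER = "integer"
    LONG = "long"
    FLOAT = "float"
    DOUBLE = "double"
    DATE = "date"  # caller passes int64 milliseconds since the epoch


def build_get_attributes_req(value, data_type: DataType) -> dict:
    """Return a dict shaped like a Transaction.Req with getAttributes_req set."""
    value_object = {data_type.value: value}  # exactly one oneof field is set
    return {"getAttributes_req": {"value": value_object}}


print(build_get_attributes_req("John", DataType.STRING))
# {'getAttributes_req': {'value': {'string': 'John'}}}
```

With generated classes, the same choice becomes setting one attribute on a ValueObject instance before embedding it in the GetAttributes.Req.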

The message that is returned will be a Transaction.Res. But which field will be populated? You can work this out from our naming conventions: it should be the one with type GetAttributes.Iter. This message has a single field called id.

{                            # type Transaction.Res
  getAttributes_iter {       # type GetAttributes.Iter
    id: 1
  }
}
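On the reading side, a ResponseReader can decide what to do based on whichever `oneof` field the server populated; with generated Python classes, `message.WhichOneof("res")` returns that field's name. In this sketch a one-key dict plays the role of the message. The handler table, `IteratorId`, and the treatment of `open_res` are hypothetical and only illustrate the dispatch idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IteratorId:
    """Handle for a server-side iterator, to be drained with iterate_req."""
    id: int


def populated_field(message: dict) -> str:
    """Dict analogue of WhichOneof: the name of the single populated field."""
    (name,) = message.keys()  # raises ValueError unless exactly one is set
    return name


HANDLERS = {
    # GetAttributes.Iter carries only the id of a server-side iterator.
    "getAttributes_iter": lambda body: IteratorId(body["id"]),
    # Assumed, for this sketch, to need no unpacking on the client.
    "open_res": lambda body: None,
}


def read_transaction_res(message: dict):
    """Dispatch a Transaction.Res-shaped dict to the handler for its field."""
    field = populated_field(message)
    return HANDLERS[field](message[field])


print(read_transaction_res({"getAttributes_iter": {"id": 1}}))
# IteratorId(id=1)
```

A table keyed by field name keeps each response type's unpacking logic in one place, mirroring how the `oneof` lists in Session.proto pair each request with its response.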

Iterating

How is this useful?

Well, the returned id identifies an iterator on the server, which we can repeatedly query to retrieve the actual Attribute instances. On the client side, this can be wrapped up as a local iterator. In Python, we then retrieve the next element by calling next(attribute_iterator).

...
with client.session(keyspace="test") as session:
    with session.transaction(grakn.TxType.READ) as tx:   
        attribute_iterator = tx.get_attributes_by_value(...)
        attr = next(attribute_iterator)

Each call to next(attribute_iterator) needs to create a new gRPC message with the following format:

{                            # type Transaction.Req
  iterate_req {              # type Iter.Req
    id : 1                   # or whatever the iterator ID is
  }
}

The server replies with:

{                              # type Transaction.Res
  iterate_res {                # type Iter.Res
    getAttributes_iter_res {   # type GetAttributes.Iter.Res
      attribute {              # type Concept (from Concept.proto)
        id : "VS..."
        baseType: 3
      }
    }
  }
}

Finally, we have received our first actual Concept, albeit as a gRPC message. We can unpack its id and baseType into a local object and present it to the user.

The next time next(attribute_iterator) is called, we repeat the process of making an iterate_req and unpacking the returned message into a local object.
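Wrapping this request loop in a native Python iterator is what lets users call `next(...)` or write a plain `for` loop. A minimal sketch, assuming a `send` callable (for example a Communicator's `send`) that maps one Transaction.Req to one Transaction.Res, with dict-shaped messages that follow the field names in the excerpts above; the `done` flag comes from Iter.Res. `ServerIterator` and `make_fake_send` are hypothetical names.

```python
from typing import Callable, Iterator


class ServerIterator:
    """Lazily pull results from a server-side iterator identified by its id."""

    def __init__(
        self,
        iterator_id: int,
        send: Callable[[dict], dict],
        unpack: Callable[[dict], object],
    ):
        self._id = iterator_id
        self._send = send      # one Transaction.Req in, one Transaction.Res out
        self._unpack = unpack  # turns one *_iter_res body into a local object
        self._exhausted = False

    def __iter__(self) -> Iterator:
        return self

    def __next__(self):
        if self._exhausted:
            raise StopIteration
        # One round trip per element: send iterate_req with the iterator id.
        res = self._send({"iterate_req": {"id": self._id}})["iterate_res"]
        if res.get("done"):
            self._exhausted = True
            raise StopIteration
        # Otherwise exactly one other oneof field is set, e.g. getAttributes_iter_res.
        (body,) = res.values()
        return self._unpack(body)


def make_fake_send(attribute_ids):
    """Hypothetical server stand-in: serves the given ids, then reports done."""
    remaining = list(attribute_ids)

    def send(request):
        if not remaining:
            return {"iterate_res": {"done": True}}
        concept = {"id": remaining.pop(0), "baseType": "attribute"}
        return {"iterate_res": {"getAttributes_iter_res": {"attribute": concept}}}

    return send


attribute_iterator = ServerIterator(
    1, make_fake_send(["V1", "V2"]), unpack=lambda body: body["attribute"]["id"]
)
print(next(attribute_iterator))  # V1
print(list(attribute_iterator))  # ['V2']
```

Because each element costs one round trip, nothing is fetched until the user asks for it, and remembering `done` locally avoids extra requests once the server has signalled the end.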

Tips

I thought I’d take a moment to write out some of the hurdles and solutions I came across when implementing the Python driver.

Circular Dependencies

Unless you write a monolithic driver, you are likely to split your code into several modules, some of which will depend on each other circularly. Intuitively, local Concept objects may access the server and create other Concepts. Thus, Concept depends on a networking component, which in turn depends on Concept: a stateful circular dependency.

For example, in the Python driver, Concept uses the TransactionService to access properties on Grakn. Requests come back and are converted by the ResponseReader, which takes gRPC messages from the server and returns, among other things, instances of Concepts.

All of our drivers have faced this issue and worked around it in different ways: Node uses dependency injection (instantiate the circular dependencies at an earlier point and then assign them into each other), Java lumps together much of the dependent functionality (and actually has a few circular imports), and Python allows circular imports as long as you follow certain import styles.
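To see why the import style matters in Python, the self-contained sketch below writes two hypothetical modules that import each other, then loads them. It assumes one particular style, plain `import module` with attribute lookup deferred until call time, which is a common way to make such a cycle work; it is not necessarily the exact style the driver uses, and the module names, `sup()` method, and `FakeTxService` are all illustrative.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Two hypothetical modules that import each other, mirroring the
# Concept -> TransactionService -> ResponseReader -> Concept cycle.
MODULES = {
    "demo_concept": """
        import demo_reader  # module import: binds the module object only

        class Concept:
            def __init__(self, concept_id, tx_service):
                self.id = concept_id
                self._tx = tx_service

            def sup(self):
                raw = self._tx.send({"method": "sup", "id": self.id})
                # demo_reader.to_concept is looked up now, at call time,
                # long after both modules have finished loading.
                return demo_reader.to_concept(raw, self._tx)
    """,
    "demo_reader": """
        import demo_concept  # the other half of the cycle

        def to_concept(raw, tx_service):
            return demo_concept.Concept(raw["id"], tx_service)
    """,
}

module_dir = Path(tempfile.mkdtemp())
for name, source in MODULES.items():
    (module_dir / f"{name}.py").write_text(textwrap.dedent(source))
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()

import demo_concept  # noqa: E402  (succeeds despite the cycle)


class FakeTxService:
    """Hypothetical transaction service that 'returns' a parent concept id."""

    def send(self, request):
        return {"id": request["id"] + "-sup"}


parent = demo_concept.Concept("V1", FakeTxService()).sup()
print(type(parent).__name__, parent.id)  # Concept V1-sup
```

If `demo_reader` instead used `from demo_concept import Concept`, the import would fail with an `ImportError` about a partially initialized module, because `Concept` is not yet defined when `demo_reader` first runs.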

Compiling and importing gRPC/protobufs

Each supported language has its own protobuf/gRPC compiler; for Python it is grpc_tools.protoc. You may run into problems importing the generated modules into your programs (this was a major pain point in Python), because the packages declared in the .proto files don't match the target folder structure. Our first solution was a Makefile that temporarily created the target folder structure, copied and updated the .proto files, ran protoc, and deleted the copies. The current solution instead replaces the incorrect package name with the correct one during our Bazel build and distribution process. Try to avoid symlinks or long-lived copies of the protocol definition files.
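As an illustration of the rename-after-compilation approach, the sketch below rewrites the imports that generated Python files use to refer to one another (lines such as `import Session_pb2 as Session__pb2`) into absolute imports under the package the files actually live in, while leaving `google.protobuf` imports alone. The regex, the function name, and the target package string are assumptions for illustration; the driver itself does its renaming in the Bazel build.

```python
import re

# Matches generated imports of sibling *_pb2 modules, e.g.
#   import Concept_pb2 as Concept__pb2
#   from protocol.session import Concept_pb2 as protocol_dot_session_dot_Concept__pb2
# but not well-known types imported from google.protobuf.
_PB2_IMPORT = re.compile(
    r"^(?:from (?!google\.protobuf)[\w.]+ )?import (\w+_pb2) as (\w+)$",
    re.MULTILINE,
)


def fix_pb2_imports(source: str, package: str) -> str:
    """Point generated *_pb2 imports at `package`, keeping the local alias."""
    return _PB2_IMPORT.sub(rf"from {package} import \1 as \2", source)


generated = "import grpc\nimport Session_pb2 as Session__pb2\n"
print(fix_pb2_imports(generated, "grakn.service.Session.autogenerated"))
# import grpc
# from grakn.service.Session.autogenerated import Session_pb2 as Session__pb2
```

Running a step like this over every generated file after protoc leaves the .proto sources untouched, in the spirit of avoiding symlinks or long-lived copies of the definitions.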

Tests

Tests are an important part of our drivers! Since drivers are the main entry point to using Grakn, we want to ensure as much correctness as we can. Luckily, authors of new drivers can more or less copy the tests from our Python or Node drivers and adapt them to their language's test framework and style.

Good Luck :)

We hope this post both illuminates the new Python driver, and acts as a guide for implementing your own language’s client for Grakn! If you have any questions at all, want to collaborate, or just say Hi, join our community Slack, or email me at joshua at grakn dot ai.