Entity Groups, Ancestors, and Indexes in Datastore: A Working Example

Authors: Theodore Siu, Kaitlin Ardiff

Quick Review: How Datastore stores your data

Before jumping into entity groups in Datastore, it helps to review the basics and establish a common vocabulary. Datastore holds entities, which are objects that can contain various key/value pairs called properties. Each entity is identified by a unique key. When creating an entity, a user can either specify a custom key or let Datastore complete the key. A custom key contains two fields: a kind, which represents a category such as ‘Animal’ or ‘Occupation Status’, and a name, which is the identifying value. If a user specifies only a kind when creating a key, Datastore automatically generates a numeric ID behind the scenes. Below is a Python 3 script that illustrates this identifier concept.

from google.cloud import datastore
client = datastore.Client()
# Custom key: specify kind="toy" and name="teddy_bear"
custom_key_ent = datastore.Entity(client.key("toy", "teddy_bear"))
client.put(custom_key_ent)
# Only specify kind="toy"; Datastore generates the ID when the entity is saved
datastore_gen_key_ent = datastore.Entity(client.key("toy"))
client.put(datastore_gen_key_ent)

In your GCP Console under Datastore, you will then see your two entities of kind “toy”. One will contain your custom key and one will contain the automatically generated key.

Specify your custom key or datastore will generate a unique id for you

Ancestors and Entity Groups

For highly related or hierarchical data, Datastore allows entities to be stored in a parent/child relationship, also called an ancestor/descendant relationship. A root entity together with all of its descendants forms an entity group.

This is an example of an entity group with entities of kinds Person, Pet, and Toy. The ‘Grandparent’ in this relationship is the Person. To configure this, first create the Person entity. Then create a Pet and specify a Person key as its parent. To create the ‘Grandchild’, create a Toy and set its parent to a Pet key. Additional attributes such as age, sex, and type are stored on each entity as properties. We model this diagram in Datastore in our working example below.

One can create entity groups by setting the ‘parent’ parameter while creating an entity key for a child. Doing so makes the parent key part of the child entity’s key. The child’s key can be thought of as a tuple (‘parent_key’, ‘child_key’): the parent’s key is the prefix, followed by the child’s own kind and identifier. For example, following the diagram above:

person_key = client.key("Person","Lucy")
pet_key = client.key("Pet","Cherie", parent=person_key)

Inspecting pet_key.flat_path returns ("Person", "Lucy", "Pet", "Cherie"). Printing pet_key itself shows the same path along with the project ID.

Datastore also supports chaining of parents, which can lead to very large keys for descendants with a long lineage of ancestors. Additionally, parents can have multiple children (representing a one-to-many relationship). However, there is no native support for entities to have multiple parents (representing a many-to-many relationship). Once this ancestral hierarchy is configured, it is easy to retrieve all descendants of a given parent by passing the parent key as the ‘ancestor’ parameter of a query. For example, given the pet_key created above, we can query for all of Cherie’s toys: my_query = client.query(kind="Toy", ancestor=pet_key) .
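To make the prefix idea concrete, here is a minimal pure-Python sketch that treats key paths as flat tuples and treats an ancestor query as a prefix match. It is only a mental model and does not use the Datastore client; the helpers child_path, has_ancestor, and query_paths are hypothetical names invented for this illustration. Like a real ancestor query, the model treats an entity as falling under its own key.

```python
# Toy model only: key paths as flat tuples, e.g. ("Person", "Lucy", "Pet", "Cherie").
# None of these helpers exist in the Datastore client library.

def child_path(parent_path, kind, name):
    """Append a (kind, name) pair to the parent's path, much as parent= does for keys."""
    return tuple(parent_path) + (kind, name)

def has_ancestor(path, ancestor_path):
    """True when ancestor_path is a prefix of path (or equal to it)."""
    return path[:len(ancestor_path)] == ancestor_path

def query_paths(paths, kind, ancestor_path):
    """Keep paths whose own kind matches and that sit under ancestor_path."""
    return [p for p in paths if p[-2] == kind and has_ancestor(p, ancestor_path)]

person = ("Person", "Lucy")
cherie = child_path(person, "Pet", "Cherie")
bubsy = child_path(person, "Pet", "Bubsy")
all_paths = [
    person,
    cherie,
    bubsy,
    child_path(cherie, "Toy", "tennis_ball"),
    child_path(cherie, "Toy", "rope"),
    child_path(bubsy, "Toy", "castle"),
]

cherie_toys = query_paths(all_paths, "Toy", cherie)
for path in cherie_toys:
    print(path)
```

Because every descendant carries its full lineage in its key, finding everything under Cherie only requires comparing the leading elements of each path, which is also why long ancestor chains produce long keys.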

A Full Working Example

Here is a full working Python3 example of a person/pets/toys hierarchical model which uses entity groups modeling the diagram shown above.

from google.cloud import datastore
client = datastore.Client()

# Entities of kinds Person, Pet, and Toy
my_entities = [
    {"kind": "Person", "Person_id": "Lucy", "sex": "f", "age": 18},
    {"kind": "Pet", "Pet_id": "Cherie", "Person_id": "Lucy",
     "sex": "f", "type": "dog", "age": 7},
    {"kind": "Pet", "Pet_id": "Bubsy", "Person_id": "Lucy",
     "sex": "m", "type": "fish", "age": 3},
    {"kind": "Toy", "Toy_id": "tennis_ball", "Pet_id": "Cherie",
     "Person_id": "Lucy", "price": 0.99},
    {"kind": "Toy", "Toy_id": "castle", "Pet_id": "Bubsy",
     "Person_id": "Lucy", "price": 49.99},
    {"kind": "Toy", "Toy_id": "rope", "Pet_id": "Cherie",
     "Person_id": "Lucy", "price": 10.99},
]
# Iterate through the entities and set each one's immediate parent
for entity in my_entities:
    kind = entity["kind"]
    parent_key = None
    if kind == "Pet":
        parent_key = client.key("Person", entity["Person_id"])
    elif kind == "Toy":
        parent_key = client.key("Person", entity["Person_id"],
                                "Pet", entity["Pet_id"])
    # Passing parent= makes the parent key a prefix of the child key
    key = client.key(kind, entity[kind + "_id"], parent=parent_key)
    datastore_ent = datastore.Entity(key)
    datastore_ent.update(entity)  # Store every field, including the IDs, as properties
    client.put(datastore_ent)
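To check which key paths this loop builds without writing anything to Datastore, the same parent logic can be reproduced as a small pure function. This is a sketch under the assumption that records follow the field layout of my_entities; flat_path_for is a hypothetical helper, not part of the client library.

```python
# Hypothetical helper mirroring the parent logic of the loop; makes no Datastore calls.
def flat_path_for(record):
    """Return the flat key path (kind, name, kind, name, ...) for one record."""
    kind = record["kind"]
    path = []
    if kind in ("Pet", "Toy"):
        # Pets and toys both live under a Person
        path += ["Person", record["Person_id"]]
    if kind == "Toy":
        # Toys additionally live under a Pet
        path += ["Pet", record["Pet_id"]]
    path += [kind, record[kind + "_id"]]
    return tuple(path)

sample_records = [
    {"kind": "Person", "Person_id": "Lucy"},
    {"kind": "Pet", "Pet_id": "Cherie", "Person_id": "Lucy"},
    {"kind": "Toy", "Toy_id": "rope", "Pet_id": "Cherie", "Person_id": "Lucy"},
]

for record in sample_records:
    print(flat_path_for(record))
```

The path printed for the rope should match what toy.key.flat_path reports for that entity after the loop has stored it.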

Suppose we want to retrieve a specific pet, for instance Cherie the dog. We could run the following query:

query1 = client.query(kind="Pet")
query1.add_filter("Pet_id", "=", "Cherie")

Using the Cherie entity, we can then easily grab the parent’s id.

pet = list(query1.fetch())[0]  # We know there is only one Cherie
print("Cherie's parent: " + str(pet.key.parent.id_or_name))

We can also grab the toys that are the direct children of Cherie by using the ancestor clause in the query.

query2 = client.query(kind="Toy", ancestor=pet.key)
for toy in list(query2.fetch()):
    print(toy.key)
    print(toy["Toy_id"])

With more complicated queries, Datastore requires specific indexes to be set in place. For example, running the same query with an additional filter will require an index and cause a failure if one is not put in place.

# Adding a filter on price makes this fail until a matching index exists!
query2 = client.query(kind="Toy", ancestor=pet.key)
query2.add_filter("price", ">", 0.5)
for toy in list(query2.fetch()):
    print(toy.key)
    print(toy["Toy_id"])
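For reference, the result this query is meant to return can be worked out locally from the sample data. The snippet below is only a simulation in plain Python and makes no Datastore calls; simulate_toy_query is a hypothetical helper, and sorting by price is a choice made here for readability.

```python
# Local simulation only: mimics "Toy entities under Cherie with price > 0.5".
toys = [
    {"Toy_id": "tennis_ball", "Pet_id": "Cherie", "Person_id": "Lucy", "price": 0.99},
    {"Toy_id": "castle", "Pet_id": "Bubsy", "Person_id": "Lucy", "price": 49.99},
    {"Toy_id": "rope", "Pet_id": "Cherie", "Person_id": "Lucy", "price": 10.99},
]

def simulate_toy_query(records, person_id, pet_id, min_price):
    """Hypothetical helper: toys under the given pet priced strictly above min_price."""
    matches = [
        r for r in records
        if r["Person_id"] == person_id
        and r["Pet_id"] == pet_id
        and r["price"] > min_price
    ]
    return sorted(matches, key=lambda r: r["price"])

expected = simulate_toy_query(toys, "Lucy", "Cherie", 0.5)
print([toy["Toy_id"] for toy in expected])  # ['tennis_ball', 'rope']
```

Bubsy's castle is excluded because it sits under a different pet, even though its price passes the filter.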

To fix the missing-index failure, create an index in Datastore using an index.yaml file with the following contents:

indexes:
- kind: Toy
  ancestor: yes
  properties:
  - name: price

Upload the yaml file using the gcloud command:

gcloud datastore indexes create path/to/index.yaml

Your index should show up in your Datastore console! Wait for the indexing process to finish, then re-run the query to see that it now works.

So there you have it: a working example of entity groups, ancestors, indexing and pets(!!) within Datastore. Happy Coding!