Parent and Child joins with ElasticSearch 7

Sohan Ganapathy
Jun 20, 2019 · 6 min read

In a relational database, a child table references its parent table through a foreign key, and the two are combined at query time with a join. The design typically involves normalizing the data.

ElasticSearch is not a relational database; it is built for search efficiency rather than storage efficiency. The data it stores is denormalized and mostly flat. Because ElasticSearch is all about speed and traditional joins would run too slowly, joins cannot span indexes: both the child and parent documents must be in the same index and on the same shard.

Example Parent — Child relationship

Let’s consider the family tree depicted in Image 1. The tree has 3 Parents and 9 Children. Each character has a “gender” and an “isAlive” status.

Image 1: Sample family tree with Parent and Child relationships.

With the above example we explore the below scenarios:

  • Parent Child relationships
  • Having Multiple Children per Parent
  • Multiple Levels of Parent Child relationships

Creating the “family_tree” Index

The code below creates an index for the above relationship. (Setup guide for Elastic Search). Starting with ElasticSearch 7, an index no longer requires a “type”, unlike in previous versions.

createIndex.sh — Create the family_tree Index

Line 23: relation_type is the name of the join field.

Line 24: The join type is a special field type that creates parent/child relations between documents of the same index.

Line 25: Parent-child joins use global ordinals to speed up joins.

Line 26–28: The relations section defines a set of possible relations within the documents, each relation being a parent name and a child name.
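As a rough illustration of the mapping those line notes describe, here is a minimal Python sketch that builds the request body for creating the index. It only assembles and prints the JSON. The `eager_global_ordinals` setting is an assumption based on the global ordinals remark, and any other fields the full script maps are left out.

```python
import json

# Join field mapping for the family_tree index.
# "relation_type" is the join field name used throughout this article.
create_index_body = {
    "mappings": {
        "properties": {
            "relation_type": {
                "type": "join",                    # special parent/child field type
                "eager_global_ordinals": True,     # assumed setting, see lead-in
                "relations": {"parent": "child"},  # parent name -> child name
            }
        }
    }
}

# Body for a request such as: PUT family_tree
print(json.dumps(create_index_body, indent=2))
```

The body can then be sent with curl or any ElasticSearch client of your choice.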

Inserting the Parent data

Let’s walk through the code for one parent insert before running a script to insert the other parents depicted in Image 1.

Create Darren Ford

The above code creates a new document for Darren Ford and marks it as a parent document using the relation_type field. The value “parent” is assigned to the name of the relation. Along with the relation, it also adds the other fields needed, like “firstName”, “lastName”, “gender” and “isAlive”.
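A parent insert along these lines could be sketched in Python as follows. The helper `parent_request` is hypothetical, the field names “name” and “house” follow the sample search response shown later in this article, and the “isAlive” value is only a placeholder.

```python
import json

def parent_request(doc_id, routing, source):
    """Return (path, body) for indexing a parent document (hypothetical helper)."""
    body = dict(source)
    # Mark the document as a parent through the join field.
    body["relation_type"] = {"name": "parent"}
    path = f"family_tree/_doc/{doc_id}?routing={routing}"
    return path, body

path, body = parent_request(
    "1",
    "Darren",
    # "isAlive" is a placeholder value, not taken from the family tree.
    {"name": "Darren", "house": "Ford", "gender": "Male", "isAlive": True},
)
print("PUT", path)
print(json.dumps(body, indent=2))
```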

One key thing to notice here is the routing query parameter. Each parent assigns its own name to the parameter. The routing field helps us control which shard the document is going to be indexed on. The shard is identified using the below equation:

shard = hash(routing_value) % number_of_primary_shards
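To make the equation concrete, here is a small Python sketch. It uses `zlib.crc32` purely as a stand-in hash and assumes 5 primary shards; ElasticSearch uses its own hash function internally, so the actual shard numbers will differ. The point is only that documents sharing a routing value always land on the same shard.

```python
import zlib

def shard_for(routing_value: str, number_of_primary_shards: int) -> int:
    # zlib.crc32 is a stand-in for illustration only;
    # it is not the hash function ElasticSearch uses.
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards

# Documents paired with the routing value they are indexed with.
docs = [("Darren Ford", "Darren"), ("Pearl Ford", "Darren"), ("Sienna Evans", "Sienna")]
for doc, routing in docs:
    print(doc, "-> shard", shard_for(routing, 5))
```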

We can insert the remaining parents using the script here.

Inserting the Children data

Similarly, let’s walk through one child insert before running a bulk insert of the 9 Children depicted in Image 1.

Create Pearl Ford

In our example, ‘Pearl Ford’ is a child of ‘Darren Ford’. Notice that we use the same routing query parameter that we used to create the record for Darren. This is because of the restriction that both the child and parent documents must be on the same shard.

The join between this record and Darren’s is made by the relation_type field, where we set the name of the relation to “child”, making Pearl Ford a child of the parent whose Id is “1” (the same Id we created the parent Darren with).
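The child insert described above might look like the following Python sketch. The helper `child_request` is hypothetical, the field names follow the sample search response shown later, and Pearl’s “gender” value is an assumption made for illustration.

```python
import json

def child_request(doc_id, routing, parent_id, source):
    """Return (path, body) for indexing a child document (hypothetical helper)."""
    body = dict(source)
    # The join field names the relation and points at the parent's Id.
    body["relation_type"] = {"name": "child", "parent": parent_id}
    # Same routing value as the parent, so both land on the same shard.
    path = f"family_tree/_doc/{doc_id}?routing={routing}"
    return path, body

path, body = child_request(
    "5",
    "Darren",
    "1",
    # "gender" is an assumed value for illustration.
    {"name": "Pearl", "house": "Ford", "gender": "Female", "isAlive": True},
)
print("PUT", path)
print(json.dumps(body, indent=2))
```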

We can insert the remaining children using the script here.

Querying our data

Now for the fun part: executing and understanding the queries we can run on the relationship we just created.

Searching and Filtering specific parents

  • Get all children of Sienna Evans: The parent_id query can be used to find child documents which belong to a particular parent.
Get all children of Sienna Evans

Executing the above query gets the “Ralph Evans” document.

{
  "took" : 2,
  ...
  "hits" : [
    {
      "_index" : "family_tree",
      "_type" : "_doc",
      "_id" : "9",
      "_routing" : "Sienna",
      "_source" : {
        "name" : "Ralph",
        "house" : "Evans",
        "gender" : "Male",
        "isAlive" : true,
        "relation_type" : {
          "name" : "child",
          "parent" : "2"
        }
      }
    }
  ]
  ...
}
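The parent_id query for Sienna Evans’ children can be sketched as a request body built in Python. This is a minimal version that assumes Sienna’s document Id is “2”, as the “parent” value in the response above suggests.

```python
import json

# parent_id query: find "child" documents whose parent has Id "2".
children_of_sienna = {
    "query": {
        "parent_id": {
            "type": "child",  # name of the child relation
            "id": "2",        # Id of the parent document
        }
    }
}

# Body for a request such as: GET family_tree/_search
print(json.dumps(children_of_sienna, indent=2))
```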
  • Get All children of Darren Ford who are alive: The bool and must query keywords can be used to fetch the records.
Get All children of Darren Ford who are alive

Executing the above query will get the records for “Pearl”, “Ava”, “Tyler” and “Xavier” Ford.
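One way to combine the two conditions is sketched below in Python. It assumes “isAlive” is mapped as a boolean (for example through dynamic mapping), so a term filter on it works.

```python
import json

# bool/must: both clauses have to match.
alive_children_of_darren = {
    "query": {
        "bool": {
            "must": [
                # Children of the parent with Id "1" (Darren Ford).
                {"parent_id": {"type": "child", "id": "1"}},
                # Only those that are alive (assumes a boolean field).
                {"term": {"isAlive": True}},
            ]
        }
    }
}

# Body for a request such as: GET family_tree/_search
print(json.dumps(alive_children_of_darren, indent=2))
```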

Has Child and Has Parent queries

The query keywords has_child and has_parent help query data with parent child relationships.

  • Get All parents who have daughters who are dead: The has_child keyword helps us fetch all the parent records whose children match the given filters.
Get All parents who have daughters who are dead

Executing the above query gets the record of “Ryan Turner”, who is the only parent with a dead daughter, “Scarlet Turner”.
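A has_child query of this shape can be sketched in Python as follows. It assumes “gender” is a dynamically mapped text field, so a match query on it is case-insensitive, and that “isAlive” is a boolean.

```python
import json

# has_child: return parents that have at least one matching child.
parents_with_dead_daughters = {
    "query": {
        "has_child": {
            "type": "child",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"gender": "Female"}},
                        {"term": {"isAlive": False}},
                    ]
                }
            },
        }
    }
}

# Body for a request such as: GET family_tree/_search
print(json.dumps(parents_with_dead_daughters, indent=2))
```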

  • Get All Children whose Parent has gender as “Female”: The has_parent keyword helps us fetch all the child records whose parents match the given filters.
Get All Children whose Parent has gender as “Female”

Executing the above query gets the record of “Ralph Evans”, whose parent is “Sienna Evans”, all other parents being Male.
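The has_parent counterpart can be sketched the same way. As before, this assumes “gender” is a dynamically mapped text field, so the match query does not confuse “Female” with “Male”.

```python
import json

# has_parent: return children whose parent matches the inner query.
children_of_female_parents = {
    "query": {
        "has_parent": {
            "parent_type": "parent",  # name of the parent relation
            "query": {"match": {"gender": "Female"}},
        }
    }
}

# Body for a request such as: GET family_tree/_search
print(json.dumps(children_of_female_parents, indent=2))
```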

Having Multiple Children per Parent

Let us add “Melissa Ford” as a wife to “Darren Ford”, which is depicted in the below Image 2. “Darren” now has Children and Wife documents attached.

Image 2: Sample family tree with Parent, Wife and Child relationships.

The Index can be changed using the code below:

Modify index adding a new child to Parent — Wife.

Line 9: We now have an array of relationships associated with the Parent which are “child” and “wife”.
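A mapping update of this kind can be sketched in Python as below. It is a minimal version that only redefines the join field, with “wife” added as a second child relation of “parent”.

```python
import json

# Updated join field: "parent" now has two child relations.
update_mapping_body = {
    "properties": {
        "relation_type": {
            "type": "join",
            "relations": {"parent": ["child", "wife"]},
        }
    }
}

# Body for a request such as: PUT family_tree/_mapping
print(json.dumps(update_mapping_body, indent=2))
```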

Inserting a “Melissa Ford” document is similar to inserting the child record we created earlier. It uses the same routing parameter as the parent (routing=Darren) and “wife” as the relation_type name.

Creating Melissa Ford Record

Query the wife data:

  • Get the Parents who have a wife: The query uses the has_child keyword and filters by the type “wife”.
Get the Parents who have a wife

Executing the above query gets the record of “Darren Ford”.
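A query along these lines can be sketched in Python. Using match_all as the inner query is an assumption; any “wife” document is enough for its parent to match.

```python
import json

# has_child on the "wife" relation: parents with at least one wife document.
parents_with_wife = {
    "query": {
        "has_child": {
            "type": "wife",
            "query": {"match_all": {}},  # assumed inner query: any wife matches
        }
    }
}

# Body for a request such as: GET family_tree/_search
print(json.dumps(parents_with_wife, indent=2))
```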

Multiple Levels of relationship (Grand Children)

Let us add GrandChildren to the family tree as depicted in the below Image 3.

Image 3: Sample family tree with Parent, Wife, Child and GrandChild relationships.

The index needs to be recreated here! This is because of another restriction: it is possible to add a child to an existing element only if that element is already a parent. Since the “child” type was not a parent when we created the index earlier, we need to drop that index, create a new one with the code below, and re-insert all the data.

Recreate Index

Line 16: The “child” is also made a parent here, of the type “grandchild”. This lets us have the relationship PARENT → CHILD → GRANDCHILD.

Inserting Grand Children documents is very similar to inserting child records.
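The multi-level relations can be sketched in Python as the body for recreating the index. This is a minimal version that shows only the join field; the earlier index has to be deleted first.

```python
import json

# Join field with two levels: parent -> (child, wife) and child -> grandchild.
recreate_index_body = {
    "mappings": {
        "properties": {
            "relation_type": {
                "type": "join",
                "relations": {
                    "parent": ["child", "wife"],
                    "child": "grandchild",  # "child" is now also a parent
                },
            }
        }
    }
}

# Body for a request such as: PUT family_tree (after deleting the old index)
print(json.dumps(recreate_index_body, indent=2))
```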

Insert Grand Child

In our example, “Douglas Ford” is a child of “Pearl Ford” and a grandchild of “Darren Ford”. Notice that we use the same routing query parameter that we used to create the record for Darren. This is to ensure that all the descendants of the super parent “Darren” are indexed on the same shard.

The join between this record and ‘Pearl Ford’ is made by the relation_type field, where we add the name of the relation as a “grandchild” making “Douglas Ford” a grandchild of the parent whose Id is “5” (The same Id we created Pearl Ford with).
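The grandchild insert described above can be sketched in Python as follows. The document Id “13” is a hypothetical value, and the “gender” and “isAlive” fields are left out because the text does not give them for Douglas.

```python
import json

# Grandchild document: relation "grandchild", parent is Pearl Ford (Id "5").
grandchild_body = {
    "name": "Douglas",
    "house": "Ford",
    "relation_type": {"name": "grandchild", "parent": "5"},
}

# Routing stays on the super parent "Darren", not on the direct parent.
doc_id = "13"  # hypothetical Id for illustration
path = f"family_tree/_doc/{doc_id}?routing=Darren"

print("PUT", path)
print(json.dumps(grandchild_body, indent=2))
```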

We can insert the remaining grand children using the bulk script here.

Querying GrandParent Data

  • Get All Grandparents who have grand-daughters:

Executing this query gets us the “Ryan Turner” record, since he is the only grandparent with a granddaughter, “Eleanor Turner”, as depicted in Image 3.
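A grandparent lookup like this can be expressed by nesting one has_child query inside another. The Python sketch below assumes the relation names “child” and “grandchild” from the recreated index and a text-mapped “gender” field.

```python
import json

# Outer has_child: parents with a matching child.
# Inner has_child: that child must itself have a female grandchild document.
grandparents_with_granddaughters = {
    "query": {
        "has_child": {
            "type": "child",
            "query": {
                "has_child": {
                    "type": "grandchild",
                    "query": {"match": {"gender": "Female"}},
                }
            },
        }
    }
}

# Body for a request such as: GET family_tree/_search
print(json.dumps(grandparents_with_granddaughters, indent=2))
```

Each nested has_child adds another join at query time, which is the overhead the quote below warns about.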

Using multiple levels of relations to replicate a relational model is not recommended. Each level of relation adds an overhead at query time in terms of memory and computation. You should de-normalize your data if you care about performance. — elastic.co

Restrictions of joins in ElasticSearch

Now that we have seen the join feature in action, let’s go over the restrictions noticed above.

  • Parent and child documents must be indexed on the same shard.
  • Only one join field mapping is allowed per index.
  • An element can have multiple children but only one parent.
  • It is possible to add a new relation to an existing join field.
  • It is also possible to add a child to an existing element but only if the element is already a parent.

Conclusion

Parent-child joins can be a useful technique for managing relationships when index-time performance is more important than search-time performance, but they come at a significant cost. One must be aware of the tradeoffs, such as the constraint that parent and child documents must be stored on the same shard, and the added complexity. Another precaution is to avoid multi-level parent-child relationships, since each level consumes more memory and computation at query time.

The Startup
