‘References Many’ vs. ‘Embeds Many’ in MongoDB

Recently, I began working on a side project that involves tree-like structure. The data structure looks something like this:

{ Tree 1 :
  Content:
    Child_Trees : [{
      Tree 2 : { … }
    }]
}

Here are some challenges:

1) Given a single tree, how can you quickly retrieve all of the child trees and their content?

2) Given a single tree, how can you quickly retrieve all of its parents and their content?

An intuitive way to represent this data is as a Ruby hash. Not surprisingly then, it turns out that a very efficient way to store the data is using a NoSQL database such as MongoDB. MongoDB stores data as BSON documents (similar to JSON), instead of tables and rows, BSON (or JSON) is a more natural form of representing certain forms of data, often more human readable and more readily presentable to the client (browser). Here’s a great explanation from MongoDB: http://blog.mongodb.org/post/72874267152/transitioning-from-relational-databases-to-mongodb

For Ruby, there exists a wrapper called Mongoid that allows you interact with MongoDB in a way similar to Active Record. The mongoid documentation is fantastic, and I recommend you check it out.

The philosophy of Mongoid is to provide a familiar API to Ruby developers who have been using Active Record or Data Mapper

Let’s return to our tree structure:

{ Tree 1 :
  Content:
    Child_Trees : [{
      Tree 2 : { … }
    }]
}

The first thing you’ll notice is that the relationship between Tree and child_trees is a has_many relationship, to borrow ActiveRecord terminology.

Two types of Has Many relationships in MongoDB

References Many (1-N)

In the references many association, each object is stored as a separate JSON document, and the objects contain pointers to each other. This is not drastically different form our concept of a relational has_many association, where one table would contain foreign key ID’s that map to another table.

Embeds Many (1-N)

The embeds many association is more exotic. In embeds many, a single Mongoid document encapsulates other objects within itself.

Though each object in an embeds many relationship is a separate object, the child objects all live within a single parent BSON document, with each child represented as a hash.

Looking again at our Tree object:

{ Tree 1 :
  Content:
    Child_Trees : [{
      Tree 2 : { … }
    }]
}

You may notice that this this data structure is repeating, or recursive, that child_tree is not a distinct class but rather Trees themselves. How can you tell Mongoid that what you want is a self-referential has_many relationship?

You may be tempted to write something like this, borrowing from Active Record syntax:

has_many child_trees, class_name: Tree, inverse_of: parent_tree
belongs_to parent_tree, class_name :Tree, inverse_of: child_trees

There exists a gem, mongoid-tree that actually abstracts away much of the challenges of developing a self-referential association and the necessary methods.

The standard references many implementation used by mongoid-tree is very much like a singly linked list. Each tree is a separate document, and it simply contains a pointer to its parent tree. There are no pointers to child trees, just pointers from a child tree to its parent tree.

Based on this design, its very easy to traverse from a given tree all the way up to its “root” ancestral tree (just call the “.root” method on a mongoid-tree document).

However, retrieving all of the children of a given tree is an entirely different story. Luckily mongoid-tree provides a convenient method to accomplish this for us!

descendants_and_self

However, this method just returns an array of Tree objects. It doesn’t give any information back about the relationships between these descendants. What we really want to see is a fully mapped out tree, where we know the exact parent-child relationship between every tree returned in descendants_and self. This is necessary, for example, if we wanted to present a visualization of the entire tree structure.

One way to accomplish this would be to retrieve all of the children of a tree is to first check to find all of the trees with the parent ID equal to the tree, and then perform this recursively, until all of the descendants are mapped to their original parent tree. However, thats a slow operation, that would result in N database queries for a tree of depth N.

…Our final solution

Recursively Embeds Many

Mongoid offers a helpful alias “recursively_embeds_many” for developing a relationship where you have a single class that embeds many instances of itself recursively in the same document. This looks like a lot like our tree sketch from earlier!

recursively_embeds_many
#equivalent to the following two lines of code
#embeds_many :child_trees, :class_name => “Tree”, :cyclic => true
#embedded_in :parent_tree, :class_name => “Tree”, :cyclic => true

One downside of using the embeds_many is that, since your objects are stored within a single BSON document, you cannot use the inbuilt “.find” mongoid method to find a document by ID. “.find” will only return the root objects, and not any of the children, since only the root is a BSON document.

Therefore, when using embeds_many, you must build (or borrow) your own search function that will scan a document and “find_by_id” (or some other attribute).

The advantage, however, over the references many approach (as used by mongoid-tree) is that when you retrieve a Tree, you retrieve it along with all of its children in ordered form, using a single query. This makes representing the full data tree very natural. You can convert the BSON object to JSON or a hash and easily represent it.

Since the bottleneck in our application was going to be retrieving all of the children of a tree, we made the tradeoff that it was OK to build our own “find_by_id” method, as long as it was quick to retrieve the tree with all of its children. However, for your own application needs, you might find using mongoid-tree (or another references many implementation) to be more apt.

I hope this has been helpful as you take your first sips of NoSQL!

-Sush

Like this:

Like Loading…

One clap, two clap, three clap, forty?

By clapping more or less, you can signal to us which stories really stand out.