An introduction to AWS Cloud Directory
I’ve rewritten an application that used DynamoDB as its primary data store and replaced it with AWS Cloud Directory. I truly enjoyed the experience and wanted to share my journey to bring more attention to this very under-appreciated service.
So what exactly is AWS Cloud Directory? It’s a serverless, strongly typed, hierarchical data store. It scales to millions of objects and relationships with ease while maintaining referential integrity, something that is hard to achieve with other serverless stores such as DynamoDB. Unlike with DynamoDB, you don’t have to worry about capacity planning: you are charged per request and that’s it.
“Cloud Directory is a building block that already powers other AWS services including Amazon Cognito, AWS Organizations, and Amazon QuickSight.”
— Now Available: Amazon Cloud Directory
This statement gave me sufficient confidence that the service is not going anywhere anytime soon. However, finding resources and examples was almost impossible, which suggests that it hasn’t been widely adopted. Cloud Directory didn’t even get a breakout session at re:Invent 2017. That’s what motivated me to write this short introduction.
Before you can create objects in Cloud Directory, you need to define a schema. The schema can evolve over time, i.e. backwards-compatible changes, such as additional attributes, can be introduced. A set of attributes is called a facet. Each object can be attached to multiple facets (think of a user that can inherit attributes from an employee facet and a manager facet).
Objects can be of type NODE or LEAF_NODE. The big difference is that a LEAF_NODE can have multiple parent NODEs while a NODE can only have one parent. Also, a LEAF_NODE cannot have any child objects. Cloud Directory provides APIs to select individual objects based on a path query or a unique object identifier. Several of these APIs will look familiar if you have worked with hierarchical data stores such as LDAP before.
Additionally, Cloud Directory has special object types that act as indexes and typed links. Typed links are very powerful in that they allow you to connect any type of object across hierarchies, with attributes on the connection (think edge attributes). Complex relationships such as “device belongs to user” or “user is administrator of group” can be modeled that way.
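To make the parent and child rules for NODE and LEAF_NODE objects more concrete, here is a minimal in-memory sketch. It is a toy model and not the Cloud Directory API: the `DirectoryObject` class, the `attach` function and the facet names are all made up for illustration, and the rule about unique link names under a parent is an assumption of this model.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class DirectoryObject:
    """Toy stand-in for a directory object (hypothetical, not the real API)."""
    name: str
    object_type: str  # "NODE" or "LEAF_NODE"
    facets: set = field(default_factory=set)
    parents: list = field(default_factory=list, repr=False)
    children: dict = field(default_factory=dict, repr=False)


def attach(parent: DirectoryObject, child: DirectoryObject, link_name: str) -> None:
    """Attach child under parent, enforcing the rules described in the text."""
    if parent.object_type != "NODE":
        raise ValueError("only NODE objects can have children")
    if child.object_type == "NODE" and child.parents:
        raise ValueError("a NODE can only have one parent")
    if link_name in parent.children:
        # Assumption of this toy model: link names are unique per parent.
        raise ValueError(f"link name {link_name!r} already used under {parent.name}")
    parent.children[link_name] = child
    child.parents.append(parent)


root = DirectoryObject("root", "NODE")
engineering = DirectoryObject("engineering", "NODE")
managers = DirectoryObject("managers", "NODE")
# One object carrying attributes from two (hypothetical) facets.
alice = DirectoryObject("alice", "LEAF_NODE", facets={"Employee", "Manager"})

attach(root, engineering, "engineering")
attach(root, managers, "managers")
attach(engineering, alice, "alice")
attach(managers, alice, "alice")  # allowed: a LEAF_NODE may have several parents
```

Trying to attach `engineering` under `managers` as well, or to attach anything under `alice`, raises a `ValueError` in this model, which mirrors the two restrictions described above.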
Pricing for Cloud Directory is much more straightforward than for DynamoDB. Cloud Directory comes at no cost if it’s idle (i.e. no requests), which is an advantage over DynamoDB, especially for smaller projects. You don’t need to provision any capacity; pricing is purely based on the number of requests. One million eventually consistent reads will cost you USD 0.49 and one million writes USD 5.30. The nice thing is that it doesn’t matter whether the requests happen within an hour or are spread out over a month: the cost is the same, with no capacity planning necessary. Additionally, you can have as many tables (or facets in Cloud Directory lingo) as you like without having to worry about scaling and paying for them individually.
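As a quick back-of-the-envelope aid, the request pricing quoted above can be turned into a small calculator. This is only a sketch: the function name is made up, the two prices are the ones mentioned in this article and may have changed since, and nothing other than those request charges is taken into account.

```python
from decimal import Decimal

# Prices as quoted in the text (USD per one million requests).
# Check the current price list before relying on them.
READ_PRICE_PER_MILLION = Decimal("0.49")   # eventually consistent reads
WRITE_PRICE_PER_MILLION = Decimal("5.30")  # writes
MILLION = Decimal(1_000_000)


def estimate_request_cost(reads: int, writes: int) -> Decimal:
    """Return the request cost in USD, rounded to cents."""
    cost = (
        Decimal(reads) / MILLION * READ_PRICE_PER_MILLION
        + Decimal(writes) / MILLION * WRITE_PRICE_PER_MILLION
    )
    return cost.quantize(Decimal("0.01"))


# Three million reads and 200,000 writes, regardless of how they are
# spread over the billing period.
print(estimate_request_cost(3_000_000, 200_000))  # prints 2.53
```

Because only the request count matters, the result is the same whether the traffic arrives in a single burst or trickles in over the month.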
While Cloud Directory might seem more expensive at first glance, the overall cost of ownership might be lower, considering that you won’t have to deal with capacity planning and auto-scaling.
Before jumping into Cloud Directory, make sure you have read and understood its service limits. If you are coming from DynamoDB, the thing that will bite you first is the 2 KB limit on attribute values. Another limit to keep in mind is the number of objects that can be returned per request, which is 30. Many of these limits are soft limits and can be increased with a service request.
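The per-request limit means that listing operations have to be paged. Below is a minimal sketch of such a loop, assuming a boto3 `clouddirectory` client whose `list_object_children` call accepts `MaxResults` and `NextToken` and returns a `Children` mapping of link name to object identifier. The function name `list_all_children` and the default page size of 30 (taken from the limit mentioned above) are my own choices.

```python
from typing import Any, Dict


def list_all_children(
    client: Any, directory_arn: str, path: str, page_size: int = 30
) -> Dict[str, str]:
    """Collect all children of the object at `path`, one page at a time."""
    children: Dict[str, str] = {}
    next_token = None
    while True:
        kwargs = {
            "DirectoryArn": directory_arn,
            "ObjectReference": {"Selector": path},
            "MaxResults": page_size,
        }
        if next_token:
            kwargs["NextToken"] = next_token
        page = client.list_object_children(**kwargs)
        # Each page maps link names to object identifiers.
        children.update(page.get("Children", {}))
        next_token = page.get("NextToken")
        if not next_token:
            return children
```

With a real client this would be called as `list_all_children(boto3.client("clouddirectory"), directory_arn, "/some/path")`. Every page is a separate billed read, so a node with 65 children costs three requests at a page size of 30.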
DynamoDB recently released backup and restore for tables. The backups are fully managed by AWS. This has the disadvantage that you can’t just download a backup and move it yourself, but overall it is a very useful feature.
Cloud Directory has no user-configurable backup solution whatsoever. Since AWS has included Cloud Directory in its HIPAA Eligible Services, they almost certainly have a backup and recovery strategy for Cloud Directory; it’s simply not exposed to the user. This means that you will have to roll your own if you plan on making point-in-time snapshots or want to be able to migrate directories between accounts and regions. Unlike DynamoDB, Cloud Directory doesn’t offer streams that you could hook into to build a backup solution. Therefore, all you can do is traverse the directory yourself, which adds cost because every read request is billed.
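A home-grown snapshot therefore starts with a walk over the hierarchy. The following is a minimal sketch of such a breadth-first traversal, not a complete backup tool. The `list_children` callable is hypothetical: it is assumed to return a mapping of link name to object identifier for a given path and an empty mapping for objects without children. A real backup would also have to fetch attributes, typed links and indexes.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str, str]  # (parent_path, link_name, object_id)


def snapshot_hierarchy(
    list_children: Callable[[str], Dict[str, str]], root_path: str = "/"
) -> List[Edge]:
    """Walk the hierarchy breadth-first and record every parent-child link."""
    edges: List[Edge] = []
    queue = deque([root_path])
    expanded_ids = set()
    while queue:
        parent_path = queue.popleft()
        for link_name, object_id in sorted(list_children(parent_path).items()):
            edges.append((parent_path, link_name, object_id))
            if object_id in expanded_ids:
                # Reached again through another parent: keep the link,
                # but do not spend another request on the same object.
                continue
            expanded_ids.add(object_id)
            queue.append(parent_path.rstrip("/") + "/" + link_name)
    return edges


# In-memory stand-in for a directory; "alice" has two parents.
tree = {
    "/": {"engineering": "id-eng", "managers": "id-mgr"},
    "/engineering": {"alice": "id-alice"},
    "/managers": {"alice": "id-alice"},
}
calls: List[str] = []


def fake_list_children(path: str) -> Dict[str, str]:
    calls.append(path)  # each call stands for at least one billed read
    return tree.get(path, {})


edges = snapshot_hierarchy(fake_list_children)
```

In this example the walk records four links and issues four listing calls. The length of `calls` is a rough lower bound for the number of billed reads such a snapshot would cost on a real directory.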
DynamoDB offers a local DynamoDB server that acts almost exactly like the real deal. This is pretty cool and allows developers to work in a sandbox and run automated tests locally. There is no such thing for Cloud Directory. However, since idle directory instances are essentially free, nothing stops you from having a directory per developer or even creating directories on demand for automated testing. Unfortunately, Cloud Directory doesn’t offer a web interface that lets you browse through your directory. For the time being, you will have to rely on your coding skills or the AWS CLI to make ad-hoc queries or changes to your directory.
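Creating directories on demand for tests could look roughly like the sketch below. It assumes a boto3 `clouddirectory` client with the `create_directory`, `disable_directory` and `delete_directory` calls and an already published schema ARN. The helper name `temporary_directory` and the naming scheme are made up for illustration.

```python
import uuid
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def temporary_directory(
    client: Any, schema_arn: str, prefix: str = "test"
) -> Iterator[str]:
    """Create a throwaway directory and remove it again when the block exits."""
    name = f"{prefix}-{uuid.uuid4().hex[:12]}"  # random suffix avoids name clashes
    response = client.create_directory(Name=name, SchemaArn=schema_arn)
    directory_arn = response["DirectoryArn"]
    try:
        yield directory_arn
    finally:
        # A directory has to be disabled before it can be deleted.
        client.disable_directory(DirectoryArn=directory_arn)
        client.delete_directory(DirectoryArn=directory_arn)
```

In a test this would be used as `with temporary_directory(boto3.client("clouddirectory"), schema_arn) as directory_arn:`, with the test body running against `directory_arn`. The `finally` block makes sure the directory is cleaned up even when a test fails.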