The Flaws in Polyglot Persistence

When your application adopts Polyglot Persistence, it is no longer bound to a single data store. It can easily store tree-like structures with many nested entities in a document store and push write-heavy data into highly scalable systems such as Cassandra. This usually brings a better developer experience as well as better performance. But there are always two sides to the coin. This article explains the two major flaws that led me to create the Data Reconstruction Utility (Dru) tool.

One of the problems every project faces is preparing test data. It is a challenge even for a simple project, but when the data spans multiple data stores it gets even more complicated. The two major issues I found can be described as The Identity Flaw and The Self-Containment Flaw.

The Identity Flaw

Loosely related data stored in a relational database and a document store

Let's take a look at a simple example. An application stores orders in a relational database and products in a document store. Order items are loosely related to products just by their id. The product id is generated automatically by the document store, so we can't simply create a SQL dump. To prepare complete test data for a single order with one order item pointing to one product, we conceptually need to do the following (a low-level sketch follows the list):

  1. save a product into the document store
  2. save an order into the relational database
  3. save an order item with the new product id and the new order id into the relational database
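
A minimal sketch of these three steps at the low level might look like the following, assuming MongoDB as the document store and an H2 database reachable over JDBC; the database name, the tables and their columns are purely illustrative and assumed to exist:

import com.mongodb.client.MongoClients
import groovy.sql.Sql
import org.bson.Document

// 1. save a product into the document store; the store generates the id
def mongo = MongoClients.create('mongodb://localhost')
def products = mongo.getDatabase('shop').getCollection('products')
def product = new Document('name', 'Java 9 Modules in a Year of Lunches')
products.insertOne(product)
String productId = product.getObjectId('_id').toHexString()

// 2. save an order into the relational database (id is auto-generated)
def sql = Sql.newInstance('jdbc:h2:mem:test', 'sa', '', 'org.h2.Driver')
def orderKeys = sql.executeInsert('INSERT INTO orders (created) VALUES (CURRENT_TIMESTAMP)')
def orderId = orderKeys[0][0]

// 3. save an order item pointing to the freshly generated product and order ids
sql.executeInsert(
    'INSERT INTO order_items (order_id, product_id) VALUES (?, ?)',
    [orderId, productId]
)

The fragile part is threading the generated product id and order id through by hand; every new entity in the test data means more of this plumbing.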

One usually doesn't perform these steps at such a low level but uses some framework instead, yet the steps look pretty much the same. Actually, I would say that by choosing Polyglot Persistence you force yourself into preparing test data through your application logic. Here is pseudocode showing the creation of a test order:

// saves an order with a single item pointing to a freshly created product
public Order populateAndSaveOrder() {
    Order order = new Order();
    order.addItem(buildOrderItem());
    orderService.save(order);
    return order;
}

// builds an order item that refers to the product only by its generated id
public OrderItem buildOrderItem() {
    OrderItem item = new OrderItem();
    item.setProductId(populateAndSaveProduct().getId());
    return item;
}

// saves a product into the document store, which assigns the id
public Product populateAndSaveProduct() {
    Product product = new Product();
    product.setName("Java 9 Modules in a Year of Lunches");
    productService.save(product);
    return product;
}

Another horrible implication is that you have to keep your test data inside source files, whereas test data should be stored in a format that is easy to edit and update.

The Self-Containment Flaw

Another problem, which occurs even without Polyglot Persistence, is how to create a minimal dataset containing only the entities relevant to a particular test.

The ultimate source of test data is the production database. I believe there are tools which allow you to extract just the related rows from a relational database, and you can always take a full database dump and delete the rows you don't need while all the relations remain intact. But I am not aware of any tool that can extract related data from multiple data stores. Except your own application.

For example, if your application is a Single Page Application, you are probably already generating a JSON response similar to the following one:

{
    "id": 12345,
    "lines": [
        {
            "id": 67890,
            "product": {
                "id": "xyz-abc-rur",
                "name": "Java 9 Modules in a Year of Lunches"
            }
        }
    ]
}

Wouldn't it be great if you could use such a snippet to prepare your test data?

The Data Reconstruction Utility

Simply said, Data Reconstruction Utility (Dru) is a smart unmarshaller which addresses the two problems mentioned above. It is able to read the response from the running application and save the data into the particular data stores while respecting the object identities. The Order and the Order Item will probably get id 1 and the Product will get some generated unique identifier string, but the relations will still be kept intact.

The only thing Dru requires is a mapping for the content of the JSON file:

Dru dru = Dru.plan {
    from ('order.json') {
        map {
            to (Order) {
                map ('lines') {
                    to (OrderLine) {
                        map ('product') {
                            to (productId: Product)
                        }
                    }
                }
            }
        }
    }
}

Once you load a test data file such as the JSON above, you can access the entities by their type or by their original identifiers.

String productId = dru.findByType(Order).lines[0].productId
assert productId == dru.findByType(Product).id
assert productId == dru.findByTypeAndOriginalId(Product, 'xyz-abc-rur').id

Dru currently reflects our technology stack, so it is written in Java and Groovy and initially supports GORM and Amazon DynamoDB. Test data can be specified as JSON or YAML files. New clients and parsers can also be developed easily.
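
For illustration, the order response shown earlier could just as well be provided as a YAML file; the following is simply the same data rewritten in YAML:

id: 12345
lines:
  - id: 67890
    product:
      id: xyz-abc-rur
      name: Java 9 Modules in a Year of Lunches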

Read the full documentation to get more information about Dru.

Please clap 👏 if you find this article useful and help others find it.