The Flaws in Polyglot Persistence

When your application adopts Polyglot Persistence, it is no longer bound to a single data store. It can easily store tree-like structures with many nested entities in a document store and push write-heavy data into highly scalable systems such as Cassandra. This usually brings a better developer experience as well as better performance. But there are always two sides to the coin. This article explains the two major flaws which led me to create the Data Reconstruction Utility (Dru) tool.

One of the problems every project faces is preparing test data. It is a challenge even for a simple project, but when the data spans multiple data stores it gets even more complicated. The two major issues I found can be described as The Identity Flaw and The Self-Containment Flaw.

The Identity Flaw

Loosely related data stored in a relational database and a document store

Let's take a look at a simple example. An application stores orders in a relational database and products in a document store. Order items are loosely related to products just by their id. The product id is generated automatically by the document store, so we can't simply create a SQL dump. To prepare complete test data for a single order having one order item pointing to one product, we conceptually need to do the following:

  1. save a product into the document store
  2. save an order into the relational database
  3. save an order item with the new product id and the new order id into the relational database

One usually doesn't perform these steps at the low level but uses some framework instead; still, the steps would look pretty similar. Actually, I would say that by choosing Polyglot Persistence you force yourself into preparing the test data using your application logic. Here is pseudocode showing test order creation:

```java
public Product populateAndSaveProduct() {
    Product product = new Product();
    product.setName("Java 9 Modules in a Year of Lunches");
    return productRepository.save(product); // product id generated by the document store
}

public Order populateAndSaveOrder() {
    Order order = new Order();
    return orderRepository.save(order);     // order id generated by the relational database
}

public OrderItem buildOrderItem(Order order, Product product) {
    OrderItem item = new OrderItem();
    item.setOrderId(order.getId());         // wire the freshly generated ids together
    item.setProductId(product.getId());
    return item;
}
```

Another horrible implication is that you have to keep your test data inside source files. Test data should instead be stored in a format which is easily editable and updatable.

The Self-Containment Flaw

Another problem, which occurs even without using Polyglot Persistence, is how to create a minimal dataset containing only the entities relevant to a particular test.

There is an ultimate source of test data: the production database. I believe there are some tools which allow you to extract just the related rows from a relational database. Anyway, you can always do a full database dump and delete the rows you don't need, and all the relations will remain intact. But I am not aware of any tool that can extract related data from multiple data stores. Except your own application.
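To illustrate what such an extraction means, here is a minimal, self-contained Java sketch; the `Map`-based stores and field names are made up for illustration. Starting from a single order, it follows the loose id references to collect only the related order items and products, leaving unrelated entities behind.

```java
import java.util.*;

public class MinimalDataset {

    // Stand-ins for the two data stores: orders and order items live in the
    // "relational" store, products in the "document" store.
    static Map<Long, Map<String, Object>> orders = new HashMap<>();
    static Map<Long, Map<String, Object>> orderItems = new HashMap<>();
    static Map<String, Map<String, Object>> products = new HashMap<>();

    // Collect only the entities reachable from a single order.
    static Map<String, List<Map<String, Object>>> extract(Long orderId) {
        Map<String, List<Map<String, Object>>> dataset = new LinkedHashMap<>();
        dataset.put("orders", new ArrayList<>(List.of(orders.get(orderId))));
        dataset.put("orderItems", new ArrayList<>());
        dataset.put("products", new ArrayList<>());

        for (Map<String, Object> item : orderItems.values()) {
            if (orderId.equals(item.get("orderId"))) {
                dataset.get("orderItems").add(item);
                // follow the loose reference into the document store
                dataset.get("products").add(products.get(item.get("productId")));
            }
        }
        return dataset;
    }

    public static void main(String[] args) {
        products.put("xyz-abc-rur", Map.of("id", "xyz-abc-rur", "name", "Java 9 Modules in a Year of Lunches"));
        orders.put(12345L, Map.of("id", 12345L));
        orderItems.put(67890L, Map.of("id", 67890L, "orderId", 12345L, "productId", "xyz-abc-rur"));
        orders.put(99999L, Map.of("id", 99999L)); // unrelated order, must not leak in

        Map<String, List<Map<String, Object>>> dataset = extract(12345L);
        System.out.println(dataset.get("orders").size());               // prints 1
        System.out.println(dataset.get("products").get(0).get("name")); // prints the product name
    }
}
```

Real applications already contain this traversal logic in their service layer, which is exactly why they are the best extraction tool available.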

For example, if your application is a Single Page Application, you are probably already generating a JSON response similar to the following one:

"id": 12345,
"lines": [
"id": 67890,
"product": {
"id": "xyz-abc-rur",
"name": "Java 9 Modules in a Year of Lunches"

Wouldn't it be great if you could use such a snippet to prepare your test data?

The Data Reconstruction Utility

Simply said, Data Reconstruction Utility (Dru) is a smart unmarshaller which responds to the two problems mentioned above. It is able to read the response from the running application and save the data into the particular data stores while respecting the object identities. The Order and the Order Item will probably get id 1 and the Product will get some generated unique identifier string, but the relations will still be kept.

The only thing Dru requires is a mapping for the content of the JSON file:

```groovy
Dru dru = Dru.plan {
    from('order.json') {
        map {
            to(Order) {
                map('lines') {
                    to(OrderLine) {
                        map('product') {
                            to(productId: Product)
                        }
                    }
                }
            }
        }
    }
}
```

Once you load a test data file such as the JSON mentioned above, you can access the entities by their type or by their original identifiers:

```groovy
String productId = dru.findByType(Order).lines[0].productId
assert productId == dru.findByType(Product).id
assert productId == dru.findByTypeAndOriginalId(Product, 'xyz-abc-rur').id
```

Dru currently reflects our technology stack, so it is written in Java and Groovy, and it initially supports GORM and Amazon DynamoDB. Test data can be specified as JSON or YAML files. New clients and parsers can also be developed easily.
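As an illustration, the order from the JSON example above might be written as a hand-edited YAML file like this (a hypothetical equivalent, not taken from the Dru documentation):

```yaml
id: 12345
lines:
  - id: 67890
    product:
      id: xyz-abc-rur
      name: Java 9 Modules in a Year of Lunches
```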

Read the full documentation to get more information about Dru.

Please clap 👏 if you found this article useful and help others find it.




Agorapulse is a leading Social Media Management platform. This is our story and feedback from the ground.

Vladimír Oraný

Full Stack Developer and Test Facilitator at @agorapulse
