Four viewpoints on entity size.

Vincent Labrecque
Genetec Tech
Apr 4, 2019

Introduction

Just how large should services be? What does micro mean? How much data should be contained within a single entity? As with most things, there is no reason to believe there is one Right Answer. Applications differ dramatically in their requirements, so at best they can reuse common principles; wholesale solution reuse is rare.

This article rests on the assumption that there is a tension between the advantages of big and small entities when designing a system. The objective is to initiate a discussion and eventually get to a more systematic reflection on these topics.

Throughout this post I will use the definition of an entity as the unit of consistency. To be more specific, an entity is a bag of properties with a lifetime, as well as strong correctness invariants between its own properties. By contrast, any invariant that spans multiple entities must be maintained by a separate, explicit process.
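As a minimal sketch (the domain and names below are purely illustrative), an entity can be pictured as an object whose mutations enforce its own invariants atomically, while anything spanning two such objects would need a separate process:

```python
from dataclasses import dataclass

@dataclass
class Account:
    """An entity: a bag of properties plus invariants between them."""
    owner: str
    balance: int = 0
    credit_limit: int = 0

    def withdraw(self, amount: int) -> None:
        # The invariant "balance never drops below -credit_limit" only
        # involves this entity's own properties, so it can be enforced
        # atomically inside the entity itself.
        if self.balance - amount < -self.credit_limit:
            raise ValueError("withdrawal would exceed the credit limit")
        self.balance -= amount
```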

For instance, imagine we want to maintain an index of entities’ key properties for later use. Let’s examine two designs.

One design would be to have a single entity containing both the properties and the index. Then every time we change one of those properties, we can update the index at the same time, keeping the index always consistent, by virtue of the definition above. But this has other impacts, for instance in terms of contention: every mutation goes through a single entity.
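A rough sketch of this first design, with hypothetical names (here the "index" maps an item's name back to its id):

```python
class Catalog:
    """One big entity: the items and the index over their names live
    together, so every mutation updates both in the same step."""

    def __init__(self) -> None:
        self.items: dict[str, dict] = {}   # item id -> properties
        self.by_name: dict[str, str] = {}  # index: name -> item id

    def rename(self, item_id: str, new_name: str) -> None:
        item = self.items[item_id]
        # Property and index change together, so a reader never sees
        # them out of sync -- but every write contends on this object.
        self.by_name.pop(item["name"], None)
        item["name"] = new_name
        self.by_name[new_name] = item_id
```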

An alternative design is to keep each entity separate, with the index also a distinct entity. Now that we have shrunk the units of consistency, what are the implications? We must add a separate process that asynchronously updates the index entity when properties of the source entity change. If updates stop, the index will eventually be consistent with the data. Our entities are now smaller, but we have an eventually consistent system.
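A sketch of this second design, again with hypothetical names: the index is now its own entity, caught up asynchronously from a queue of changes.

```python
import queue
import threading

items: dict[str, dict] = {}       # source entities
name_index: dict[str, str] = {}   # the index, now a separate entity
changes: "queue.Queue[tuple[str, str, str]]" = queue.Queue()

def rename(item_id: str, new_name: str) -> None:
    old_name = items[item_id]["name"]
    items[item_id]["name"] = new_name            # source entity updated now...
    changes.put((item_id, old_name, new_name))   # ...the index, later

def index_updater() -> None:
    # A separate, explicit process: the index is eventually consistent.
    while True:
        item_id, old_name, new_name = changes.get()
        name_index.pop(old_name, None)
        name_index[new_name] = item_id

threading.Thread(target=index_updater, daemon=True).start()
```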

What would make us choose one design over another?

I want to look at entity sizes from four very different viewpoints: building infinitely scalable systems, understanding the consistency properties of systems, overall system complexity, and security.

For each viewpoint, I will pick a preferred direction: bigger, or smaller. Providing direction rather than an absolute size is a deliberate choice: I want to make clear that this document is simply a starting point for discussing and reviewing designs.

The viewpoints.

Scalability: smaller is better.

Let’s say our main consideration is building systems that can be infinitely scaled up. That is to say, as long as we can add compute/storage resources, we want to be able to scale a system to support more and more entities, without an upper bound on the total size.

Why is smaller better? Smaller/more granular entities are easier to shard/partition among computers. Thinking about it in terms of packing entities onto finite resources, smaller entities allow higher utilization than bigger ones.
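As a toy illustration (the node names and hashing scheme are assumptions for the sketch), sharding granular entities can be as simple as hashing each entity's id to pick an owner:

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical shard hosts

def owner_of(entity_id: str) -> str:
    # Each entity is an independent unit of consistency, so it can be
    # placed on (or moved to) any node without coordinating with other
    # entities; smaller entities pack onto finite nodes more evenly.
    digest = hashlib.sha256(entity_id.encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]
```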

Counter: if the system has a lot of processes to preserve inter-entity consistency, the cost of those can end up dominating the performance of the system.

References: Life Beyond Distributed Transactions (Pat Helland).

Consistency: bigger is better.

Let’s say our main consideration is to build a system that has critical correctness properties/invariants. This simplifies thinking about the system, reduces special cases in our implementation, and so on. Things are safe by construction.

Why is bigger better? Given our definition of entity, the broader the scope of our entities, the more information we can ensure consistency for.

Look at two extremes. At one end, one big entity: this is the same model as, say, one big SQL database with arbitrary transaction scopes. At the other end, single-property entities. In the first case, any correctness property falls within the entity's scope and is trivial to implement. In the second, any correctness property spanning multiple properties is at best eventually consistent and requires extra work.

From a different point of view, the smaller the pieces, the larger the odds that data denormalization becomes necessary. This brings along many well-known consistency challenges.

System complexity: bigger is better.

Here, our main consideration is maintaining our understanding of the system as we create, change and maintain it.

Why is bigger better?

Bigger pieces more easily contain the data that they need, which reduces the need for communication. Simpler consistency models make systems easier to reason about. The smaller entities are, the more they need to collaborate and communicate, which is the very definition of complexity.

Of course, the pieces themselves get more intricate, but from a system perspective a 5-service system with 5 interactions is significantly simpler than a 50-service system with dozens or hundreds of interactions.
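As a rough upper bound, n services can have up to n(n-1)/2 pairwise interactions: 10 for 5 services, but 1,225 for 50.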

Security: it depends.

Let’s think about it from the point of view of security domains and controlling the location of sensitive information. Here, smaller can be better: we sacrifice some consistency and availability, but gain more control over where sensitive information lives.

In other words: do you spread your eggs across many baskets to avoid a single point of attack, or do you put all your eggs in one basket and watch that basket really, really carefully?

For instance, let’s say we are modeling cameras. Does the camera entity contain its connection password or certificate? If so, the service that owns the camera entity has everything it needs to work: it can connect. Is that all there is to it? Another implication is that as we scale cameras by adding services, those passwords get spread across a larger and larger surface.

An alternative design would be to have a separate entity for the connection information. What have we traded off here? Now, before the camera service can connect to the camera, it needs to collaborate with the service that owns the camera connection information. We have increased system complexity and reduced reliability. However, we can now keep all of the camera connection entities, which hold the sensitive information, in a more tightly regulated context.
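A minimal sketch of the two shapes (types, names, and the credential-service call are illustrative assumptions, not an actual model):

```python
from dataclasses import dataclass

# Design 1: credentials live inside the camera entity; the owning service
# is self-sufficient, but secrets spread wherever cameras are placed.
@dataclass
class Camera:
    camera_id: str
    address: str
    password: str

# Design 2: credentials are a separate entity, owned by a service that can
# run in a more tightly regulated context; connecting now needs a
# cross-service call, which costs complexity and availability.
@dataclass
class CameraInfo:
    camera_id: str
    address: str

@dataclass
class CameraCredentials:
    camera_id: str
    password: str

def connect(camera: CameraInfo, credential_service) -> None:
    # Hypothetical call to the service owning the credentials entity:
    # the extra hop that design 1 does not need.
    secret = credential_service.get_password(camera.camera_id)
    ...  # open the connection to camera.address using secret (elided)
```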

Conclusion.

As we decouple our systems into smaller pieces for deployment and delivery flexibility, granularity and boundaries must be a serious consideration during the design process.

This is not to say that the set of viewpoints here is exhaustive: it isn’t.

My goal will have been achieved if I have communicated that there is a broad range of possible designs between the two extremes of full centralization and full decentralization, and that each comes with trade-offs which can be made explicit and help in decision making.

I hope this can help kickstart a discussion to allow us to grow as designers as we move forward! Looking forward to any comments.
