On normalization

Published in

live stanklimoff com

Mar 26, 2015

Fourth normal form is considered to be the pinnacle of database normalization. It is not.

Modeling relational data is straightforward. Take the raw data, extract relationships. When you’ve made all of them explicit, you’re good. The four types of relationships that you’re supposed to extract give you four steps to the most normal form there is.

However, there’s a fifth type of relationship that is slowly becoming part of the technology discourse: the relationship with time.

Schemas change. Relations change. Modern applications require us to upgrade and roll back code all the time, without losing a single transaction. Each record in the database is a fact at a given point in time. When a fact is added, removed, or altered, we want a full event log that can be rolled forward or backward at will. We want to trace effects to causes and query facts within a certain time window.
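To make that wish concrete, here is a minimal sketch in Python of what such an event log could look like. Everything in it is hypothetical and chosen for illustration: the `Event` and `FactLog` names, the integer logical timestamps, and the convention that a `None` value means the fact was removed. It is not how any particular database does it, just one way to read the paragraph above as code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Event:
    at: int                # logical timestamp (assumption: a simple integer clock)
    key: str               # which fact this event concerns
    value: Optional[Any]   # new value, or None when the fact is removed


class FactLog:
    """Append-only log of events; state is derived by replay, never stored."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def record(self, at: int, key: str, value: Optional[Any]) -> None:
        # This sketch assumes events are appended in non-decreasing time order.
        if self._events and at < self._events[-1].at:
            raise ValueError("events must be appended in time order")
        self._events.append(Event(at, key, value))

    def state_at(self, at: int) -> Dict[str, Any]:
        # Replay every event up to and including `at`.
        # A smaller `at` rolls back, a larger one rolls forward.
        state: Dict[str, Any] = {}
        for event in self._events:
            if event.at > at:
                break
            if event.value is None:
                state.pop(event.key, None)
            else:
                state[event.key] = event.value
        return state

    def events_between(self, start: int, end: int) -> List[Event]:
        # Events inside the half-open time window [start, end).
        return [e for e in self._events if start <= e.at < end]


log = FactLog()
log.record(1, "alice.email", "alice@example.com")   # fact added
log.record(2, "bob.email", "bob@example.com")       # fact added
log.record(3, "alice.email", "alice@example.org")   # fact altered
log.record(4, "bob.email", None)                    # fact removed

before = log.state_at(2)            # both original emails are present
after = log.state_at(4)             # only alice remains, with the new address
window = log.events_between(2, 4)   # the events at t=2 and t=3
```

Nothing is ever overwritten here: rolling back is just replaying fewer events, and a time-window query is a filter over the log. The price is that every read is a replay, which is exactly the kind of trade-off a real system would have to address.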

Normally, time is either swept under the rug (transaction logs and MVCC in databases) or made explicit at the expense of the model (stream processing and CEP). Attempts to bridge the gap between the two are given fancy names to hide the ugliness.

What we want instead is a way to model our data that acknowledges and reflects change over time. This would be, in a sense, the fifth normal form.

Written by stan klimoff