How denormalizing our PostgreSQL DB turned great

Rui Sebastião
Pennylane Tech & Product
2 min readAug 4, 2022

--

When designing data models, it’s often considered best practice to normalize the database structure to avoid data redundancy and incoherences.

At Pennylane, we’ve found that this actually does not help us and even introduces additional complexity in managing data access on our accounting platform.

This is because 95% of the data on our main PostgreSQL database is tied to a company — users, invoices, transactions, accounts, ledger events, plan items, etc. — and as the dependency tree grows, so does the amount of data we need to fetch. Furthermore, it also raises additional security concerns, as we don’t want one company accessing another company’s data.

In order to simplify our lives, we decided on the trade-off of adding the company_id to all the models.

The main advantages are:

  • It’s easier to write, read, and manage our Ability file (we use CanCanCan in our rails app), which means it’s easier to keep track of who can access what data;
  • We saved on JOINs: as a practical example, it takes one JOIN to know that an invoice line is associated with a company;
  • “Belt and suspenders” approach to Insecure Direct Object Reference (IDOR):
    — We ensure that we’re only loading the data from the corresponding company;
    — We achieved this by patching ActiveRecord’s submodules — AssociationScope, Preloader…
    — … and by using the where_exists gem — also patched in-house;
  • We ensure default propagation of company_id when building new objects.

Whilst the main disadvantages are:

  • Duplicated FK everywhere (data redundancy/data consistency);
  • The query planner may not take the obvious route sometimes. 🤷‍♂

What else did we add to simplify our lives?

  • Row level security on PostgreSQL;
  • A changelog implemented on top of AWS Athena so we can audit our data changes;
  • Permanent attributes: it doesn’t make sense for an invoice line to move from one invoice to another.

Our combination of denormalizing some of our data, enhancing some of ActiveRecord’s methods, and introducing default behaviors in our application models have enabled us to greatly simplify access control management, keep performance at good levels, and have given us confidence that our production data is in a consistent state.

👀 Watch the full presentation by Quentin de Metz in Madrid.rb: https://vimeo.com/621835714

--

--