SaaS Product Engineering

Random SaaS product engineering notes, system design, implementation ideas

Postgres Materialized View-based analytical database setup architecture design notes

--

In this article, I will explain the simplest way to set up a Postgres database solution for analytical purposes using a read replica, materialized view, and a third database to reduce the load on the master database to run analytical queries. This approach can be useful for SaaS applications where maintaining data for several tenants using the ETL process is quite difficult.

There are several ways to set up an analytical database for example extracting data from the main database say Postgres or SQL and transferring data to BigQuery, Azure Synapse Analytics, etc. But to implement that there is a need for an Extract Transform and Load (ETL) process. Even though there are several automated tools available still it is challenging to maintain data consistency between primary and secondary databases. ETL process creates more opportunities for intermediate points of failure.

Solution:

This solution is using Google Cloud Platform (GCP), however, a similar concept can be applied in AWS or Azure.

Step 1: Create a primary Postgres Database (Database 1) in GCP — This is the main database

Step 2: Create a read replica (Database 2) — GCP has an option to create a read replica with just a click of a button. This replica will replicate the entire server all Database, Tables, Views, Stored Procedures, and DDL Query changes to read the replica.

Note: You can use the pglogical Postgres extension to create a replica manually. But trust me that is not going to work properly several times you are going to face problems and it is hard to trace and restart replication based on my experience. There is a lot of maintenance work to keep consistent up-to-date data in Replica.

For some of you first two steps would be enough. Now instead of giving load on the primary database for analytical queries, you can now use Read replica for querying results to build analytical dashboards.

Using materialized views to speed up analytical queries

Postgres supports Materialized views. The main difference between materialized views and normal views is materialized views store results in a disk you may think of it as a table whereas normal views don't store data on a disk.

When you use a materialized view you have to maintain data up to date at regular intervals. Say for example you can refresh data once every 12 hours. Using a pgcron extension you can write a materialized view refresh query to run every 12 hours.

How materialized views speeds up Analytics?

You can use materialized views containing a query that joins and combines several tables' data into a single view (table) and store it on disk. In this way for analytical queries now instead of querying several tables, you have to just query a single table. As joins are reduced query executes much faster.

If you are using a read replica of GCP. There is one disadvantage now you cannot add/modify any additional schema on the reading replica server.

But we need to have materialized views in reading replicas for faster querying. There is one solution that materialized view can be kept in the primary database server. But sometimes depending on the size of the data refreshing materialized views might affect the performance of the server.

Also for some scenarios, you may need to add extra tables just for analytics purposes in reading replicas.

For solving this problem there is one solution

Step 3:
Create a 3rd Postgres database (Database 3).
Here you can have materialized views and also have the flexibility to add extra tables or schemas.

Now how do you fetch data from a reading replica and store it in 3rd database without the hassle of using ETL or any external tools?

Postgres has a DB Link extension. Using this extension you can connect to a remote database and fetch results. Here we can use the DB link extension and build materialized view in 3rd database by fetching data from Database 2 (Read Replica). This way you can set up easily analytical database without using any ETL process and using a maintenance-free approach.

Conclusion

In this article, I have explained how a maintenance-free analytical database can be set up using Postgres Database read replicas, Materialized Views, and DB Link extension.

--

--

SaaS Product Engineering
SaaS Product Engineering

Published in SaaS Product Engineering

Random SaaS product engineering notes, system design, implementation ideas

Bharat Dwarkani
Bharat Dwarkani

Written by Bharat Dwarkani

Tech Enthusiast, Full Stack Software Engineer, Product Manager, Engineering Enterprise SaaS product

No responses yet