PostgreSQL — Lock Down Historical Timestamps, such as created_at and last_updated_at, at the Database Layer

Lock down historical timestamps to improve the quality and reliability of your data.


The Opinionated TLDR

  • Set historical timestamps at the database layer. DO NOT let the client set these values. There are two exceptions: tools that ingest data from third parties whose records already contain historical timestamp columns, and applications that require extreme update performance (a rare requirement).
  • Adding a trigger to a table has overhead. The examples below require a trigger to set the last_updated_at time. Set last_updated_at via the client when update performance is a must.
  • Use CLOCK_TIMESTAMP() as the default for timestamp auditing columns (see the sketch after this list). See When To Use clock_timestamp(), current_timestamp and Unique Timestamps.
  • Use timestamptz over the timestamp type. See “Do not use timestamp (without time zone)”.
  • Limit access to historical columns at the database layer. For example, do not allow an insert or update to the last_updated_at and created_at columns.
  • Do not trust frameworks that promise a silver-bullet solution, especially around historical timestamp columns. Carefully architect at all layers and apply the tools that best solve the problem at each layer.
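
To illustrate the CLOCK_TIMESTAMP() recommendation (a minimal sketch, not from the original examples): inside a transaction, CURRENT_TIMESTAMP is frozen at the moment the transaction started, while CLOCK_TIMESTAMP() returns the actual wall-clock time of each call. Historical timestamps defaulted to CURRENT_TIMESTAMP would all share the transaction's start time, no matter how long the transaction runs.

BEGIN;
SELECT CURRENT_TIMESTAMP AS txn_start, CLOCK_TIMESTAMP() AS wall_clock;
SELECT PG_SLEEP(1.0);
-- txn_start is unchanged; wall_clock is roughly one second later.
SELECT CURRENT_TIMESTAMP AS txn_start, CLOCK_TIMESTAMP() AS wall_clock;
COMMIT;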

NOTE:

  • These opinions are within the context of using PostgreSQL as the backend database. These opinions may not work with other persistence solutions.
  • All examples in this article use Idempotent PostgreSQL SQL DDL.
  • This article does not address the use cases that require historical timestamps.

Introduction

Data quality in a database is paramount. A large part of database architecting involves ensuring data correctness. Architecting includes applying normalization, constraints, data governance, single sources of truth, deduplication, and access constraints such as row-level and column/table-level security.

This article covers managing historical timestamps within the scope of data correctness, reliability, truth, and auditability.

Where Should We Set Historical Timestamps

We set the values of historical timestamps at the database layer. The reasons are:

  • Minimize mistakes — Removing the burden of setting historical timestamps from the middle tier limits the chance of a developer making a mistake.
  • Timestamp consistency — We end up with more consistent timestamps across records. Clocks on different compute instances may differ by milliseconds or microseconds, and the notion of “current time” varies between languages and frameworks. When using a single master database, any difference between actual world time and database time is at least consistent.
  • Cross-cutting and Separation of concerns — Updating a historical timestamp is a cross-cutting concern because every mutation must explicitly update it. Further, when adding data to a database, a client should only be responsible for one concern: adding information to the database. The client should not be responsible for secondary concerns such as setting historical timestamps. Without a database-side trigger, every update must remember to do this:

UPDATE table_name
SET
  value1 = some_value,
  ...
  last_updated_at = CLOCK_TIMESTAMP()
WHERE [condition];
  • Improved auditing — We can show an auditor that we have locked down inserting and updating of the created_at and last_updated_at values at the database layer, and so prove that these values are valid. The edge case is the database Superuser, who could bypass these security measures (fixable by making the CI/CD process the only tool with Superuser access).
  • Easier testing and proofing — SQL DDL is written to test and verify historical timestamp mutation. A CI/CD process runs these tests on every database schema change, even in production. See sql-watch for an example of how to update a database and run tests continuously.
There are exceptions where the client is the right place to set these values:

  • When speed is of utmost importance — In cases where speed and scaling are paramount, we want to remove any overhead caused at the database layer. Here, the client may be the best place to set historical timestamp values (such as last_updated_at). However, if we are optimizing the database to this extent, we may want to consider moving such data to a different solution (leaving the data that does not need to scale in PostgreSQL).
  • When we need to set historical timestamps — Some business use cases require setting historical timestamps from a client: for example, when ingesting data from a third party whose records already contain them. We can create an ingester role with the right to set these columns, limited to write only, as sketched after this list.
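
A minimal sketch of such an ingester role (the role name is an assumption; it reuses the perf.default_ts_trig table created in the next section, so it would run after that DDL):

SET ROLE postgres;

DO $$ BEGIN
  CREATE ROLE ingester;
EXCEPTION
  WHEN duplicate_object THEN null;
END $$;

GRANT USAGE ON SCHEMA perf TO ingester;

-- Write only: the ingester may insert rows, including the historical
-- timestamp columns, but gets no SELECT or UPDATE rights.
GRANT INSERT (default_ts_trig_id, created_at, last_updated_at)
  ON perf.default_ts_trig TO ingester;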

Note: Setting historical timestamps may be considered business logic, and some argue that business logic should never be placed in the database. We believe setting historical timestamps is an exception to this rule.

Setting Historical Timestamps

The following SQL DDL example sets the created_at and last_updated_at values at the database level.

SET ROLE postgres;

CREATE SCHEMA IF NOT EXISTS perf;

DO $$ BEGIN
  CREATE ROLE mutator;
EXCEPTION
  WHEN duplicate_object THEN null;
END $$;

GRANT USAGE ON SCHEMA perf TO mutator;

ALTER DEFAULT PRIVILEGES IN SCHEMA perf
  GRANT SELECT, UPDATE, DELETE, INSERT ON TABLES TO mutator;

CREATE OR REPLACE FUNCTION perf.trigger_update_last_updated_at()
  RETURNS TRIGGER AS $$
BEGIN
  NEW.last_updated_at = CLOCK_TIMESTAMP();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TABLE IF NOT EXISTS perf.default_ts_trig (
  default_ts_trig_id bigint NOT NULL,
  created_at timestamptz NOT NULL DEFAULT CLOCK_TIMESTAMP(),
  last_updated_at timestamptz NOT NULL DEFAULT CLOCK_TIMESTAMP()
);

DO $$ BEGIN
  CREATE TRIGGER perf_default_ts__trig_001_before_update
    BEFORE UPDATE ON perf.default_ts_trig
    FOR EACH ROW EXECUTE
      PROCEDURE perf.trigger_update_last_updated_at();
EXCEPTION
  WHEN duplicate_object THEN null;
END $$;

REVOKE INSERT, UPDATE ON TABLE perf.default_ts_trig FROM mutator;

GRANT UPDATE (default_ts_trig_id)
  ON perf.default_ts_trig TO mutator;
GRANT INSERT (default_ts_trig_id)
  ON perf.default_ts_trig TO mutator;

SET ROLE mutator;

INSERT INTO perf.default_ts_trig(default_ts_trig_id)
VALUES (1);

DO $$ BEGIN
  PERFORM PG_SLEEP(1.0);
END $$;

-- NOTE: last_updated_at is set by the trigger even though
-- mutator does not have update rights on that column.
UPDATE perf.default_ts_trig
SET default_ts_trig_id = 1
WHERE default_ts_trig_id = 1;

SELECT *
FROM perf.default_ts_trig;

-- ERROR: permission denied for table default_ts_trig
UPDATE perf.default_ts_trig
SET created_at = CLOCK_TIMESTAMP()
WHERE default_ts_trig_id = 1;

-- ERROR: permission denied for table default_ts_trig
INSERT INTO perf.default_ts_trig(default_ts_trig_id, created_at)
VALUES (1, CLOCK_TIMESTAMP());

Both created_at and last_updated_at get set using defaults. In PostgreSQL, setting last_updated_at on update requires a trigger.

We prevent the mutator role from directly inserting or updating created_at and last_updated_at. However, the last_updated_at column is still updated indirectly, via the trigger, when mutator updates the default_ts_trig_id column.

Likewise, on insert, both columns are populated indirectly through their default values.
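
To see the effect, a quick check against the row inserted above (an illustrative query; the roughly one-second gap comes from the PG_SLEEP call between the INSERT and the UPDATE):

SELECT created_at,
       last_updated_at,
       last_updated_at - created_at AS drift
FROM perf.default_ts_trig
WHERE default_ts_trig_id = 1;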

Default Timestamp Overhead Cost

What is the cost of using a default timestamp versus having the client set the timestamp? This test is a bit tricky because a true client-side timestamp would be generated outside the database; we approximate one by computing CLOCK_TIMESTAMP() inside the INSERT statement itself.

Set up a utility perf.execute_time:

SET ROLE postgres;

CREATE SCHEMA IF NOT EXISTS perf;

-- uuid_generate_v1mc() below requires the uuid-ossp extension.
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE IF NOT EXISTS perf.outcome (
  outcome_id uuid PRIMARY KEY NOT NULL DEFAULT uuid_generate_v1mc(),
  outcome_name varchar(128) NOT NULL,
  outcome_description varchar(1024) NOT NULL,
  sql_ran text NOT NULL,
  row_count bigint NOT NULL,
  started_at timestamptz NOT NULL,
  stopped_at timestamptz NOT NULL,
  run_time_in_seconds decimal(18,10) GENERATED ALWAYS AS
    (EXTRACT(epoch FROM stopped_at - started_at)) STORED,
  created_on timestamptz NOT NULL DEFAULT NOW()
);

COMMENT ON TABLE perf.outcome IS 'The outcome of a performance test.';

CREATE OR REPLACE FUNCTION perf.execute_time (
  p_outcome_name varchar(128),
  p_outcome_description varchar(1024),
  p_row_count bigint,
  p_sql text
) RETURNS VOID LANGUAGE plpgsql AS
$$
DECLARE
  d_started_at timestamptz = clock_timestamp();
BEGIN
  EXECUTE(p_sql);
  INSERT INTO perf.outcome
    ( outcome_name,   outcome_description,   sql_ran, row_count,   started_at,   stopped_at )
  VALUES
    ( p_outcome_name, p_outcome_description, p_sql,   p_row_count, d_started_at, clock_timestamp() );
END;
$$;
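
As a quick smoke test of the utility (illustrative, not part of the original test suite), time a one-second sleep and confirm run_time_in_seconds comes back as roughly 1.0:

SELECT perf.execute_time(
  'smoke test', 'verify execute_time records timings', 1,
  'SELECT PG_SLEEP(1.0);'
);

SELECT outcome_name, run_time_in_seconds
FROM perf.outcome;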

Performance test setup:

CREATE TABLE IF NOT EXISTS perf.no_default_ts (
  no_default_ts_id bigint NOT NULL,
  created_at timestamptz NOT NULL,
  last_updated_at timestamptz NOT NULL
);

CREATE TABLE IF NOT EXISTS perf.default_ts (
  default_ts_id bigint NOT NULL,
  created_at timestamptz NOT NULL DEFAULT CLOCK_TIMESTAMP(),
  last_updated_at timestamptz NOT NULL DEFAULT CLOCK_TIMESTAMP()
);

CREATE OR REPLACE FUNCTION perf.trigger_update_last_updated_at()
  RETURNS TRIGGER AS $$
BEGIN
  NEW.last_updated_at = CLOCK_TIMESTAMP();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TABLE IF NOT EXISTS perf.default_ts_trig (
  default_ts_trig_id bigint NOT NULL,
  created_at timestamptz NOT NULL DEFAULT CLOCK_TIMESTAMP(),
  last_updated_at timestamptz NOT NULL DEFAULT CLOCK_TIMESTAMP()
);

DO $$ BEGIN
  CREATE TRIGGER perf_default_ts__trig_001_before_update
    BEFORE UPDATE ON perf.default_ts_trig
    FOR EACH ROW EXECUTE
      PROCEDURE perf.trigger_update_last_updated_at();
EXCEPTION
  WHEN duplicate_object THEN null;
END $$;

INSERT Overhead Costs

Run the performance test for INSERT:

TRUNCATE TABLE perf.default_ts;
TRUNCATE TABLE perf.no_default_ts;
TRUNCATE TABLE perf.outcome;

DO
$$
DECLARE
  d_rows bigint = 10000000;
BEGIN
  PERFORM perf.execute_time(
    'default time', 'default timestamp', d_rows,
    FORMAT('
      INSERT INTO perf.default_ts(default_ts_id)
      SELECT *
      FROM generate_series(%s,%s);
    ', 1, d_rows)
  );
  PERFORM perf.execute_time(
    'default time', 'no default timestamp', d_rows,
    FORMAT('
      INSERT INTO perf.no_default_ts(
        no_default_ts_id,
        created_at,
        last_updated_at
      )
      SELECT id, CLOCK_TIMESTAMP(), CLOCK_TIMESTAMP()
      FROM generate_series(%s,%s) AS id;
    ', 1, d_rows)
  );
END;
$$;

SELECT *
FROM perf.outcome;

The time to insert 10 million records for each table (20 million total) is as follows:

default ts | no default ts | no default overhead
-----------|---------------|--------------------
5.24 sec   | 5.72 sec      | 5.72 / 5.24 = 1.09

Defaulting the created_at and last_updated_at timestamps at the database layer incurs no performance hit on insert. If anything, our runs showed a slight improvement of around 8% +/- 2% when inserting new records (based on running tests from a SQL client).
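
To compute the overhead ratio directly from perf.outcome rather than by hand (an illustrative query, assuming only the two rows recorded by the test above):

SELECT MAX(run_time_in_seconds) FILTER (WHERE outcome_description = 'no default timestamp')
     / MAX(run_time_in_seconds) FILTER (WHERE outcome_description = 'default timestamp')
       AS no_default_overhead
FROM perf.outcome
WHERE outcome_name = 'default time';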

UPDATE Overhead Costs

Run the performance test for UPDATE:

TRUNCATE TABLE perf.outcome;
TRUNCATE TABLE perf.default_ts;
TRUNCATE TABLE perf.default_ts_trig;

DO
$$
DECLARE
  d_rows bigint = 10000000;
BEGIN
  PERFORM perf.execute_time(
    'update time', 'setup no trigger table', d_rows,
    FORMAT('
      INSERT INTO perf.default_ts(default_ts_id)
      SELECT *
      FROM generate_series(%s,%s);
    ', 1, d_rows)
  );
  PERFORM perf.execute_time(
    'update time', 'setup trigger table', d_rows,
    FORMAT('
      INSERT INTO perf.default_ts_trig(default_ts_trig_id)
      SELECT *
      FROM generate_series(%s,%s);
    ', 1, d_rows)
  );
  PERFORM perf.execute_time(
    'update time', 'updated no trigger table', d_rows,
    '
      UPDATE perf.default_ts
      SET default_ts_id = default_ts_id;
    '
  );
  PERFORM perf.execute_time(
    'update time', 'updated trigger table', d_rows,
    '
      UPDATE perf.default_ts_trig
      SET default_ts_trig_id = default_ts_trig_id;
    '
  );
END;
$$;

SELECT *
FROM perf.outcome;

The time to update 10 million records for each table (20 million total) is as follows:

no trigger | has trigger | trigger overhead
-----------|-------------|---------------------
9.25 sec   | 19.23 sec   | 19.23 / 9.25 = 2.08

We see a roughly 2x performance hit when updating the last_updated_at timestamp at the database layer, caused mainly by the row-level trigger on the table.
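
If that overhead is unacceptable (see the TLDR), one alternative is to drop the trigger and grant the client column-level UPDATE rights on last_updated_at, trading part of the lockdown for speed. A sketch under those assumptions:

SET ROLE postgres;

-- Remove the trigger and let the client set last_updated_at directly.
DROP TRIGGER IF EXISTS perf_default_ts__trig_001_before_update
  ON perf.default_ts_trig;

GRANT UPDATE (default_ts_trig_id, last_updated_at)
  ON perf.default_ts_trig TO mutator;

-- The client is now responsible for this on every update, e.g.:
-- UPDATE perf.default_ts_trig
-- SET default_ts_trig_id = ..., last_updated_at = CLOCK_TIMESTAMP()
-- WHERE default_ts_trig_id = ...;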

Conclusion

It is possible to lock down a table at the database layer to limit access to historical timestamp columns. In our examples, a mutator role has full rights to INSERT, UPDATE, SELECT, and DELETE on a table except for the historical timestamp columns created_at and last_updated_at.

Although there is a performance hit on updates due to using a trigger, in our opinion, that performance hit is worth the extra security.
