Projection Tables

Jim Walker
Sep 22, 2015 · 4 min read

in

Postgres

This post will walk through a simple scenario where creating a projection table might be beneficial. It will discuss the challenges and propose relative solutions.

Scenario

For starters, let’s suppose there are multiple satellites that are sending down telemetry every day to a ground system on Earth. The ground system stores each of those states in a log.

The log data is useful for analyzing trends over time. Successfully monitoring these trends can prevent equipment damage. So being able to query effectively is important.

Example Data

Here is an example log containing hypothetical telemetry data.

CREATE TABLE telemetry (
date date,
satellite_id integer,
mnemonic text,
value text
);

The desired pivot table would have the Mnemonic column as the table’s headers while its data is populated from the associated Values to the Satellite_id and Mnemonic. The desired pivot table is shown below.

Initial Thought

An initial thought might be to start with creating a table with the Mnemonics as column headers. Then going through each row and constructing an insert statement.

CREATE TABLE pivoted_telemetry (
Satellite_id integer,
TMP_1 integer,
TMP_2 integer,
BATT_STATE text,
HTR_1_STATUS text,
);

Although this method works, Postgres has a built-in function that can build our table for us.

Postgres’ Crosstab function

Postgres has a convenient function called crosstab which is contained in the TableFunc module.

crosstab(text sql)

CREATE EXTENSION tablefunc;
SELECT
*
FROM
crosstab(
‘SELECT * FROM telemetry WHERE date=’2016–10–24’;’
) AS ct(
‘Satellite_id text,
TMP_1 text,
TMP_2 text,
BATT_STATE text,
HTR_1_STATUS text’
);

This is a good start, but it would be nice to find the Satellite that has a TMP_1 > 550. However, attempting to do so would result in the following error:

No operator matches the given name and argument type(s). You might need to add explicit type casts.

The “No operator” error occurs because you cannot do a comparison on an integer type with a text type. To explicitly change the column types from text to integer, the following query can be executed:

SELECT
*
FROM
crosstab(
‘SELECT * FROM telemetry WHERE date=’2016–10–24’;’
) AS ct(
‘Satellite_id integer,
TMP_1 integer,
TMP_2 integer,
BATT_STATE text,
HTR_1_STATUS text’
);

Unfortunately, setting some of the column types to integers results in another error:

return and sql tuple descriptions are incompatible

However this error can be resolved if we examine alternative TableFunc functions. Accordingly, crosstab(text sql, text sql) documentation states:

The main limitation of the single-parameter form of crosstab is that it treats all values in a group alike, inserting each value into the first available column. If you want the value columns to correspond to specific categories of data, and some groups might not have data for some of the categories, that doesn’t work well. The two-parameter form of crosstab handles this case by providing an explicit list of the categories corresponding to the output columns.

Therefore, the correct crosstab function to use is crosstab(text source_sql, text category_sql). The source_sql argument is a query for all the source data. The category_sql argument is a query for all the column headers. To make sure that the correct order of columns and data are preserved, `ORDER BY 1` is used in each statement.

SELECT
*
FROM
crosstab(
‘SELECT * FROM telemetry WHERE date=’2016–10–24’ ORDER BY 1;’,
‘SELECT DISTINCT mnemonics FROM telemetry ORDER BY 1; ‘
) AS ct(
‘Satellite_id integer,
TMP_1 integer,
TMP_2 integer,
BATT_STATE text,
HTR_1_STATUS text’
);

The query normalizes the data and allows for additional queries or aggregations. As a result, we can now allow for more advanced queries on log data. Below is an example of such a query.

SELECT
(TMP_1 + TMP_2)/2.0 AS TMP_AVG
FROM
crosstab(
‘SELECT * FROM telemetry WHERE date=’2016–10–24’ ORDER BY 1;’,
‘SELECT DISTINCT mnemonics FROM telemetry ORDER BY 1; ‘
) AS ct(
‘Satellite_id integer,
TMP_1 integer,
TMP_2 integer,
BATT_STATE text,
HTR_1_STATUS text’
)
WHERE
HTR_1_STATUS = ‘ON’;

Conclusion

There are many different applications that can take advantage of crosstabs. This post offers a simple example on how we can apply advanced logged data queries through the medium of a real world scenario.

Namely is hiring


Namely Labs

Namely Engineering + Design

Jim Walker

Written by

Engineer

Namely Labs

Namely Engineering + Design

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade