Four Compelling Use Cases for PostgreSQL Table Functions

Bryn Llewellyn
Dec 16, 2019 · 9 min read

Welcome to part three of a three-part series of posts on PostgreSQL’s table functions. These functions can be easily leveraged in a distributed SQL database like YugabyteDB, which is PostgreSQL compatible.

In this final post, I’ll discuss four realistic use cases that call for user-defined table functions and I’ll explain how each case is met using complete, self-contained code examples.

If you missed the two previous posts, make sure to check out:

An Introduction to PostgreSQL Table Functions in YugabyteDB

Implementing PostgreSQL User-Defined Table Functions in YugabyteDB

Use Case #1: Parameterized View

In my blog post “ Using Stored Procedures in Distributed SQL Databases “, I described an experiment to measure the speeds of transactions with a variable number of insert statements per transaction when these are implemented respectively as a sequence of top-level database calls or as a single call to a stored procedure that has the same effect. I computed the procedure-to-top-level-call speed ratios, as a function of the number of inserts per transaction. I did this by recording the result for each test in a table and by then running code like I describe in this section.

This table is a stylized version of the table that I used:

create table results(
test_no int,
method varchar(20),
duration dec
constraint results_duration_nn not null,
constraint results_duration_chk check(duration > 0),
constraint results_pk primary key (test_no, method));

Use this to insert some data to illustrate the behavior of the code that I describe in this section:

insert into results(test_no, method, duration)
values
(1, 'Method A', 10.1),
(1, 'Method B', 8.6),
(1, 'Method C', 5.4),
(2, 'Method A', 20.6),
(2, 'Method B', 17.3),
(2, 'Method C', 9.9),
(3, 'Method A', 30.4),
(3, 'Method B', 23.8),
(3, 'Method C', 16.2);

This table function computes the speed ratios for two specified methods for all the test_no values:

create function speed_ratios(
method_1 in results.method%type,
method_2 in results.method%type)
returns table(test_no results.test_no%type, ratio dec)
language sql
as $body$
select
test_no,
method_a_duration/method_b_duration as ratio from (
select
test_no,
t1.duration as method_a_duration,
t2.duration as method_b_duration
from results t1 inner join results t2
using(test_no)
where t1.method = method_1
and t2.method = method_2) as a;
$body$;

Test it like this:

select
test_no as "Test No.",
to_char(ratio, '990.99')||'x' as "Speed Ratio"
from speed_ratios('Method A', 'Method B')
order by 1;

It produces this output:

Test no. | Speed Ratio 
----------+-------------
1 | 1.17x
2 | 1.19x
3 | 1.28x

Of course, you could generate a report like this with a client-side program. But doing so would lose a major benefit. The table function approach lets you run arbitrary queries to answer arbitrary questions, like this:

select to_char(
(select max(ratio) from speed_ratios('Method A', 'Method B'))/
(select max(ratio) from speed_ratios('Method B', 'Method C')),
'990.99')
as x;

Use Case #2: Pretty Printed Ad Hoc Reports
for Administrators

In “Use case #1”, there are three distinct values for results.test_no in the example sample data that I use. That means that there are three distinct comparison tables to be looked at: “Method A” vs “Method B”, “Method B” vs “Method C”, and “Method A” vs “Method C”. Rather than producing these by executing a .sql script with the three explicit “select… from speed_ratios(…)” statements, interspersed with appropriate \echo meta-commands, it would be nicer just to execute a single stored procedure. However, the only way to get output from an ordinary PL/pgSQL procedure is to use raise info (or raise notice). When you run such a program from a script, then every output line looks like this:

ysqlsh:raise_info_demo.sql:9: INFO:  Some output

Moreover, the line numbers will, in general, all be different-so you can’t simply remove this awful clutter with a simple “search and replace all” in an ordinary text editor. This is especially frustrating for developers and administrators who, using other databases, assemble a kit-bag of scripts and programs to generate useful, and usably readable, reports that-just like for the present use case, assemble the output from several separate queries. Here’s where a table function comes to the rescue. This is the basic idea:

create function pretty_report_speed_ratios()
returns table(t varchar)
language plpgsql
as $body$
begin
for j in 1..5 loop
t := '';
return next;
end loop;

Run the report like this:

select t from pretty_report_speed_ratios();

The loop over the method combinations starts like this:

<<"loop over the method combinations">>
for j in 1..3 loop
declare
method_1 results.method%type;
method_2 results.method%type;
begin
case j
when 1 then
method_1 := 'Method A';
method_2 := 'Method B';
when 2 then
method_1 := 'Method B';
method_2 := 'Method C';
when 3 then
method_1 := 'Method A';
method_2 := 'Method C';
end case;

Here’s the final missing piece:

<<"do each method-pair combination">>
declare
tt varchar(20);
rr varchar(20);
n int := 0;
begin
for tt, rr in (
select
to_char(test_no, '9999999'),
to_char(ratio, '99990.99')||'x'
from speed_ratios(method_1, method_2)
order by 1)
loop
n := n + 1;
t := tt||' '||rr;
if n = 1 then
t := t||' << '||quot||method_2||quot||' to '||
quot||method_1||quot;
end if;
return next;
end loop;
end "do each method-pair combination";

And here’s the output produced by pretty_report_speed_ratios():

==================================================
Speed Ratios Report
==================================================

Test No. Speed Ratio
-------- -----------

1 1.17x << 'Method B' to 'Method A'
2 1.19x
3 1.28x

1 1.59x << 'Method C' to 'Method B'
2 1.75x
3 1.47x

1 1.87x << 'Method C' to 'Method A'
2 2.08x
3 1.88x
==================================================

We can see at a glance that “Method B” is about 1.2x faster than “Method A”, that “Method C” is about 1.6x faster than “Method B”, and that, of course, “Method C” is therefore about 1.9x faster than “Method A”.

The full source text of the pretty_report_speed_ratios() table function is presented in the appendix so that you can simply copy and paste it into ysqlsh and then run it.

Use Case #3: Dynamic IN List

Assume that we have table t with this content:

k  |  v   
-----+------
100 | t_01
200 | t_04
300 | t_42
400 | t_04
500 | t_10
600 | t_42
700 | t_07
800 | t_16
900 | t_42

The code to create this is shown in the “Built-in SQL table functions” section under “generate_series()”. Developers and administrators usually type an explicit in list in an ad hoc query like this:

select k, v from t where k in (200, 300, 500, 900) order by 1;

But what if this is to be issued from a PL/pgSQL procedure and the in list members aren’t known until run-time? The naïve programmer tends to think that this calls for dynamic SQL where the in list is constructed programmatically. But there’s a much better, and simpler to write, way:

select k, v from t where k in (
select unnest as k
from unnest(
array[200, 300, 500, 900]::int[]))
order by 1;

The functionality of the unnest() built-in SQL table function can be implemented as a user-defined table function thus:

create function my_unnest(ks int[])
returns table(k int)
language plpgsql
as $body$
declare
element int;
begin
foreach element in array ks loop
k := element;
return next;
end loop;
end;
$body$;

Use it like this:

select k, v from t where k in (
select k
from my_unnest(
array[200, 300, 500, 900]::int[]))
order by 1;

Of course, it’s pointless just to re-implement the built-in unnest(). Moreover, the built-in can accept an actual argument of any datatype. I illustrated a few possibilities in the section that described it. You’d have to implement each overload that you needed explicitly if you programmed it using PL/pgSQL.

My purpose in showing you my_unnest() is this:

  • firstly, to show you, ordinarily, how to iterate over the elements in an array in PL/pgSQL;
  • and secondly, to show you the overall structure of a table function that computes values that you can then use as an in list in a subquery.

Who knows what arbitrary computation you might want to use to produce the in list values-and, of course, the built-in unnest() simply has its fixed functionality.

Use case 4: Compact syntax for bulk insert

Consider the procedure created thus:

create procedure insert_rows(vs in varchar[])
language plpgsql
as $body$
declare
element t.v%type;
begin
foreach element in array vs loop
insert into t(v) values(element);
end loop;
end
$body$;

If we have a table created thus:

create table t(
k uuid
default gen_random_uuid()
constraint t_pk primary key,
v varchar(100));

then we can populate it thus:

call insert_rows(
array['one', 'two', 'three']);

I used the same “default gen_random_uuid()” approach to populate the table’s primary key column in my code example in my “ Using Stored Procedures in Distributed SQL Databases” post. I pointed out there that you must make gen_random_uuid() available by installing the PostgreSQL extension like this:

create extension pgcrypto

as explained in the YugabyteDB documentation here.

Notice that the procedure simply loops over the elements of the input array and executes an explicit single-row insert for each value. This looks to be inefficient; but PL/pgSQL doesn’t support methods for bulk binding to insert, update, or delete statements.

Here’s an alternative implementation of insert_rows() that uses unnest():

create or replace procedure insert_rows(vs in varchar[])
language plpgsql
as $body$
begin
insert into t(v)
select unnest from unnest(vs::varchar[]);
end;
$body$;

Of course, the use of this second implementation of insert_rows() is the same as that for the first implementation that I showed. It’s certainly shorter than the implementation that uses this:

foreach element in array vs loop
insert into t(v) values(element);
end loop;

It’s therefore easier to understand and review. However, my tests showed that the performance of the two implementations, and of a third that uses my_unnest() was the same within the confidence limits of my measurements.

Conclusion

When developers have a rich set of tools, they’re much better placed to implement an optimal approach for a new use case than when they’re limited by an impoverished toolkit. This three part series of blog posts has presented several use cases where table functions showed their value. User-defined table functions are part of the larger picture of stored procedures, so to use them you have to use a distributed SQL database that supports these! YugabyteDB inherits PostgreSQL’s support, and in this post series I’ve described examples implemented both in SQL and in PL/pgSQL. The examples have used a very wide range of PostgreSQL’s SQL and stored procedure functionality. The fact that all the code that I’ve shown works identically in YugabyteDB and vanilla PostgreSQL is a testament to the success of our strategy to use the upper half of the PostgreSQL query layer simply “as is” on top of our own unique storage layer. This architecture makes YugabyteDB unique among distributed SQL databases.

What’s Next?

  • Compare YugabyteDB in depth to databases like CockroachDB, Google Cloud Spanner and MongoDB.
  • Get started with YugabyteDB on macOS, Linux, Docker, and Kubernetes.
  • Contact us to learn more about licensing, pricing or to schedule a technical overview.

Appendix

Here is the complete source code for the use case “Pretty printed ad hoc reports for administrators”:

create function pretty_report_speed_ratios()
returns table(t varchar)
language plpgsql
as $body$
declare
quot constant varchar(1) not null := '''';
begin
for j in 1..5 loop
t := '';
return next;
end loop;

Run it like this:

\t on
select t from pretty_report_speed_ratios();
\t off

The \t on meta-command simply turns off the clutter of the column header (the name of the column t), the underline for that, and the row count footer.


Originally published at https://blog.yugabyte.com on December 16, 2019.

The Distributed SQL Blog

YugabyteDB is the open source, high-performance, distributed SQL database for global, internet-scale apps.

Bryn Llewellyn

Written by

Developer Advocate at YugaByte

The Distributed SQL Blog

YugabyteDB is the open source, high-performance, distributed SQL database for global, internet-scale apps.

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade