HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Allow for dynamic result sets from plpgsql function by filtering based on an array passed in?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
fromresultarraypassedfunctionallowforplpgsqldynamicbased

Problem

I think I've avoided the XY problem here, as I'm laying out my solutions to the real underlying problem (summarizing multiple tables in a dynamic fashion) and I'm only asking about the one final part where I got stuck. Therefore, there is a fair bit of background first. I've provided a minimal example dataset, and working code for summarizing the data in the fashion I describe.

Consider a setup as follows:

create temp table tbl1 (id int primary key, category text, passing boolean);

insert into tbl1 values
(1, 'A', 't'),
(2, 'A', 't'),
(3, 'A', 't'),
(4, 'A', 'f'),
(5, 'B', 't'),
(6, 'B', 'f'),
(7, 'C', 't'),
(8, 'C', 't'),
(9, 'C', 'f'),
(10, 'C', 'f'),
(11, 'C', 'f'),
(12, 'C', 'f'),
(13, 'B', 't'),
(14, 'B', 'f'),
(15, 'B', 't'),
(16, 'B', 'f'),
(17, 'B', 't'),
(18, 'B', 'f'),
(19, 'B', 't'),
(20, 'B', 'f');


Then I can produce the following summary:

postgres=> select category, passing, count(*) from tbl1 group by category, passing order by category, passing;
 category | passing | count
----------+---------+-------
 A        | f       |     1
 A        | t       |     3
 B        | f       |     5
 B        | t       |     5
 C        | f       |     4
 C        | t       |     2
(6 rows)


However, I have multiple such tables (all using the same categories A, B, C) that I want to summarize, so the final result I want to have displayed needs to be just a single row to summarize one table, like so:

Table Name | Overall passing rate | A passing rate | B passing rate | C passing rate
------------+----------------------+----------------+----------------+----------------
 tbl1       | 50% (10/20)          | 75% (3/4)      | 50% (5/10)     | 33% (2/6)


I also need to be able to filter sometimes, such as only returning information about categories A and B and ignoring C, like so:

```
Table Name | Overall passing rate | A passing rate | B passing rate
------------+----------------------+----------------+----------------
tbl1 | 57% (8/

Solution

I worked out how to do it using unnest(...) with ordinality and returning an array as part of the results.

Here is the function definition:

create function summarize_tables(categories text[], variadic tbls name[])
returns table (tbl name, overall text, by_category text[])
language plpgsql
as $funcdef$
  begin
    foreach tbl in array tbls
    loop
      return query execute format(
        $query$
          with tallies as (
            select
              category,
              count(*) filter (where passing) as passcount,
              count(*) as allcount from %I group by category
          ),
          categories_passed as (
            select * from unnest(%L::text[]) with ordinality as x(category, rn)
          )
          select
            %1$L::name as tbl,
            format('%%s%%%% (%%s/%%s)', (sum(passcount)*100/sum(allcount))::int, sum(passcount), sum(allcount)) as overall,
            array_agg(format('%%s%%%% (%%s/%%s)', passcount*100/allcount, passcount, allcount) order by rn) as by_category
          from categories_passed natural left join tallies;
        $query$,
        tbl,
        categories
      );
    end loop;
    return;
  end
$funcdef$
;


The raw results (with select *) look like:

postgres=> select * from summarize_tables('{A,B,C}', 'tbl1');
 tbl  |   overall   |              by_category
------+-------------+----------------------------------------
 tbl1 | 50% (10/20) | {"75% (3/4)","50% (5/10)","33% (2/6)"}
(1 row)

postgres=> select * from summarize_tables('{A,B}', 'tbl1');
 tbl  |  overall   |        by_category
------+------------+----------------------------
 tbl1 | 57% (8/14) | {"75% (3/4)","50% (5/10)"}
(1 row)


Note that the results that are broken down by category are not sorted alphabetically, but kept in the same order passed in:

postgres=> select * from summarize_tables('{B,A}', 'tbl1');
 tbl  |  overall   |        by_category
------+------------+----------------------------
 tbl1 | 57% (8/14) | {"50% (5/10)","75% (3/4)"}
(1 row)


To get the exact results shown in the question requires column names to be given and the results to be pulled out of the array:

select
  tbl as "Table Name",
  overall as "Overall passing rate",
  by_category[1] as "A passing rate",
  by_category[2] as "B passing rate",
  by_category[3] as "C passing rate"
from summarize_tables('{A,B,C}', 'tbl1');

 Table Name | Overall passing rate | A passing rate | B passing rate | C passing rate
------------+----------------------+----------------+----------------+----------------
 tbl1       | 50% (10/20)          | 75% (3/4)      | 50% (5/10)     | 33% (2/6)
(1 row)

select
  tbl as "Table Name",
  overall as "Overall passing rate",
  by_category[1] as "A passing rate",
  by_category[2] as "B passing rate"
from summarize_tables('{A,B}', 'tbl1');

 Table Name | Overall passing rate | A passing rate | B passing rate
------------+----------------------+----------------+----------------
 tbl1       | 57% (8/14)           | 75% (3/4)      | 50% (5/10)
(1 row)


Not called for in the question, but if it's wanted for the caller of the function to have access to the raw numbers (e.g. for sorting or numeric conditions), then nested arrays can be returned instead, and the formatting can be done by the caller:

create function summarize_tables(categories text[], variadic tbls name[])
returns table (tbl name, overall numeric[], by_category numeric[][])
language plpgsql
as $funcdef$
  begin
    foreach tbl in array tbls
    loop
      return query execute format(
        $query$
          with tallies as (
            select
              category,
              count(*) filter (where passing)::numeric as passcount,
              count(*)::numeric as allcount from %I group by category
          ),
          categories_passed as (
            select * from unnest(%L::text[]) with ordinality as x(category, rn)
          )
          select
            %1$L::name as tbl,
            array[sum(passcount), sum(allcount)] as overall,
            array_agg(array[passcount, allcount] order by rn) as by_category
          from categories_passed natural left join tallies;
        $query$,
        tbl,
        categories
      );
    end loop;
    return;
  end
$funcdef$
;


The SQL to pull out the values from the nested arrays isn't pretty, but it works:

with x as (
  select
    tbl,
    overall as o,
    by_category[1:1] as a,
    by_category[2:2] as b,
    by_category[3:3] as c
  from summarize_tables('{A,B,C}', 'tbl1')
)
select
  tbl,
  format('%s%% (%s/%s)', (100*x.o[1]/x.o[2])::int, x.o[1], x.o[2]) as "Overall passing rate",
  format('%s%% (%s/%s)', (100*x.a[1][1]/x.a[1][2])::int, x.a[1][1], x.a[1][2]) as "A passing rate",
  format('%s%% (%s/%s)', (100*x.b[1][1]/x.b[1][2])::int, x.b[1][1], x.b[1][2]) as "B passing rate",
  format('%s%% (%s/%s)', (100*x.c[1][1]/x.c[1][2])::int, x.c[1][1], x.c[1][2]) as "C passing rate"
from x
;


Related reading about how Postgres

Code Snippets

create function summarize_tables(categories text[], variadic tbls name[])
returns table (tbl name, overall text, by_category text[])
language plpgsql
as $funcdef$
  begin
    foreach tbl in array tbls
    loop
      return query execute format(
        $query$
          with tallies as (
            select
              category,
              count(*) filter (where passing) as passcount,
              count(*) as allcount from %I group by category
          ),
          categories_passed as (
            select * from unnest(%L::text[]) with ordinality as x(category, rn)
          )
          select
            %1$L::name as tbl,
            format('%%s%%%% (%%s/%%s)', (sum(passcount)*100/sum(allcount))::int, sum(passcount), sum(allcount)) as overall,
            array_agg(format('%%s%%%% (%%s/%%s)', passcount*100/allcount, passcount, allcount) order by rn) as by_category
          from categories_passed natural left join tallies;
        $query$,
        tbl,
        categories
      );
    end loop;
    return;
  end
$funcdef$
;
postgres=> select * from summarize_tables('{A,B,C}', 'tbl1');
 tbl  |   overall   |              by_category
------+-------------+----------------------------------------
 tbl1 | 50% (10/20) | {"75% (3/4)","50% (5/10)","33% (2/6)"}
(1 row)

postgres=> select * from summarize_tables('{A,B}', 'tbl1');
 tbl  |  overall   |        by_category
------+------------+----------------------------
 tbl1 | 57% (8/14) | {"75% (3/4)","50% (5/10)"}
(1 row)
postgres=> select * from summarize_tables('{B,A}', 'tbl1');
 tbl  |  overall   |        by_category
------+------------+----------------------------
 tbl1 | 57% (8/14) | {"50% (5/10)","75% (3/4)"}
(1 row)
select
  tbl as "Table Name",
  overall as "Overall passing rate",
  by_category[1] as "A passing rate",
  by_category[2] as "B passing rate",
  by_category[3] as "C passing rate"
from summarize_tables('{A,B,C}', 'tbl1');

 Table Name | Overall passing rate | A passing rate | B passing rate | C passing rate
------------+----------------------+----------------+----------------+----------------
 tbl1       | 50% (10/20)          | 75% (3/4)      | 50% (5/10)     | 33% (2/6)
(1 row)

select
  tbl as "Table Name",
  overall as "Overall passing rate",
  by_category[1] as "A passing rate",
  by_category[2] as "B passing rate"
from summarize_tables('{A,B}', 'tbl1');

 Table Name | Overall passing rate | A passing rate | B passing rate
------------+----------------------+----------------+----------------
 tbl1       | 57% (8/14)           | 75% (3/4)      | 50% (5/10)
(1 row)
create function summarize_tables(categories text[], variadic tbls name[])
returns table (tbl name, overall numeric[], by_category numeric[][])
language plpgsql
as $funcdef$
  begin
    foreach tbl in array tbls
    loop
      return query execute format(
        $query$
          with tallies as (
            select
              category,
              count(*) filter (where passing)::numeric as passcount,
              count(*)::numeric as allcount from %I group by category
          ),
          categories_passed as (
            select * from unnest(%L::text[]) with ordinality as x(category, rn)
          )
          select
            %1$L::name as tbl,
            array[sum(passcount), sum(allcount)] as overall,
            array_agg(array[passcount, allcount] order by rn) as by_category
          from categories_passed natural left join tallies;
        $query$,
        tbl,
        categories
      );
    end loop;
    return;
  end
$funcdef$
;

Context

StackExchange Database Administrators Q#256978, answer score: 2

Revisions (0)

No revisions yet.