HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Fetch a row that contains the set of last non-NULL values for each column

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
lasttheeachnoncolumnnullfetchthatcontainsfor

Problem

Suppose there is a table
version col_A col_B col_C
1 A1 B1 (null)
2 A2 B3 (null)
3 A3 B2 (null)
4 A5 (null) C1
5 A1 (null) (null)


What I need is to fetch the last non-null value for each column in Postgres i.e. For the above table, I expect the result as (A1, B2, C1).

Here 'last' means the last non-null value in each column when the table is ordered by column version.

The version column always contains non-null values only.

This table is not going to be huge as in this will be a couple of thousand rows only. So, not worried very much about performance.

Solution

You can do the following (all of the code below is available on the fiddle here):

CREATE TABLE tab
(
  version INT PRIMARY KEY,
  col_A TEXT,
  col_B TEXT,
  col_C TEXT
);


Data:

INSERT INTO tab (version, col_A, col_B, col_C)
VALUES
(1, 'A1', 'B1', null),
(2, 'A2', 'B3', null),
(3, 'A3', 'B2', null),
(4, 'A4', null, 'C1'),
(5, 'A5', null, null);


Then run this query:

select
    (select col_a from tab where col_a is not null order by version desc limit 1) as a,
    (select col_b from tab where col_b is not null order by version desc limit 1) as b,
    (select col_c from tab where col_c is not null order by version desc limit 1) as c ;


Result:

a   b   c
A5  B2  C1


You might like to consider putting in something like the COALESCE function in case one (or more?) of the columns only contains NULL values. Finally, in future, when asking questions of this sort, could you please provide a fiddle with your tables and data? Help us to help you - p.s. welcome to dba.se!
Performance Analysis:

As of 2021-11-26 07:40:00 UTC, I have done a performance analysis which is available here.

With 1k rows (and a unique index on (version)):

  • Quassnoi option 1 ~ 0.060 ms



  • Gerard H. Pille ~ 0.176 ms



  • Quassnoi option 2 ~ 0.183 ms



  • J.D. ~ 2.906 ms



With 20k rows (and 19k of them all nulls in columns A, B, C):

  • Quassnoi option 1 ~ 8 ms



  • Quassnoi option 2 ~ 15 ms



  • Gerard H. Pille ~ 37 ms



  • J.D. ~ 49 ms



With the same 20k rows as above after adding 3 partial indexes on (version) WHERE (col_a IS NOT NULL), etc.:

  • Quassnoi option 1 ~ 0.8 ms



  • Quassnoi option 2 ~ 14.0 ms



  • Gerard H. Pille ~ 36.9 ms



  • J.D. ~ 48.5 ms



With the same 20k rows as above after adding 3 partial indexes on (version) INCLUDE (col_a) WHERE (col_a IS NOT NULL), etc. Quassnoi-1 query kills it here, outperforming all the others by a big magnitude:

  • Quassnoi option 1 ~ 0.12 ms



  • Quassnoi option 2 ~ 11.9 ms



  • Gerard H. Pille ~ 36.9 ms



  • J.D. ~ 48.9 ms



Last test is with 20k rows but there are non-null values dispersed. Here both Gerars's and Quassnoi-2 are quite fast, not as good as Quassnoi-1 but the difference is tiny:

  • Quassnoi option 1 ~ 0.050 ms



  • Quassnoi option 2 ~ 0.112 ms



  • Gerard H. Pille ~ 0.168 ms



  • J.D. ~ 49.1 ms



Usual caveats concerning performing tests on a server over which one has no control or knowledge apply!

I'll keep it revised as the competition hots up!

Code Snippets

CREATE TABLE tab
(
  version INT PRIMARY KEY,
  col_A TEXT,
  col_B TEXT,
  col_C TEXT
);
INSERT INTO tab (version, col_A, col_B, col_C)
VALUES
(1, 'A1', 'B1', null),
(2, 'A2', 'B3', null),
(3, 'A3', 'B2', null),
(4, 'A4', null, 'C1'),
(5, 'A5', null, null);
select
    (select col_a from tab where col_a is not null order by version desc limit 1) as a,
    (select col_b from tab where col_b is not null order by version desc limit 1) as b,
    (select col_c from tab where col_c is not null order by version desc limit 1) as c ;
a   b   c
A5  B2  C1

Context

StackExchange Database Administrators Q#303106, answer score: 6

Revisions (0)

No revisions yet.