patternsqlMinor
Fetch a row that contains the set of last non-NULL values for each column
Viewed 0 times
lasttheeachnoncolumnnullfetchthatcontainsfor
Problem
Suppose there is a table
What I need is to fetch the last non-null value for each column in Postgres i.e. For the above table, I expect the result as (A1, B2, C1).
Here 'last' means the last non-null value in each column when the table is ordered by column version.
The version column always contains non-null values only.
This table is not going to be huge as in this will be a couple of thousand rows only. So, not worried very much about performance.
version col_A col_B col_C
1 A1 B1 (null)
2 A2 B3 (null)
3 A3 B2 (null)
4 A5 (null) C1
5 A1 (null) (null)
What I need is to fetch the last non-null value for each column in Postgres i.e. For the above table, I expect the result as (A1, B2, C1).
Here 'last' means the last non-null value in each column when the table is ordered by column version.
The version column always contains non-null values only.
This table is not going to be huge as in this will be a couple of thousand rows only. So, not worried very much about performance.
Solution
You can do the following (all of the code below is available on the fiddle here):
Data:
Then run this query:
Result:
You might like to consider putting in something like the
Performance Analysis:
As of 2021-11-26 07:40:00 UTC, I have done a performance analysis which is available here.
With 1k rows (and a unique index on
With 20k rows (and 19k of them all nulls in columns A, B, C):
With the same 20k rows as above after adding 3 partial indexes on
With the same 20k rows as above after adding 3 partial indexes on
Last test is with 20k rows but there are non-null values dispersed. Here both Gerars's and Quassnoi-2 are quite fast, not as good as Quassnoi-1 but the difference is tiny:
Usual caveats concerning performing tests on a server over which one has no control or knowledge apply!
I'll keep it revised as the competition hots up!
CREATE TABLE tab
(
version INT PRIMARY KEY,
col_A TEXT,
col_B TEXT,
col_C TEXT
);Data:
INSERT INTO tab (version, col_A, col_B, col_C)
VALUES
(1, 'A1', 'B1', null),
(2, 'A2', 'B3', null),
(3, 'A3', 'B2', null),
(4, 'A4', null, 'C1'),
(5, 'A5', null, null);Then run this query:
select
(select col_a from tab where col_a is not null order by version desc limit 1) as a,
(select col_b from tab where col_b is not null order by version desc limit 1) as b,
(select col_c from tab where col_c is not null order by version desc limit 1) as c ;Result:
a b c
A5 B2 C1You might like to consider putting in something like the
COALESCE function in case one (or more?) of the columns only contains NULL values. Finally, in future, when asking questions of this sort, could you please provide a fiddle with your tables and data? Help us to help you - p.s. welcome to dba.se!Performance Analysis:
As of 2021-11-26 07:40:00 UTC, I have done a performance analysis which is available here.
With 1k rows (and a unique index on
(version)):- Quassnoi option 1 ~ 0.060 ms
- Gerard H. Pille ~ 0.176 ms
- Quassnoi option 2 ~ 0.183 ms
- J.D. ~ 2.906 ms
With 20k rows (and 19k of them all nulls in columns A, B, C):
- Quassnoi option 1 ~ 8 ms
- Quassnoi option 2 ~ 15 ms
- Gerard H. Pille ~ 37 ms
- J.D. ~ 49 ms
With the same 20k rows as above after adding 3 partial indexes on
(version) WHERE (col_a IS NOT NULL), etc.:- Quassnoi option 1 ~ 0.8 ms
- Quassnoi option 2 ~ 14.0 ms
- Gerard H. Pille ~ 36.9 ms
- J.D. ~ 48.5 ms
With the same 20k rows as above after adding 3 partial indexes on
(version) INCLUDE (col_a) WHERE (col_a IS NOT NULL), etc. Quassnoi-1 query kills it here, outperforming all the others by a big magnitude:- Quassnoi option 1 ~ 0.12 ms
- Quassnoi option 2 ~ 11.9 ms
- Gerard H. Pille ~ 36.9 ms
- J.D. ~ 48.9 ms
Last test is with 20k rows but there are non-null values dispersed. Here both Gerars's and Quassnoi-2 are quite fast, not as good as Quassnoi-1 but the difference is tiny:
- Quassnoi option 1 ~ 0.050 ms
- Quassnoi option 2 ~ 0.112 ms
- Gerard H. Pille ~ 0.168 ms
- J.D. ~ 49.1 ms
Usual caveats concerning performing tests on a server over which one has no control or knowledge apply!
I'll keep it revised as the competition hots up!
Code Snippets
CREATE TABLE tab
(
version INT PRIMARY KEY,
col_A TEXT,
col_B TEXT,
col_C TEXT
);INSERT INTO tab (version, col_A, col_B, col_C)
VALUES
(1, 'A1', 'B1', null),
(2, 'A2', 'B3', null),
(3, 'A3', 'B2', null),
(4, 'A4', null, 'C1'),
(5, 'A5', null, null);select
(select col_a from tab where col_a is not null order by version desc limit 1) as a,
(select col_b from tab where col_b is not null order by version desc limit 1) as b,
(select col_c from tab where col_c is not null order by version desc limit 1) as c ;a b c
A5 B2 C1Context
StackExchange Database Administrators Q#303106, answer score: 6
Revisions (0)
No revisions yet.