patternsqlMinor
Select rows for which at least one row per set meets a condition
Viewed 0 times
rowsconditionperoneforleastwhichselectrowmeets
Problem
I have following table:
Basically, there are 5 different companies, each one has one or more clients and each client has status: 'y' or 'n' (might be null as well).
What I have to do is to select all pairs
I tried something with window functions but I can't figure out how to compare the number of ALL clients with the number of clients with
I figured out how to do this, but I am not sure if it's the right way:
create table test (
company_id integer not null,
client_id integer not null,
client_status text,
unique (company_id, client_id)
);
insert into test values
(1, 1, 'y'), -- company1
(2, 2, null), -- company2
(3, 3, 'n'), -- company3
(4, 4, 'y'), -- company4
(4, 5, 'n'),
(5, 6, null), -- company5
(5, 7, 'n')
;Basically, there are 5 different companies, each one has one or more clients and each client has status: 'y' or 'n' (might be null as well).
What I have to do is to select all pairs
(company_id, client_id) for all companies for which there is at least one client whose status is not 'n' ('y' or null). So for the example data above, the output should be:company_id;client_id
1;1
2;2
4;4
4;5
5;6
5;7I tried something with window functions but I can't figure out how to compare the number of ALL clients with the number of clients with
STATUS = 'n'.select company_id,
count(*) over (partition by company_id) as all_clients_count
from test
-- where all_clients_count != ... ?I figured out how to do this, but I am not sure if it's the right way:
select sub.company_id, unnest(sub.client_ids)
from (
select company_id, array_agg(client_id) as client_ids
from test
group by company_id
having count(*) != count( (case when client_status = 'n' then 1 else null end) )
) subSolution
Basically you are looking for the expression:
The column
The manual has details in the chapter Comparison Operators.
Assuming your actual table has a
Queries
All of these do the same (what you asked), which is the fastest depends on data distribution:
Or:
Or:
Boolean values sort
The added PK is implemented with a useful index for these queries. If you want faster, yet, add a partial index for query 1:
You could use window functions, too, but that would be slower. Example with
For lots of rows per
client_status IS DISTINCT FROM 'n'The column
client_status should really be data type boolean, not text, which would allow the simpler expression:client_status IS NOT FALSEThe manual has details in the chapter Comparison Operators.
Assuming your actual table has a
UNIQUE or PK constraint, we arrive at:CREATE TABLE test (
company_id integer NOT NULL,
client_id integer NOT NULL,
client_status boolean,
PRIMARY KEY (company_id, client_id)
);Queries
All of these do the same (what you asked), which is the fastest depends on data distribution:
SELECT company_id, client_id
FROM test t
WHERE EXISTS (
SELECT FROM test
WHERE company_id = t.company_id
AND client_status IS NOT FALSE
);Or:
SELECT company_id, client_id
FROM test t
JOIN (
SELECT company_id
FROM test t
GROUP BY 1
HAVING bool_or(client_status IS NOT FALSE)
) c USING (company_id);Or:
SELECT company_id, client_id
FROM test t
JOIN (
SELECT DISTINCT company_id, client_status
FROM test t
ORDER BY company_id, client_status DESC
) c USING (company_id)
WHERE c.client_status IS NOT FALSE;Boolean values sort
FALSE -> TRUE -> NULL in ascending sort order. So FALSE comes last in descending order. If there's is any other value available, then that one's picked first ...- Sorting null values after all others, except special
The added PK is implemented with a useful index for these queries. If you want faster, yet, add a partial index for query 1:
CREATE INDEX test_special_idx ON test (company_id, client_id)
WHERE client_status IS NOT FALSE;You could use window functions, too, but that would be slower. Example with
first_value():SELECT company_id, client_id
FROM (
SELECT company_id, client_id
, first_value(client_status) OVER (PARTITION BY company_id
ORDER BY client_status DESC) AS stat
FROM test t
) sub
WHERE stat IS NOT FALSE;For lots of rows per
company_id, one of these techniques may be faster, still:- Optimize GROUP BY query to retrieve latest record per user
Code Snippets
client_status IS DISTINCT FROM 'n'client_status IS NOT FALSECREATE TABLE test (
company_id integer NOT NULL,
client_id integer NOT NULL,
client_status boolean,
PRIMARY KEY (company_id, client_id)
);SELECT company_id, client_id
FROM test t
WHERE EXISTS (
SELECT FROM test
WHERE company_id = t.company_id
AND client_status IS NOT FALSE
);SELECT company_id, client_id
FROM test t
JOIN (
SELECT company_id
FROM test t
GROUP BY 1
HAVING bool_or(client_status IS NOT FALSE)
) c USING (company_id);Context
StackExchange Database Administrators Q#139980, answer score: 7
Revisions (0)
No revisions yet.