patternsqlMinor
Benchmarking of SQL data crunching using a common datum
Viewed 0 times
sqlcommonusingdatumcrunchingdatabenchmarking
Problem
@nhgrif and I were curious about a couple different ways to aggregate a number of records based on a duplicate/common datum. Two different ideas will be shown. Any advice on all code but especially making it perform better is appreciated.
Note: @nhgrif did not write most of this, so if it's wrong it's my fault...
Make a bunch of data
Code:
Just for the record:
Using
PS: Capitalization is different in this; @nhgrif wrote it and I was lazy.
PPS: Table aliases are not that great but I'm just benchmarking, please forgive...
```
SET SEARCH_PATH = test;
SELECT
seeded.Serial_Number,
COUNT(one.Contract_Number) as One,
COUNT(two.Contract_Number) as Two,
COUNT(three.Contract_Number) as Three
FROM seeded
LEFT JOIN seeded as one
ON one.Contract_Number = seeded.Contract_Number
AND seeded.Status = 'one'
Note: @nhgrif did not write most of this, so if it's wrong it's my fault...
Make a bunch of data
Code:
set search_path = test;
-- create tables to seed a large number of records
drop table if exists contract_number_seed;
create table contract_number_seed(
contract_number int
);
insert into contract_number_seed
(contract_number)
select * from generate_series(1, 100000)
;
drop table if exists serial_number_seed;
create table serial_number_seed(
serial_number int
);
insert into serial_number_seed
(serial_number)
select * from generate_series(1, 10)
;
drop table if exists status_seed;
create table status_seed(
status text
);
insert into status_seed (status)
values (NULL), ('one'), ('two'), ('three');
-- generate a whole bunch of records from seed tables
drop table if exists seeded;
select
contract_number_seed.contract_number,
serial_number_seed.serial_number,
status_seed.status
into seeded
from contract_number_seed
cross join serial_number_seed
cross join status_seed;
-- Query returned successfully: 4000000 rows affected, 11368 ms execution time.Just for the record:
set search_path = test;
select * from seeded;
-- Total query runtime: 8056 ms.
-- 4000000 rows retrieved.Using
LEFT OUTER JOINPS: Capitalization is different in this; @nhgrif wrote it and I was lazy.
PPS: Table aliases are not that great but I'm just benchmarking, please forgive...
```
SET SEARCH_PATH = test;
SELECT
seeded.Serial_Number,
COUNT(one.Contract_Number) as One,
COUNT(two.Contract_Number) as Two,
COUNT(three.Contract_Number) as Three
FROM seeded
LEFT JOIN seeded as one
ON one.Contract_Number = seeded.Contract_Number
AND seeded.Status = 'one'
Solution
There is a bug in your second version. Here is one row from the result:
That is,
But let's do a quick sanity check:
This query returns 100000.
There are also rows with the same
I think you want a query like this:
This takes ~2.1s on my machine.
Just for fun, let's look at what pgAdmin's EXPLAIN visualisation looks like for this query
versus the 30m query
1;0;2;0That is,
Serial_Number 1 has zero ones, two twos, and zero threes.But let's do a quick sanity check:
select count(*) from seeded
where serial_number = 1 and status = 'one';This query returns 100000.
There are also rows with the same
Serial_Number, which I think is not what you want.I think you want a query like this:
select
seeded.Serial_Number,
sum(case when seeded.Status = 'one' then 1 else 0 end) as ones,
sum(case when seeded.Status = 'two' then 1 else 0 end) as twos,
sum(case when seeded.Status = 'three' then 1 else 0 end) as threes
from seeded
group by seeded.Serial_Number;This takes ~2.1s on my machine.
Just for fun, let's look at what pgAdmin's EXPLAIN visualisation looks like for this query
versus the 30m query
Code Snippets
select count(*) from seeded
where serial_number = 1 and status = 'one';select
seeded.Serial_Number,
sum(case when seeded.Status = 'one' then 1 else 0 end) as ones,
sum(case when seeded.Status = 'two' then 1 else 0 end) as twos,
sum(case when seeded.Status = 'three' then 1 else 0 end) as threes
from seeded
group by seeded.Serial_Number;Context
StackExchange Code Review Q#60446, answer score: 8
Revisions (0)
No revisions yet.