patternsqlMinor
Indexes for SQL query with WHERE condition and GROUP BY
Viewed 0 times
conditiongroupsqlwithquerywhereindexesforand
Problem
I am trying to determine which indexes to use for an SQL query with a
My query:
The table currently has 32.000.000 rows. The execution time of the query increases a lot when I increase the time-frame.
The table in question looks like this:
I currently have the following indexes, but the performance is still slow:
Running EXPLAIN on the query gives the following result:
SQL Fiddle with example data: http://sqlfiddle.com/#!15/7492b/1
The Question
Can the performance of this query be improved by adding better indexes, or must I increase the processing power?
Edit 1
PostgreSQL version 9.3.2 is used.
Edit 2
I tried @Erwin 's proposal with
But unfortunetly this didn't seem to increase the performance. The Query Plan:
```
"QUERY PLAN"
"Nested Loop Semi Join (cost=1607.18..371680.60 rows=113 width=4)"
WHERE condition and a GROUP BY which is currently running very slow. My query:
SELECT group_id
FROM counter
WHERE ts between timestamp '2014-03-02 00:00:00.0' and timestamp '2014-03-05 12:00:00.0'
GROUP BY group_idThe table currently has 32.000.000 rows. The execution time of the query increases a lot when I increase the time-frame.
The table in question looks like this:
CREATE TABLE counter (
id bigserial PRIMARY KEY
, ts timestamp NOT NULL
, group_id bigint NOT NULL
);I currently have the following indexes, but the performance is still slow:
CREATE INDEX ts_index
ON counter
USING btree
(ts);
CREATE INDEX group_id_index
ON counter
USING btree
(group_id);
CREATE INDEX comp_1_index
ON counter
USING btree
(ts, group_id);
CREATE INDEX comp_2_index
ON counter
USING btree
(group_id, ts);Running EXPLAIN on the query gives the following result:
"QUERY PLAN"
"HashAggregate (cost=467958.16..467958.17 rows=1 width=4)"
" -> Index Scan using ts_index on counter (cost=0.56..467470.93 rows=194892 width=4)"
" Index Cond: ((ts >= '2014-02-26 00:00:00'::timestamp without time zone) AND (ts <= '2014-02-27 23:59:00'::timestamp without time zone))"SQL Fiddle with example data: http://sqlfiddle.com/#!15/7492b/1
The Question
Can the performance of this query be improved by adding better indexes, or must I increase the processing power?
Edit 1
PostgreSQL version 9.3.2 is used.
Edit 2
I tried @Erwin 's proposal with
EXISTS:SELECT group_id
FROM groups g
WHERE EXISTS (
SELECT 1
FROM counter c
WHERE c.group_id = g.group_id
AND ts BETWEEN timestamp '2014-03-02 00:00:00'
AND timestamp '2014-03-05 12:00:00'
);But unfortunetly this didn't seem to increase the performance. The Query Plan:
```
"QUERY PLAN"
"Nested Loop Semi Join (cost=1607.18..371680.60 rows=113 width=4)"
Solution
Another idea, that also uses the
Test at SQL-Fiddle shows that the query does index scans on the
Similar plans are produced using 2 lateral joins, one for min and one for max and also with 2 inline correlated subqueries. They could also be used if you need to show the whole
groups table and a construction called LATERAL join (for SQL-Server fans, this is almost identical to OUTER APPLY). It has the advantage that aggregates can be calculated in the subquery:SELECT group_id, min_ts, max_ts
FROM groups g, -- notice the comma here, is required
LATERAL
( SELECT MIN(ts) AS min_ts,
MAX(ts) AS max_ts
FROM counter c
WHERE c.group_id = g.group_id
AND c.ts BETWEEN timestamp '2011-03-02 00:00:00'
AND timestamp '2013-03-05 12:00:00'
) x
WHERE min_ts IS NOT NULL ;Test at SQL-Fiddle shows that the query does index scans on the
(group_id, ts) index.Similar plans are produced using 2 lateral joins, one for min and one for max and also with 2 inline correlated subqueries. They could also be used if you need to show the whole
counter rows besides the min and max dates:SELECT group_id,
min_ts, min_ts_id,
max_ts, max_ts_id
FROM groups g
, LATERAL
( SELECT ts AS min_ts, c.id AS min_ts_id
FROM counter c
WHERE c.group_id = g.group_id
AND c.ts BETWEEN timestamp '2012-03-02 00:00:00'
AND timestamp '2014-03-05 12:00:00'
ORDER BY ts ASC
LIMIT 1
) xmin
, LATERAL
( SELECT ts AS max_ts, c.id AS max_ts_id
FROM counter c
WHERE c.group_id = g.group_id
AND c.ts BETWEEN timestamp '2012-03-02 00:00:00'
AND timestamp '2014-03-05 12:00:00'
ORDER BY ts DESC
LIMIT 1
) xmax
WHERE min_ts IS NOT NULL ;Code Snippets
SELECT group_id, min_ts, max_ts
FROM groups g, -- notice the comma here, is required
LATERAL
( SELECT MIN(ts) AS min_ts,
MAX(ts) AS max_ts
FROM counter c
WHERE c.group_id = g.group_id
AND c.ts BETWEEN timestamp '2011-03-02 00:00:00'
AND timestamp '2013-03-05 12:00:00'
) x
WHERE min_ts IS NOT NULL ;SELECT group_id,
min_ts, min_ts_id,
max_ts, max_ts_id
FROM groups g
, LATERAL
( SELECT ts AS min_ts, c.id AS min_ts_id
FROM counter c
WHERE c.group_id = g.group_id
AND c.ts BETWEEN timestamp '2012-03-02 00:00:00'
AND timestamp '2014-03-05 12:00:00'
ORDER BY ts ASC
LIMIT 1
) xmin
, LATERAL
( SELECT ts AS max_ts, c.id AS max_ts_id
FROM counter c
WHERE c.group_id = g.group_id
AND c.ts BETWEEN timestamp '2012-03-02 00:00:00'
AND timestamp '2014-03-05 12:00:00'
ORDER BY ts DESC
LIMIT 1
) xmax
WHERE min_ts IS NOT NULL ;Context
StackExchange Database Administrators Q#60777, answer score: 7
Revisions (0)
No revisions yet.