patternsqlMajor
Select longest continuous sequence
Viewed 0 times
selectlongestcontinuoussequence
Problem
I am trying to construct a query in PostgreSQL 9.0 that gets the longest sequence of continuous rows for a specific column.
Consider the following table:
Where
I would like the query to produce the longest sequence for a given
With the following data:
For
I found a similar question here however my situation is a bit more straightforward.
(I would also like to know the longest sequence for a given
Consider the following table:
lap_id (serial), lap_no (int), car_type (enum), race_id (int FK)Where
lap_no is unique for each (race_id, car_type).I would like the query to produce the longest sequence for a given
race_id and car_type, so it would return an int (or long) that is highest.With the following data:
1, 1, red, 1
2, 2, red, 1
3, 3, red, 1
4, 4, red, 1
5, 1, blue, 1
6, 5, red, 1
7, 2, blue, 1
8, 1, green, 1For
car_type = red and race_id = 1 the query would return 5 as the longest sequence of the lap_no field.I found a similar question here however my situation is a bit more straightforward.
(I would also like to know the longest sequence for a given
car_type for all races, but was planning to work that out myself.)Solution
Your description results in a table definition like this:
General solution for this class of problems
To get the longest sequence (1 result, longest of all, arbitrary pick if there are ties):
Related procedural solution with plpgsql:
If the top requirement is performance, the plpgsql function is typically faster in this particular case because it can calculate the result in a single scan.
Faster for consecutive numbers
We can capitalize on the fact that consecutive
Consecutive laps end up in the same
This relies on
Discussion of Jack's simpler alternative
@Jack's version effectively counts all laps (rows) where the previous
But for a task that simple the query could be simpler, yet. It would follow logically that all
If, on the other hand, one
Faster for a given race / car type
In reply to the comment / clarifications in the question: restricting the query to one given
db<>fiddle here
Old SQL Fiddle
Index
Key to top performance is a fitting index (except for the mentioned procedural solution working with a single sequential scan). A multicolumn index like this serves best:
If your table has the
CREATE TABLE tbl (
lap_id serial PRIMARY KEY
, lap_no int NOT NULL
, car_type enum NOT NULL
, race_id int NOT NULL -- REFERENCES ...
, UNIQUE(race_id, car_type, lap_no)
);General solution for this class of problems
To get the longest sequence (1 result, longest of all, arbitrary pick if there are ties):
SELECT race_id, car_type, count(*) AS seq_len
FROM (
SELECT *, count(*) FILTER (WHERE step)
OVER (ORDER BY race_id, car_type, lap_no) AS grp
FROM (
SELECT *, (lag(lap_no) OVER (PARTITION BY race_id, car_type ORDER BY lap_no) + 1)
IS DISTINCT FROM lap_no AS step
FROM tbl
) x
) y
GROUP BY race_id, car_type, grp
ORDER BY seq_len DESC
LIMIT 1;count(*) FILTER (WHERE step) only counts TRUE (= step to next group), which results in a new number for every new group.Related procedural solution with plpgsql:
- GROUP BY and aggregate sequential numeric values
If the top requirement is performance, the plpgsql function is typically faster in this particular case because it can calculate the result in a single scan.
Faster for consecutive numbers
We can capitalize on the fact that consecutive
lap_no define a sequence, for a much simpler and faster version:SELECT race_id, car_type, count(*) AS seq_len
FROM (
SELECT race_id, car_type
, row_number() OVER (PARTITION BY race_id, car_type ORDER BY lap_no) - lap_no AS grp
FROM tbl
) x
GROUP BY race_id, car_type, grp
ORDER BY seq_len DESC
LIMIT 1;Consecutive laps end up in the same
grp. Every missing lap results in a lower grp per partition.This relies on
(race_id, car_type, lap_no) being UNIQUE NOT NULL. NULL values or duplicates could break the logic.Discussion of Jack's simpler alternative
@Jack's version effectively counts all laps (rows) where the previous
lap_no in this race_id had the same car_type. That's simpler and faster and correct - as long as each car_type can only have one sequence per race_id.But for a task that simple the query could be simpler, yet. It would follow logically that all
lap_no per (car_type, race_id) must be in sequence, and we could just count the laps:SELECT race_id, car_type, count(*) AS seq_len
FROM tbl
GROUP BY race_id, car_type
ORDER BY seq_len DESC
LIMIT 1;If, on the other hand, one
car_type can have multiple separate sequences per race_id (and the question does not specify otherwise), Jack's version will fail.Faster for a given race / car type
In reply to the comment / clarifications in the question: restricting the query to one given
(race_id, car_type) will make it much faster, of course:SELECT count(*) AS seq_len
FROM (
SELECT row_number() OVER (ORDER BY lap_no) - lap_no AS grp
FROM tbl
WHERE race_id = 1
AND car_type = 'red'
) x
GROUP BY grp
ORDER BY seq_len DESC
LIMIT 1;db<>fiddle here
Old SQL Fiddle
Index
Key to top performance is a fitting index (except for the mentioned procedural solution working with a single sequential scan). A multicolumn index like this serves best:
CREATE INDEX tbl_mult_idx ON tbl (race_id, car_type, lap_no);If your table has the
UNIQUE constraint I assumed at the top, that is implemented with just this (unique) index internally, and you do not need to create another index.Code Snippets
CREATE TABLE tbl (
lap_id serial PRIMARY KEY
, lap_no int NOT NULL
, car_type enum NOT NULL
, race_id int NOT NULL -- REFERENCES ...
, UNIQUE(race_id, car_type, lap_no)
);SELECT race_id, car_type, count(*) AS seq_len
FROM (
SELECT *, count(*) FILTER (WHERE step)
OVER (ORDER BY race_id, car_type, lap_no) AS grp
FROM (
SELECT *, (lag(lap_no) OVER (PARTITION BY race_id, car_type ORDER BY lap_no) + 1)
IS DISTINCT FROM lap_no AS step
FROM tbl
) x
) y
GROUP BY race_id, car_type, grp
ORDER BY seq_len DESC
LIMIT 1;SELECT race_id, car_type, count(*) AS seq_len
FROM (
SELECT race_id, car_type
, row_number() OVER (PARTITION BY race_id, car_type ORDER BY lap_no) - lap_no AS grp
FROM tbl
) x
GROUP BY race_id, car_type, grp
ORDER BY seq_len DESC
LIMIT 1;SELECT race_id, car_type, count(*) AS seq_len
FROM tbl
GROUP BY race_id, car_type
ORDER BY seq_len DESC
LIMIT 1;SELECT count(*) AS seq_len
FROM (
SELECT row_number() OVER (ORDER BY lap_no) - lap_no AS grp
FROM tbl
WHERE race_id = 1
AND car_type = 'red'
) x
GROUP BY grp
ORDER BY seq_len DESC
LIMIT 1;Context
StackExchange Database Administrators Q#35380, answer score: 21
Revisions (0)
No revisions yet.