patternsqlMinor
Select consecutive rows that are in a date range of each other
Viewed 0 times
rowseacharerangedatethatselectotherconsecutive
Problem
Having a list of associated dates & names and I want to select: all rows with names having multiple dates when the difference between their dates is more than 1 month.
e.g.: only the entries indicated below marked with
In other words the criteria is:
Here's a working example as well as my attempt based on another answer on SO:
Rextester working example and attempt
Additional conditions/hints/...
e.g.: only the entries indicated below marked with
\this\CREATE TABLE IF NOT EXISTS myTab (
id SERIAL PRIMARY KEY,
dateID DATETIME DEFAULT 0,
name VARCHAR(512)
) ENGINE=INNODB DEFAULT CHARSET=utf8;
INSERT INTO myTab
(dateID, name)
VALUES
("20140811","Emmy"), /*this*/
("20140922","Emmy"), /*this*/
("20150920","Emmy"), /*this*/
("20150922","Emmy"),
("20140722","Dave"),
("20140613","Stan"),
("20140622","Stan"), /*this*/
("20151020","Stan"), /*this*/
("20140305","Lora"),
("20140310","Lora");In other words the criteria is:
- Partition by name
- Order by date
- Compare 2 consecutive rows:
IF diff > 1 MONTH THEN select both, ELSE skip
Here's a working example as well as my attempt based on another answer on SO:
Rextester working example and attempt
Additional conditions/hints/...
- Rows having the same name are not necessarily inserted once after the other making them spaced with
+1 idfrom eachothers. Nor they are inserted oredered by date. In the example above it's done so just for readability. In my real problem it's not the case!
- After applying your suggestions on real data I noticed an extra condition to be added tagged
/this_EXRA/in the example above. The 3rdstanrow is in less than 1 MONTH from the 2nd but validates it with the 4th. Thus, it should only be selected if it validates them both. So I guess this implicates looping row by row and compare with previous and next one each time.
Solution
The logic looks simple at first, but it's quite complicated to get it right.
Let's have a working solution first, and worry about performance later. Tested at rextester.com:
Regarding efficiency: the query will perform rather poorly. An index on
Let's have a working solution first, and worry about performance later. Tested at rextester.com:
SELECT t.id, t.dateID, t.name
FROM myTab AS t
WHERE
( SELECT b.dateID
FROM myTab AS b
WHERE t.name = b.name
AND b.dateID < t.dateID
ORDER BY b.dateID DESC
LIMIT 1
) + INTERVAL 1 MONTH <= t.dateID
OR
t.dateID + INTERVAL 1 MONTH <=
( SELECT b.dateID
FROM myTab AS b
WHERE t.name = b.name
AND t.dateID < b.dateID
ORDER BY b.dateID ASC
LIMIT 1
)
;Regarding efficiency: the query will perform rather poorly. An index on
(name, dateID, id) will help but the query will still need to do 2 subqueries for each row of the table.Code Snippets
SELECT t.id, t.dateID, t.name
FROM myTab AS t
WHERE
( SELECT b.dateID
FROM myTab AS b
WHERE t.name = b.name
AND b.dateID < t.dateID
ORDER BY b.dateID DESC
LIMIT 1
) + INTERVAL 1 MONTH <= t.dateID
OR
t.dateID + INTERVAL 1 MONTH <=
( SELECT b.dateID
FROM myTab AS b
WHERE t.name = b.name
AND t.dateID < b.dateID
ORDER BY b.dateID ASC
LIMIT 1
)
;Context
StackExchange Database Administrators Q#159291, answer score: 5
Revisions (0)
No revisions yet.