patternMinor
Mysql get records more then 3 in interval of 1 minute, return group_concat ID
Viewed 0 times
returnrecordsmoreintervalmysqlgetthengroup_concatminute
Problem
Currently i have this dataset, i need to return grouped ids that are within the range of 60 seconds and have more than 3.
✓
✓
db<>fiddle here
something like this:
Edit some context:
I currently developed a cron that runs every 1 minute to analyze the last 60 seconds of the bot records in database, I need to group the conversation ID's that have more than 3 records within 60 seconds in the same url and client_session_id.
follows the SQL I'm running:
This query works as I expect, but sometimes my cron service is sometimes down, and I lose those repeated records.
I thought about making an SQL(Cron) at the end of the day to analyze the last 24 hours, and look for the records that are repeated according to the rule I mentioned above?
CREATE TABLE test
(
`id` bigint NOT NULL AUTO_INCREMENT,
created_date TIMESTAMP(1) NOT NULL,
origin_url VARCHAR (200) NOT NULL,
client_session_id VARCHAR (50) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UC_PRE_CHAT_TALKID_COL` (`id`)
);✓
INSERT INTO test VALUES
(1,'2021-01-18 11:02:24.0', 'https://zendes.com/', 'znkjoc3gfth2c3m0t1klii'),
(2,'2021-01-18 11:02:35.0', 'https://zendes.com/', 'znkjoc3gfth2c3m0t1klii'),
(3,'2021-01-18 11:02:03.0', 'https://zendes.com/', 'znkjoc3gfth2c3m0t1klii'),
(4,'2021-01-18 11:11:28.0', 'https://rarara.com/', 'znkjoc3gfth2c3m0t1klii'),
(5,'2021-01-18 11:11:36.0', 'https://rarara.com/', 'znkjoc3gfth2c3m0t1klii'),
(6,'2021-01-18 11:11:05.0', 'https://rarara.com/', 'znkjoc3gfth2c3m0t1klii');✓
db<>fiddle here
something like this:
ids origin_url client_session_id
1,2,3 https://testett.com/ znkjoc3gfth2c3m0t1klii
4,5,6 https://rarara.com/ znkjoc3gfth2c3m0t1kliiEdit some context:
I currently developed a cron that runs every 1 minute to analyze the last 60 seconds of the bot records in database, I need to group the conversation ID's that have more than 3 records within 60 seconds in the same url and client_session_id.
follows the SQL I'm running:
select
count(session_id),
client_session_id,
GROUP_CONCAT(id) as talkIds,
origin_url
from
bot_talk
where
created_date > now() - interval 60 second
group by
client_session_id, origin_url
having
count(session_id) >= 3This query works as I expect, but sometimes my cron service is sometimes down, and I lose those repeated records.
I thought about making an SQL(Cron) at the end of the day to analyze the last 24 hours, and look for the records that are repeated according to the rule I mentioned above?
Solution
Here is the answer - see the fiddle. Another answer I wrote to a similar question may provide some clearer background and is a bit simpler - see here.
All I will say is that it gives some idea of the power of window functions.
I noticed in the comments that there was some debate about what constituted a group - in this example, I have constructed the SQL such that it picks out as a group where all subsequent sessions started within 180 seconds (i.e. 3 mins) of the first - you can change the 180 to 60 (or whatever) yourself.
I've added in some records for the purpose of testing and also added
Always use named constraints - these provide more meaningful messages than:
I populated it as follows:
It's always worth checking out for edge cases - single records at the beginning/end of your dataset + groups that you want to capture at the beginning and end also! I leave it to you to do more exhaustive testing!
I'll give the results first:
There's a bonus - you've got the starts and ends of the multiple close-together sessions thrown in for free!
I do have one word of advice - you really shouldn't be using or gathering data as comma-separated lists - SQL wasn't designed for string manipulation and extracting meaningful information and knowledge from such lists is painful - better to have a single atomic datum in a single field - see
I've left in the various "sub-fiddles" that I used to arrive at the final result - hopefully they'll help you to learn about window functions &c... My own preference for the results would be in this format (see the fiddle - with one record/session - you can prune as you see fit):
```
sid created_date st min_cd max_cd f_ts l_ts c_ts_asc c_ts_desc o_url c_sess_id
1 2021-01-18 10:30:24.00 1 2021-01-18 10:30:24.00 2021-01-18 10:30:24.00 1 1 1 1 https://zendes.com/ znkjoc3gfth2c3m0t1klii
2 2021-01-18 11:02:24.00 2 2021-01-18 11:02:24.00 2021-01-18 11:04:15.00 1 4 1 4 https://zes.com/ znkjoc
3 2021-01-18 11:02:35.00 2 2021-01-18 11:02:24.00 2021-01-18 11:04:15.00 2 3 2 3 https://zes.com/ znkjoc
4 2021-01-18 11:03:03.00 2 2021-01-18 11:02:24.00 2021-01-18 11:04:15.00 3 2 3 2 https://zes.com/ znkjoc
5 2021-01-18 11:04:15.00 2 2021-01-18 11:02:24.00 2021-01-18 11:04:15.00 4 1 4 1 https://zes.com/ znkjoc
6 2021-01-18 11:49:28.00 3 2021-01-18 11:49:28.00 2021-01-18 11:50:13.00 1 3 1 3 https://rararar.com/ znkjoc3gfth2c3m0t1klii
7 2021-01-18 11:49:48.00 3 2021-01-18 11:49:28.00 2021-01-18 11:50:13.00 2 2 2 2 https://rararar.com/ znkjoc3gfth2c3m0t1klii
8 2021-01-18 11:50:13.00 3 2021-01-18 11:49:28.00 2021-01-18 11:50:13.00 3 1 3 1
All I will say is that it gives some idea of the power of window functions.
I noticed in the comments that there was some debate about what constituted a group - in this example, I have constructed the SQL such that it picks out as a group where all subsequent sessions started within 180 seconds (i.e. 3 mins) of the first - you can change the 180 to 60 (or whatever) yourself.
I've added in some records for the purpose of testing and also added
CONSTRAINTs to the table definition. It's always best to put as much as possible into the DDL - your database is your final bastion of defence for your data!CREATE TABLE test
(
id bigint NOT NULL,
created_date TIMESTAMP(2) NOT NULL,
origin_url VARCHAR (200) NOT NULL,
client_session_id VARCHAR (50) NOT NULL,
CONSTRAINT test_id_pk PRIMARY KEY (id),
CONSTRAINT test_cd_url_sess_id_uq UNIQUE (created_date, origin_url, client_session_id)
);Always use named constraints - these provide more meaningful messages than:
... failed... CONSTRAINT xyz_000_43abc has been violated....I populated it as follows:
INSERT INTO test VALUES
--
-- 1 lone record...
--
(1,'2021-01-18 10:30:24.0', 'https://zendes.com/', 'znkjoc3gfth2c3m0t1klii'), -- XX
--
-- 4 records within 180 seconds of the first one
--
(2,'2021-01-18 11:02:24.0', 'https://zes.com/', 'znkjoc'),
(3,'2021-01-18 11:02:35.0', 'https://zes.com/', 'znkjoc'),
(4,'2021-01-18 11:03:03.0', 'https://zes.com/', 'znkjoc'), -- **
(5,'2021-01-18 11:04:15.0', 'https://zes.com/', 'znkjoc'), -- YY
--
-- 3 records within 180s of the first one
--
(6,'2021-01-18 11:49:28.0', 'https://rararar.com/', 'znkjoc3gfth2c3m0t1klii'),
(7,'2021-01-18 11:49:48.0', 'https://rararar.com/', 'znkjoc3gfth2c3m0t1klii'), -- **
(8,'2021-01-18 11:50:13.0', 'https://rararar.com/', 'znkjoc3gfth2c3m0t1klii'), -- **
-- 1 lone record
(9,'2021-01-18 12:57:24.0', 'https://zendes.com/', 'znkjoc3gfth2c3m0t1klii'), -- XX
(10,'2021-02-18 09:02:24.0', 'https://rar.com/', 'znkjoc3'), -- ZZ
(11,'2021-02-18 09:02:35.0', 'https://rar.com/', 'znkjoc3'), -- ZZ
(12,'2021-02-18 09:03:03.0', 'https://rar.com/', 'znkjoc3'), -- ZZ
(13,'2021-02-18 09:04:15.0', 'https://rar.com/', 'znkjoc3'); -- ZZ
-- -- XX - Added record > 1 minute from next or previous.
-- -- ** - Changed created_date to get groups within 180 seconds.
-- -- YY - Added record < 3 minutes from previous to give 4 records
-- -- ZZ - Added group of 4 records at the end.It's always worth checking out for edge cases - single records at the beginning/end of your dataset + groups that you want to capture at the beginning and end also! I leave it to you to do more exhaustive testing!
I'll give the results first:
rn st sids URL: Session id: Session start time: Session end time:
1 2 2,3,4,5 https://zes.com/ znkjoc 2021-01-18 11:02:24.00 2021-01-18 11:04:15.00
2 5 10,11,12,13 https://rar.com/ znkjoc3 2021-02-18 09:02:24.00 2021-02-18 09:04:15.00There's a bonus - you've got the starts and ends of the multiple close-together sessions thrown in for free!
I do have one word of advice - you really shouldn't be using or gathering data as comma-separated lists - SQL wasn't designed for string manipulation and extracting meaningful information and knowledge from such lists is painful - better to have a single atomic datum in a single field - see
1st Normal Form!I've left in the various "sub-fiddles" that I used to arrive at the final result - hopefully they'll help you to learn about window functions &c... My own preference for the results would be in this format (see the fiddle - with one record/session - you can prune as you see fit):
```
sid created_date st min_cd max_cd f_ts l_ts c_ts_asc c_ts_desc o_url c_sess_id
1 2021-01-18 10:30:24.00 1 2021-01-18 10:30:24.00 2021-01-18 10:30:24.00 1 1 1 1 https://zendes.com/ znkjoc3gfth2c3m0t1klii
2 2021-01-18 11:02:24.00 2 2021-01-18 11:02:24.00 2021-01-18 11:04:15.00 1 4 1 4 https://zes.com/ znkjoc
3 2021-01-18 11:02:35.00 2 2021-01-18 11:02:24.00 2021-01-18 11:04:15.00 2 3 2 3 https://zes.com/ znkjoc
4 2021-01-18 11:03:03.00 2 2021-01-18 11:02:24.00 2021-01-18 11:04:15.00 3 2 3 2 https://zes.com/ znkjoc
5 2021-01-18 11:04:15.00 2 2021-01-18 11:02:24.00 2021-01-18 11:04:15.00 4 1 4 1 https://zes.com/ znkjoc
6 2021-01-18 11:49:28.00 3 2021-01-18 11:49:28.00 2021-01-18 11:50:13.00 1 3 1 3 https://rararar.com/ znkjoc3gfth2c3m0t1klii
7 2021-01-18 11:49:48.00 3 2021-01-18 11:49:28.00 2021-01-18 11:50:13.00 2 2 2 2 https://rararar.com/ znkjoc3gfth2c3m0t1klii
8 2021-01-18 11:50:13.00 3 2021-01-18 11:49:28.00 2021-01-18 11:50:13.00 3 1 3 1
Code Snippets
CREATE TABLE test
(
id bigint NOT NULL,
created_date TIMESTAMP(2) NOT NULL,
origin_url VARCHAR (200) NOT NULL,
client_session_id VARCHAR (50) NOT NULL,
CONSTRAINT test_id_pk PRIMARY KEY (id),
CONSTRAINT test_cd_url_sess_id_uq UNIQUE (created_date, origin_url, client_session_id)
);INSERT INTO test VALUES
--
-- 1 lone record...
--
(1,'2021-01-18 10:30:24.0', 'https://zendes.com/', 'znkjoc3gfth2c3m0t1klii'), -- XX
--
-- 4 records within 180 seconds of the first one
--
(2,'2021-01-18 11:02:24.0', 'https://zes.com/', 'znkjoc'),
(3,'2021-01-18 11:02:35.0', 'https://zes.com/', 'znkjoc'),
(4,'2021-01-18 11:03:03.0', 'https://zes.com/', 'znkjoc'), -- **
(5,'2021-01-18 11:04:15.0', 'https://zes.com/', 'znkjoc'), -- YY
--
-- 3 records within 180s of the first one
--
(6,'2021-01-18 11:49:28.0', 'https://rararar.com/', 'znkjoc3gfth2c3m0t1klii'),
(7,'2021-01-18 11:49:48.0', 'https://rararar.com/', 'znkjoc3gfth2c3m0t1klii'), -- **
(8,'2021-01-18 11:50:13.0', 'https://rararar.com/', 'znkjoc3gfth2c3m0t1klii'), -- **
-- 1 lone record
(9,'2021-01-18 12:57:24.0', 'https://zendes.com/', 'znkjoc3gfth2c3m0t1klii'), -- XX
(10,'2021-02-18 09:02:24.0', 'https://rar.com/', 'znkjoc3'), -- ZZ
(11,'2021-02-18 09:02:35.0', 'https://rar.com/', 'znkjoc3'), -- ZZ
(12,'2021-02-18 09:03:03.0', 'https://rar.com/', 'znkjoc3'), -- ZZ
(13,'2021-02-18 09:04:15.0', 'https://rar.com/', 'znkjoc3'); -- ZZ
-- -- XX - Added record > 1 minute from next or previous.
-- -- ** - Changed created_date to get groups within 180 seconds.
-- -- YY - Added record < 3 minutes from previous to give 4 records
-- -- ZZ - Added group of 4 records at the end.rn st sids URL: Session id: Session start time: Session end time:
1 2 2,3,4,5 https://zes.com/ znkjoc 2021-01-18 11:02:24.00 2021-01-18 11:04:15.00
2 5 10,11,12,13 https://rar.com/ znkjoc3 2021-02-18 09:02:24.00 2021-02-18 09:04:15.00sid created_date st min_cd max_cd f_ts l_ts c_ts_asc c_ts_desc o_url c_sess_id
1 2021-01-18 10:30:24.00 1 2021-01-18 10:30:24.00 2021-01-18 10:30:24.00 1 1 1 1 https://zendes.com/ znkjoc3gfth2c3m0t1klii
2 2021-01-18 11:02:24.00 2 2021-01-18 11:02:24.00 2021-01-18 11:04:15.00 1 4 1 4 https://zes.com/ znkjoc
3 2021-01-18 11:02:35.00 2 2021-01-18 11:02:24.00 2021-01-18 11:04:15.00 2 3 2 3 https://zes.com/ znkjoc
4 2021-01-18 11:03:03.00 2 2021-01-18 11:02:24.00 2021-01-18 11:04:15.00 3 2 3 2 https://zes.com/ znkjoc
5 2021-01-18 11:04:15.00 2 2021-01-18 11:02:24.00 2021-01-18 11:04:15.00 4 1 4 1 https://zes.com/ znkjoc
6 2021-01-18 11:49:28.00 3 2021-01-18 11:49:28.00 2021-01-18 11:50:13.00 1 3 1 3 https://rararar.com/ znkjoc3gfth2c3m0t1klii
7 2021-01-18 11:49:48.00 3 2021-01-18 11:49:28.00 2021-01-18 11:50:13.00 2 2 2 2 https://rararar.com/ znkjoc3gfth2c3m0t1klii
8 2021-01-18 11:50:13.00 3 2021-01-18 11:49:28.00 2021-01-18 11:50:13.00 3 1 3 1 https://rararar.com/ znkjoc3gfth2c3m0t1klii
9 2021-01-18 12:57:24.00 4 2021-01-18 12:57:24.00 2021-01-18 12:57:24.00 1 1 1 1 https://zendes.com/ znkjoc3gfth2c3m0t1klii
10 2021-02-18 09:02:24.00 5 2021-02-18 09:02:24.00 2021-02-18 09:04:15.00 1 4 1 4 https://rar.com/ znkjoc3
11 2021-02-18 09:02:35.00 5 2021-02-18 09:02:24.00 2021-02-18 09:04:15.00 2 3 2 3 https://rar.com/ znkjoc3
12 2021-02-18 09:03:03.00 5 2021-02-18 09:02:24.00 2021-02-18 09:04:15.00 3 2 3 2 https://rar.com/ znkjoc3
13 2021-02-18 09:04:15.00 5 2021-02-18 09:02:24.00 2021-02-18 09:04:15.00 4 1 4 1 https://rar.com/ znkjoc3sid Session no. Start of sessions End of sessions Session count o_url c_sess_id
2 2 2021-01-18 11:02:24.00 2021-01-18 11:04:15.00 4 https://zes.com/ znkjoc
10 5 2021-02-18 09:02:24.00 2021-02-18 09:04:15.00 4 https://rar.com/ znkjoc3Context
StackExchange Database Administrators Q#283859, answer score: 3
Revisions (0)
No revisions yet.