HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Mysql get records more then 3 in interval of 1 minute, return group_concat ID

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
returnrecordsmoreintervalmysqlgetthengroup_concatminute

Problem

Currently i have this dataset, i need to return grouped ids that are within the range of 60 seconds and have more than 3.

CREATE TABLE test 
(
  `id` bigint NOT NULL AUTO_INCREMENT,
  created_date TIMESTAMP(1) NOT NULL,
  origin_url   VARCHAR (200) NOT NULL,
  client_session_id VARCHAR (50) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `UC_PRE_CHAT_TALKID_COL` (`id`)
);




INSERT INTO test VALUES
(1,'2021-01-18 11:02:24.0', 'https://zendes.com/', 'znkjoc3gfth2c3m0t1klii'),
(2,'2021-01-18 11:02:35.0', 'https://zendes.com/', 'znkjoc3gfth2c3m0t1klii'),
(3,'2021-01-18 11:02:03.0', 'https://zendes.com/', 'znkjoc3gfth2c3m0t1klii'),
(4,'2021-01-18 11:11:28.0', 'https://rarara.com/', 'znkjoc3gfth2c3m0t1klii'),
(5,'2021-01-18 11:11:36.0', 'https://rarara.com/', 'znkjoc3gfth2c3m0t1klii'),
(6,'2021-01-18 11:11:05.0', 'https://rarara.com/', 'znkjoc3gfth2c3m0t1klii');




db<>fiddle here

something like this:

ids     origin_url              client_session_id
1,2,3   https://testett.com/    znkjoc3gfth2c3m0t1klii
4,5,6   https://rarara.com/     znkjoc3gfth2c3m0t1klii


Edit some context:

I currently developed a cron that runs every 1 minute to analyze the last 60 seconds of the bot records in database, I need to group the conversation ID's that have more than 3 records within 60 seconds in the same url and client_session_id.

follows the SQL I'm running:

select
    count(session_id),
    client_session_id,
    GROUP_CONCAT(id) as talkIds,
    origin_url 
from
    bot_talk
where
    created_date > now() - interval 60 second
group by
    client_session_id, origin_url 
having
    count(session_id) >= 3


This query works as I expect, but sometimes my cron service is sometimes down, and I lose those repeated records.

I thought about making an SQL(Cron) at the end of the day to analyze the last 24 hours, and look for the records that are repeated according to the rule I mentioned above?

Solution

Here is the answer - see the fiddle. Another answer I wrote to a similar question may provide some clearer background and is a bit simpler - see here.

All I will say is that it gives some idea of the power of window functions.

I noticed in the comments that there was some debate about what constituted a group - in this example, I have constructed the SQL such that it picks out as a group where all subsequent sessions started within 180 seconds (i.e. 3 mins) of the first - you can change the 180 to 60 (or whatever) yourself.

I've added in some records for the purpose of testing and also added CONSTRAINTs to the table definition. It's always best to put as much as possible into the DDL - your database is your final bastion of defence for your data!

CREATE TABLE test 
(
  id bigint NOT NULL,
  created_date TIMESTAMP(2) NOT NULL,
  origin_url   VARCHAR (200) NOT NULL,
  client_session_id VARCHAR (50) NOT NULL,
  CONSTRAINT test_id_pk PRIMARY KEY (id),
  CONSTRAINT test_cd_url_sess_id_uq UNIQUE (created_date, origin_url, client_session_id)
);


Always use named constraints - these provide more meaningful messages than: ... failed... CONSTRAINT xyz_000_43abc has been violated....

I populated it as follows:

INSERT INTO test VALUES

--
-- 1 lone record...
--

(1,'2021-01-18 10:30:24.0', 'https://zendes.com/', 'znkjoc3gfth2c3m0t1klii'), -- XX

--
-- 4 records within 180 seconds of the first one
--

(2,'2021-01-18 11:02:24.0', 'https://zes.com/', 'znkjoc'),
(3,'2021-01-18 11:02:35.0', 'https://zes.com/', 'znkjoc'),
(4,'2021-01-18 11:03:03.0', 'https://zes.com/', 'znkjoc'),  -- **
(5,'2021-01-18 11:04:15.0', 'https://zes.com/', 'znkjoc'),  -- YY

-- 
-- 3 records within 180s of the first one
--

(6,'2021-01-18 11:49:28.0', 'https://rararar.com/', 'znkjoc3gfth2c3m0t1klii'),
(7,'2021-01-18 11:49:48.0', 'https://rararar.com/', 'znkjoc3gfth2c3m0t1klii'), -- **
(8,'2021-01-18 11:50:13.0', 'https://rararar.com/', 'znkjoc3gfth2c3m0t1klii'), -- **

-- 1 lone record

(9,'2021-01-18 12:57:24.0', 'https://zendes.com/', 'znkjoc3gfth2c3m0t1klii'),  -- XX

(10,'2021-02-18 09:02:24.0', 'https://rar.com/', 'znkjoc3'), -- ZZ
(11,'2021-02-18 09:02:35.0', 'https://rar.com/', 'znkjoc3'), -- ZZ
(12,'2021-02-18 09:03:03.0', 'https://rar.com/', 'znkjoc3'), -- ZZ
(13,'2021-02-18 09:04:15.0', 'https://rar.com/', 'znkjoc3'); -- ZZ

-- -- XX - Added record > 1 minute from next or previous.
-- -- ** - Changed created_date to get groups within 180 seconds.
-- -- YY - Added record < 3 minutes from previous to give 4 records
-- -- ZZ - Added group of 4 records at the end.


It's always worth checking out for edge cases - single records at the beginning/end of your dataset + groups that you want to capture at the beginning and end also! I leave it to you to do more exhaustive testing!

I'll give the results first:

rn  st  sids                URL:                Session id: Session start time:            Session end time:
1   2   2,3,4,5             https://zes.com/    znkjoc      2021-01-18 11:02:24.00  2021-01-18 11:04:15.00
2   5   10,11,12,13         https://rar.com/    znkjoc3     2021-02-18 09:02:24.00  2021-02-18 09:04:15.00


There's a bonus - you've got the starts and ends of the multiple close-together sessions thrown in for free!

I do have one word of advice - you really shouldn't be using or gathering data as comma-separated lists - SQL wasn't designed for string manipulation and extracting meaningful information and knowledge from such lists is painful - better to have a single atomic datum in a single field - see 1st Normal Form!

I've left in the various "sub-fiddles" that I used to arrive at the final result - hopefully they'll help you to learn about window functions &c... My own preference for the results would be in this format (see the fiddle - with one record/session - you can prune as you see fit):

```
sid created_date st min_cd max_cd f_ts l_ts c_ts_asc c_ts_desc o_url c_sess_id
1 2021-01-18 10:30:24.00 1 2021-01-18 10:30:24.00 2021-01-18 10:30:24.00 1 1 1 1 https://zendes.com/ znkjoc3gfth2c3m0t1klii
2 2021-01-18 11:02:24.00 2 2021-01-18 11:02:24.00 2021-01-18 11:04:15.00 1 4 1 4 https://zes.com/ znkjoc
3 2021-01-18 11:02:35.00 2 2021-01-18 11:02:24.00 2021-01-18 11:04:15.00 2 3 2 3 https://zes.com/ znkjoc
4 2021-01-18 11:03:03.00 2 2021-01-18 11:02:24.00 2021-01-18 11:04:15.00 3 2 3 2 https://zes.com/ znkjoc
5 2021-01-18 11:04:15.00 2 2021-01-18 11:02:24.00 2021-01-18 11:04:15.00 4 1 4 1 https://zes.com/ znkjoc
6 2021-01-18 11:49:28.00 3 2021-01-18 11:49:28.00 2021-01-18 11:50:13.00 1 3 1 3 https://rararar.com/ znkjoc3gfth2c3m0t1klii
7 2021-01-18 11:49:48.00 3 2021-01-18 11:49:28.00 2021-01-18 11:50:13.00 2 2 2 2 https://rararar.com/ znkjoc3gfth2c3m0t1klii
8 2021-01-18 11:50:13.00 3 2021-01-18 11:49:28.00 2021-01-18 11:50:13.00 3 1 3 1

Code Snippets

CREATE TABLE test 
(
  id bigint NOT NULL,
  created_date TIMESTAMP(2) NOT NULL,
  origin_url   VARCHAR (200) NOT NULL,
  client_session_id VARCHAR (50) NOT NULL,
  CONSTRAINT test_id_pk PRIMARY KEY (id),
  CONSTRAINT test_cd_url_sess_id_uq UNIQUE (created_date, origin_url, client_session_id)
);
INSERT INTO test VALUES

--
-- 1 lone record...
--

(1,'2021-01-18 10:30:24.0', 'https://zendes.com/', 'znkjoc3gfth2c3m0t1klii'), -- XX

--
-- 4 records within 180 seconds of the first one
--

(2,'2021-01-18 11:02:24.0', 'https://zes.com/', 'znkjoc'),
(3,'2021-01-18 11:02:35.0', 'https://zes.com/', 'znkjoc'),
(4,'2021-01-18 11:03:03.0', 'https://zes.com/', 'znkjoc'),  -- **
(5,'2021-01-18 11:04:15.0', 'https://zes.com/', 'znkjoc'),  -- YY

-- 
-- 3 records within 180s of the first one
--

(6,'2021-01-18 11:49:28.0', 'https://rararar.com/', 'znkjoc3gfth2c3m0t1klii'),
(7,'2021-01-18 11:49:48.0', 'https://rararar.com/', 'znkjoc3gfth2c3m0t1klii'), -- **
(8,'2021-01-18 11:50:13.0', 'https://rararar.com/', 'znkjoc3gfth2c3m0t1klii'), -- **

-- 1 lone record

(9,'2021-01-18 12:57:24.0', 'https://zendes.com/', 'znkjoc3gfth2c3m0t1klii'),  -- XX


(10,'2021-02-18 09:02:24.0', 'https://rar.com/', 'znkjoc3'), -- ZZ
(11,'2021-02-18 09:02:35.0', 'https://rar.com/', 'znkjoc3'), -- ZZ
(12,'2021-02-18 09:03:03.0', 'https://rar.com/', 'znkjoc3'), -- ZZ
(13,'2021-02-18 09:04:15.0', 'https://rar.com/', 'znkjoc3'); -- ZZ


-- -- XX - Added record > 1 minute from next or previous.
-- -- ** - Changed created_date to get groups within 180 seconds.
-- -- YY - Added record < 3 minutes from previous to give 4 records
-- -- ZZ - Added group of 4 records at the end.
rn  st  sids                URL:                Session id: Session start time:            Session end time:
1   2   2,3,4,5             https://zes.com/    znkjoc      2021-01-18 11:02:24.00  2021-01-18 11:04:15.00
2   5   10,11,12,13         https://rar.com/    znkjoc3     2021-02-18 09:02:24.00  2021-02-18 09:04:15.00
sid created_date    st  min_cd  max_cd  f_ts    l_ts    c_ts_asc    c_ts_desc   o_url   c_sess_id
1   2021-01-18 10:30:24.00  1   2021-01-18 10:30:24.00  2021-01-18 10:30:24.00  1   1   1   1   https://zendes.com/ znkjoc3gfth2c3m0t1klii
2   2021-01-18 11:02:24.00  2   2021-01-18 11:02:24.00  2021-01-18 11:04:15.00  1   4   1   4   https://zes.com/    znkjoc
3   2021-01-18 11:02:35.00  2   2021-01-18 11:02:24.00  2021-01-18 11:04:15.00  2   3   2   3   https://zes.com/    znkjoc
4   2021-01-18 11:03:03.00  2   2021-01-18 11:02:24.00  2021-01-18 11:04:15.00  3   2   3   2   https://zes.com/    znkjoc
5   2021-01-18 11:04:15.00  2   2021-01-18 11:02:24.00  2021-01-18 11:04:15.00  4   1   4   1   https://zes.com/    znkjoc
6   2021-01-18 11:49:28.00  3   2021-01-18 11:49:28.00  2021-01-18 11:50:13.00  1   3   1   3   https://rararar.com/    znkjoc3gfth2c3m0t1klii
7   2021-01-18 11:49:48.00  3   2021-01-18 11:49:28.00  2021-01-18 11:50:13.00  2   2   2   2   https://rararar.com/    znkjoc3gfth2c3m0t1klii
8   2021-01-18 11:50:13.00  3   2021-01-18 11:49:28.00  2021-01-18 11:50:13.00  3   1   3   1   https://rararar.com/    znkjoc3gfth2c3m0t1klii
9   2021-01-18 12:57:24.00  4   2021-01-18 12:57:24.00  2021-01-18 12:57:24.00  1   1   1   1   https://zendes.com/ znkjoc3gfth2c3m0t1klii
10  2021-02-18 09:02:24.00  5   2021-02-18 09:02:24.00  2021-02-18 09:04:15.00  1   4   1   4   https://rar.com/    znkjoc3
11  2021-02-18 09:02:35.00  5   2021-02-18 09:02:24.00  2021-02-18 09:04:15.00  2   3   2   3   https://rar.com/    znkjoc3
12  2021-02-18 09:03:03.00  5   2021-02-18 09:02:24.00  2021-02-18 09:04:15.00  3   2   3   2   https://rar.com/    znkjoc3
13  2021-02-18 09:04:15.00  5   2021-02-18 09:02:24.00  2021-02-18 09:04:15.00  4   1   4   1   https://rar.com/    znkjoc3
sid Session no. Start of sessions   End of sessions Session count   o_url   c_sess_id
2   2   2021-01-18 11:02:24.00  2021-01-18 11:04:15.00  4   https://zes.com/    znkjoc
10  5   2021-02-18 09:02:24.00  2021-02-18 09:04:15.00  4   https://rar.com/    znkjoc3

Context

StackExchange Database Administrators Q#283859, answer score: 3

Revisions (0)

No revisions yet.