patternsqlMinor
Select user activity over past months grouped by cohorts
Viewed 0 times
cohortsuserpastactivitymonthsgroupedselectover
Problem
In MySQL 5.7, I have a table
20,000 users have registered in June 2015, from these 10,000 have been active in July 2015 or later, 5,000 have been active in August 2015 or later, etc. Continue this output for all users registered in the past months until today.
I understand how I can select any of the activity months by hand, for example select all users that were active one calendar month after their registration:
This would give me the number of users who were active one month after their registration or later.
But in addition I would also like to select the number of users who were active 2 months after their registration, 3 months after their registration and so on up to the current month.
The results would be something like this (not necessarily look like this, but it should get the idea across):
How do I change the query to select all months of activity instead of a specific month?
user that contains the columns registered and lastLogin. My goal is to group the users by calendar months of registration and then select for each group how many of these users were still active after 1 month, after 2 months and so on.20,000 users have registered in June 2015, from these 10,000 have been active in July 2015 or later, 5,000 have been active in August 2015 or later, etc. Continue this output for all users registered in the past months until today.
I understand how I can select any of the activity months by hand, for example select all users that were active one calendar month after their registration:
SELECT year(registered), month(registered), count(id)
FROM user
WHERE lastLogin >= DATE_ADD(DATE_SUB(date(registered), INTERVAL DAYOFMONTH(registered) - 1 DAY), INTERVAL 1 MONTH)
GROUP BY year(registered), month(registered)This would give me the number of users who were active one month after their registration or later.
But in addition I would also like to select the number of users who were active 2 months after their registration, 3 months after their registration and so on up to the current month.
The results would be something like this (not necessarily look like this, but it should get the idea across):
Columns: Cohort | total | 1 month | 2months | 3months | ... | month n
07/2015 | 20,000 | 10,100 | 5,500 | 2,600 | ... | 150
08/2015 | 21,000 | 9,800 | 4,300 | 1,300 | ... | 180
09/2015 | 19,700 | 11,400 | 6,200 | 3,500 | ... | 200
...
08/2017 | 25,300 | 13,000
09/2017 | 12,500How do I change the query to select all months of activity instead of a specific month?
Solution
With the help of Rick to look for pivot table I was able to create the final query as described in this post: https://stackoverflow.com/questions/7674786/mysql-pivot-table
Query for a 6 months analysis:
The part
Here is some exemplary output of the query:
Query for a 6 months analysis:
SELECT
YEAR(registered), MONTH(registered), COUNT(*) as total,
SUM(lastLogin >= DATE(registered) - INTERVAL DAYOFMONTH(registered) - 1 DAY + INTERVAL 1 MONTH) AS month1,
SUM(lastLogin >= DATE(registered) - INTERVAL DAYOFMONTH(registered) - 1 DAY + INTERVAL 2 MONTH) AS month2,
SUM(lastLogin >= DATE(registered) - INTERVAL DAYOFMONTH(registered) - 1 DAY + INTERVAL 3 MONTH) AS month3,
SUM(lastLogin >= DATE(registered) - INTERVAL DAYOFMONTH(registered) - 1 DAY + INTERVAL 4 MONTH) AS month4,
SUM(lastLogin >= DATE(registered) - INTERVAL DAYOFMONTH(registered) - 1 DAY + INTERVAL 5 MONTH) AS month5,
SUM(lastLogin >= DATE(registered) - INTERVAL DAYOFMONTH(registered) - 1 DAY + INTERVAL 6 MONTH) AS month6
FROM
user
WHERE
registered >= "2015-01-01"
GROUP BY YEAR(registered) , MONTH(registered)The part
DATE_SUB(DATE(registered), INTERVAL DAYOFMONTH(registered) - 1 DAY) will return the first of a month to make sure the analysis is based on calendar months. I am not sure if it could be simplified. I was looking at STR_TO_DATE(LEFT(registered, 7), "%y-%m") but I was reading working on Strings is slower and should be avoided. If someone knows a better / cleaner way of doing that, please comment and I will include it in the answer.Here is some exemplary output of the query:
year, month, total , month1, month2, month3, month4, month5, month6
'2015', '1', '14776', '4302', '3225', '2827', '2547', '2330', '2163'
'2015', '2', '12162', '3859', '2866', '2465', '2259', '2046', '1890'
'2015', '3', '10770', '3831', '2841', '2507', '2288', '2135', '1987'
'2015', '4', '15685', '3731', '2641', '2273', '2045', '1872', '1730'
'2015', '5', '18130', '3686', '2403', '1993', '1744', '1528', '1363'Code Snippets
SELECT
YEAR(registered), MONTH(registered), COUNT(*) as total,
SUM(lastLogin >= DATE(registered) - INTERVAL DAYOFMONTH(registered) - 1 DAY + INTERVAL 1 MONTH) AS month1,
SUM(lastLogin >= DATE(registered) - INTERVAL DAYOFMONTH(registered) - 1 DAY + INTERVAL 2 MONTH) AS month2,
SUM(lastLogin >= DATE(registered) - INTERVAL DAYOFMONTH(registered) - 1 DAY + INTERVAL 3 MONTH) AS month3,
SUM(lastLogin >= DATE(registered) - INTERVAL DAYOFMONTH(registered) - 1 DAY + INTERVAL 4 MONTH) AS month4,
SUM(lastLogin >= DATE(registered) - INTERVAL DAYOFMONTH(registered) - 1 DAY + INTERVAL 5 MONTH) AS month5,
SUM(lastLogin >= DATE(registered) - INTERVAL DAYOFMONTH(registered) - 1 DAY + INTERVAL 6 MONTH) AS month6
FROM
user
WHERE
registered >= "2015-01-01"
GROUP BY YEAR(registered) , MONTH(registered)year, month, total , month1, month2, month3, month4, month5, month6
'2015', '1', '14776', '4302', '3225', '2827', '2547', '2330', '2163'
'2015', '2', '12162', '3859', '2866', '2465', '2259', '2046', '1890'
'2015', '3', '10770', '3831', '2841', '2507', '2288', '2135', '1987'
'2015', '4', '15685', '3731', '2641', '2273', '2045', '1872', '1730'
'2015', '5', '18130', '3686', '2403', '1993', '1744', '1528', '1363'Context
StackExchange Database Administrators Q#186283, answer score: 3
Revisions (0)
No revisions yet.