HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

MySQL - maximum of sum over different months with ties over multiple years

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
maximumwithtiesdifferentmysqlmonthsmultiplesumyearsover

Problem

This question was inspired by this one [closed] and is virtually identical to this one but using different RDBMS's (PostgreSQL vs. MySQL).

Suppose I have a list of tumours (this data is simulated from real data):

CREATE table illness (nature_of_illness VARCHAR(25), created_at DATETIME);

INSERT INTO illness VALUES ('Cervix', '2018-01-03 15:45:40');
INSERT INTO illness VALUES ('Cervix', '2018-01-03 15:45:40');
INSERT INTO illness VALUES ('Cervix', '2018-01-03 15:45:40');
INSERT INTO illness VALUES ('Cervix', '2018-01-03 15:45:40');
INSERT INTO illness VALUES ('Cervix', '2018-01-03 15:45:40');
INSERT INTO illness VALUES ('Lung',   '2018-01-03 17:50:32');
INSERT INTO illness VALUES ('Lung',   '2018-02-03 17:50:32');
INSERT INTO illness VALUES ('Lung',   '2018-02-03 17:50:32');
INSERT INTO illness VALUES ('Lung',   '2018-02-03 17:50:32');
INSERT INTO illness VALUES ('Cervix', '2018-02-03 17:50:32');
-- 2017, with 1 Cervix and Lung each for the month of Jan - tie!
INSERT INTO illness VALUES ('Cervix', '2017-01-03 15:45:40');
INSERT INTO illness VALUES ('Lung',   '2017-01-03 17:50:32');
INSERT INTO illness VALUES ('Lung',   '2017-02-03 17:50:32');
INSERT INTO illness VALUES ('Lung',   '2017-02-03 17:50:32');
INSERT INTO illness VALUES ('Lung',   '2017-02-03 17:50:32');
INSERT INTO illness VALUES ('Cervix', '2017-02-03 17:50:32');


You want to find out which particular tumour was most common in a given month - so far so good!

Now, you will notice that for month 1 of 2017, there is a tie - so it makes no sense whatsoever to randomly pick one and give that as the answer - so ties have to be included - this makes the problem much more challenging.

The correct answer is:

Year    Month  Tumour count      Type
  2017        1             1    Cervix  -- note tie
  2017        1             1      Lung  --   "   "
  2017        2             3      Lung
  2018        1             5    Cervix
  2018        2             3      Lung


A further bonus would b

Solution

My attempt to solve this is as follows. I would appreciate any advice on how this query could be improved:

SELECT 
  t3.c_year AS "Year",
  t3.c_month AS "Month", 
  t3.il_mc AS  "Tumour count", 
  t4.ill_nat AS "Type" FROM
(
  SELECT c_year, c_month, il_mc FROM
  (
    SELECT  
    c_year, 
    c_month,
    MAX(month_count) AS il_mc
  FROM
    (
      SELECT nature_of_illness as illness,
        EXTRACT(YEAR  FROM created_at) AS c_year,
        EXTRACT(MONTH FROM created_at) AS c_month,
        COUNT(EXTRACT(MONTH FROM created_at)) AS month_count
      FROM illness
      GROUP BY illness, c_year, c_month
      ORDER BY c_year, c_month
    ) AS t1
  GROUP BY c_year, c_month
  ) AS t2
) AS t3
JOIN
(
SELECT 
  EXTRACT(YEAR FROM created_at) AS t_year, 
  EXTRACT(MONTH FROM created_at) AS t_month,  
  nature_of_illness AS ill_nat, 
  COUNT(nature_of_illness) AS ill_cnt
FROM illness
GROUP BY t_year, t_month, nature_of_illness
ORDER BY t_year, t_month, nature_of_illness
) AS t4
ON t3.c_year = t4.t_year
AND t3.c_month = t4.t_month
AND t3.il_mc = t4.ill_cnt


And it does give the correct result, as can be seen in the fiddle here!

Code Snippets

SELECT 
  t3.c_year AS "Year",
  t3.c_month AS "Month", 
  t3.il_mc AS  "Tumour count", 
  t4.ill_nat AS "Type" FROM
(
  SELECT c_year, c_month, il_mc FROM
  (
    SELECT  
    c_year, 
    c_month,
    MAX(month_count) AS il_mc
  FROM
    (
      SELECT nature_of_illness as illness,
        EXTRACT(YEAR  FROM created_at) AS c_year,
        EXTRACT(MONTH FROM created_at) AS c_month,
        COUNT(EXTRACT(MONTH FROM created_at)) AS month_count
      FROM illness
      GROUP BY illness, c_year, c_month
      ORDER BY c_year, c_month
    ) AS t1
  GROUP BY c_year, c_month
  ) AS t2
) AS t3
JOIN
(
SELECT 
  EXTRACT(YEAR FROM created_at) AS t_year, 
  EXTRACT(MONTH FROM created_at) AS t_month,  
  nature_of_illness AS ill_nat, 
  COUNT(nature_of_illness) AS ill_cnt
FROM illness
GROUP BY t_year, t_month, nature_of_illness
ORDER BY t_year, t_month, nature_of_illness
) AS t4
ON t3.c_year = t4.t_year
AND t3.c_month = t4.t_month
AND t3.il_mc = t4.ill_cnt

Context

StackExchange Database Administrators Q#206030, answer score: 5

Revisions (0)

No revisions yet.