HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Long SQL query with much duplication between two halves of a UNION

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
halvesmuchsqlwithqueryduplicationlonguniontwobetween

Problem

This works, but it's huge.

We need to repeat each table column on each UNION SELECT, as well as the WHERE clause for DATE comparison.

( 
SELECT GROUP_CONCAT(APA_T.district), t.name 
FROM tbl_activity AS t 
INNER JOIN tbl_activity_package AS ap ON t.id = ap.activity_id 
INNER JOIN ( 
SELECT DISTINCT apa.district AS district, ( 
SELECT s1.activity_package_id 
FROM tbl_activity_package_address s1 
WHERE apa.district = s1.district 
ORDER BY s1.id DESC 
LIMIT 1 
) AS idActivityPackage 
FROM 
tbl_activity_package_address apa 
ORDER BY apa.district 
) AS APA_T 
ON ap.id = APA_T.idActivityPackage 
WHERE (ap.publication_date = CURDATE()) 
GROUP BY t.name 
ORDER BY APA_T.district 
) 
UNION DISTINCT 
( 
SELECT GROUP_CONCAT(DISTINCT apa2.district ORDER BY apa2.district), t2.name 
FROM tbl_activity AS t2 
INNER JOIN tbl_activity_package AS ap2 ON t2.id = ap2.activity_id 
INNER JOIN tbl_activity_package_address AS apa2 ON ap2.id = apa2.activity_package_id 
WHERE (ap2.publication_date = CURDATE()) 
GROUP BY t2.name 
ORDER BY apa2.district 
) 
LIMIT 6, 6


I have removed the WHERE CLAUSE for DATE comparisons, because I can't have it on this fiddle structure, but, the repetition is still an issue. I've been told about INTERSECT but found nothing about that on MySQL.

http://sqlfiddle.com/#!2/37dd94/51

Solution

Your SQL Fiddle and the question have inconsistent column names: the fiddle has id_activity_package, while the code in this question has activity_package_id. I'll consider the fiddle as supplementary information, and treat the code in the question as authoritative.

The two halves of the UNION are nearly identical, with the exception of the second INNER JOIN. Let's analyze it one step at a time.

-
In the first half of the UNION, you join with APA_T, which is…

SELECT DISTINCT apa.district AS district,
    (SELECT s1.activity_package_id
        FROM tbl_activity_package_address s1
        WHERE apa.district = s1.district
        ORDER BY s1.id DESC
        LIMIT 1
    ) AS idActivityPackage
    FROM tbl_activity_package_address apa 
    ORDER BY apa.district


The DISTINCT there is superfluous, since the innermost SELECT is guaranteed to return at most one row per district.

As I read it, APA_T is a subquery that lists the most recently added activity_package_id for each district. I think it's worth creating a view for clarity.

CREATE VIEW latest_activity_package_address AS
SELECT *
    FROM tbl_activity_package_address
    WHERE id IN (
        SELECT MAX(id)
            FROM tbl_activity_package_address AS latest_apa
            GROUP BY district
    );


Then, the first half of the UNION becomes:

SELECT GROUP_CONCAT(district ORDER BY district), t.name
    FROM tbl_activity AS t
        INNER JOIN tbl_activity_package AS ap
            ON t.id = ap.activity_id
        INNER JOIN latest_activity_package_address AS apa
            ON ap.id = apa.activity_package_id
    WHERE (ap.publication_date = CURDATE())
    GROUP BY t.name
    ORDER BY apa.district


I've copied the ORDER BY clause into the GROUP_CONCAT(), where I think it belongs.

-
Next, I'd like to point out that LIMIT 6,6 will produce non-deterministic results, since there is no ORDER BY clause on the UNION query. You've put an ORDER BY on each half of the UNION, but that doesn't guarantee that the result will be ordered the same way.

-
Surprisingly, I haven't found a way to reduce duplication between the two halves of the UNION. However, there is now a nice parallelism which may be more aesthetically pleasing. You don't need to select new table aliases for the second half; they aren't in the same scope as the first half.

SELECT GROUP_CONCAT(district ORDER BY district), t.name
    FROM tbl_activity AS t
        INNER JOIN tbl_activity_package AS ap
            ON t.id = ap.activity_id
        INNER JOIN latest_activity_package_address AS apa
            ON ap.id = apa.activity_package_id
    WHERE (ap.publication_date = CURDATE())
    GROUP BY t.name
    ORDER BY apa.district
UNION DISTINCT
SELECT GROUP_CONCAT(district ORDER BY district), t.name
    FROM tbl_activity AS t
        INNER JOIN tbl_activity_package AS ap
            ON t.id = ap.activity_id
        INNER JOIN tbl_activity_package_address AS apa
            ON ap.id = apa.activity_package_id
    WHERE (ap.publication_date = CURDATE())
    GROUP BY t.name
    ORDER BY apa.district

Code Snippets

SELECT DISTINCT apa.district AS district,
    (SELECT s1.activity_package_id
        FROM tbl_activity_package_address s1
        WHERE apa.district = s1.district
        ORDER BY s1.id DESC
        LIMIT 1
    ) AS idActivityPackage
    FROM tbl_activity_package_address apa 
    ORDER BY apa.district
CREATE VIEW latest_activity_package_address AS
SELECT *
    FROM tbl_activity_package_address
    WHERE id IN (
        SELECT MAX(id)
            FROM tbl_activity_package_address AS latest_apa
            GROUP BY district
    );
SELECT GROUP_CONCAT(district ORDER BY district), t.name
    FROM tbl_activity AS t
        INNER JOIN tbl_activity_package AS ap
            ON t.id = ap.activity_id
        INNER JOIN latest_activity_package_address AS apa
            ON ap.id = apa.activity_package_id
    WHERE (ap.publication_date <= CURDATE())
        AND (DATE_ADD(ap.publication_date, INTERVAL ap.publication_duration_in_days DAY) >= CURDATE())
    GROUP BY t.name
    ORDER BY apa.district
SELECT GROUP_CONCAT(district ORDER BY district), t.name
    FROM tbl_activity AS t
        INNER JOIN tbl_activity_package AS ap
            ON t.id = ap.activity_id
        INNER JOIN latest_activity_package_address AS apa
            ON ap.id = apa.activity_package_id
    WHERE (ap.publication_date <= CURDATE())
        AND (DATE_ADD(ap.publication_date, INTERVAL ap.publication_duration_in_days DAY) >= CURDATE())
    GROUP BY t.name
    ORDER BY apa.district
UNION DISTINCT
SELECT GROUP_CONCAT(district ORDER BY district), t.name
    FROM tbl_activity AS t
        INNER JOIN tbl_activity_package AS ap
            ON t.id = ap.activity_id
        INNER JOIN tbl_activity_package_address AS apa
            ON ap.id = apa.activity_package_id
    WHERE (ap.publication_date <= CURDATE())
        AND (DATE_ADD(ap.publication_date, INTERVAL ap.publication_duration_in_days DAY) >= CURDATE())
    GROUP BY t.name
    ORDER BY apa.district

Context

StackExchange Code Review Q#49243, answer score: 4

Revisions (0)

No revisions yet.