HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

For a query over multiple databases, is it faster to do GROUP BY first or UNION first?

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
databasesgroupqueryunionfasterfirstformultipleover

Problem

Having several databases with the same table, I need to group the union of the records.

The xTables are stored in separate databases named by the year (this is by third-party design and I cannot change it). There are 3 databases in my case but it could be say up to 10. Each of the tables contains about 2500 records with about 300 distinct xID values.

I have tried two approaches but I do not know which is better. I need to calculate the MIN and SUM of the grouped values. The first approach calculates the values for the years (databases) first, and then calculates it again from the agregated values.

CREATE VIEW dbo.view1 AS
    SELECT
        id,
        MIN(minA) AS a,
        CONVERT(numeric(19, 2), SUM(sumB)) AS b
    FROM (
        SELECT
            xID AS id,
            MIN(xA) AS minA,
            SUM(xB) AS sumB
        FROM x2014.dbo.xTable
        GROUP BY xID
        UNION 
        SELECT
            xID AS id,
            MIN(xA) AS minA,
            SUM(xB) AS sumB
        FROM x2013.dbo.xTable
        GROUP BY xID
        UNION 
        SELECT
            xID AS id,
            MIN(xA) AS minA,
            SUM(xB) AS sumB
        FROM x2012.dbo.xTable
        GROUP BY xID
        ) AS u
    GROUP BY id


The second approach unions all the record first and groups the flatted records later:

CREATE VIEW dbo.view2 AS
    SELECT
        xID AS id,
        MIN(xA) AS a,
        CONVERT(numeric(19, 2), SUM(xB)) AS b
    FROM (
        SELECT
            xID,
            xA,
            xB
        FROM x2014.dbo.xTable
        UNION 
        SELECT
            xID,
            xA,
            xB
        FROM x2013.dbo.xTable
        UNION 
        SELECT
            xID,
            xA,
            xB
        FROM x2012.dbo.xTable
        ) AS u
    GROUP BY id


The execution plans differ and they show only percentage of the calculation time. I am not that good to read them. By intuition, the first approach may take advantage of statistic

Solution

The Second is Faster

Having just tested a slightly simpler version of this, using only 1 column to be aggregated, I found that the second version is faster. I would also argue that the second is much easier to read.

Here is the code I used to test it:

CREATE TABLE TEST1 (id INT, a INT)

INSERT INTO TEST1
SELECT 1,23 UNION ALL
SELECT 1,26 UNION ALL
SELECT 1,47 UNION ALL
SELECT 2,13 UNION ALL
SELECT 2,31

CREATE TABLE TEST2 (id INT, a INT)

INSERT INTO TEST2
SELECT 1,18 UNION ALL
SELECT 1,45 UNION ALL
SELECT 2,24 UNION ALL
SELECT 2,42 UNION ALL
SELECT 2,64

/*
    Single Group
*/
SELECT 
    id,
    MIN(a) AS minA
FROM 
(
    SELECT id,a
    FROM TEST1

    UNION ALL

    SELECT id,a
    FROM TEST2
)Data
GROUP BY id

/*
    Multiple Groups
*/
SELECT 
    id,
    MIN(a) AS minA
FROM 
(
    SELECT id,MIN(a) AS a
    FROM TEST1
    GROUP BY id

    UNION ALL

    SELECT id,MIN(a) AS a
    FROM TEST2
    GROUP BY id
)Data
GROUP BY id

DROP TABLE TEST1
DROP TABLE TEST2


General Feedback

You've consistently capitalised your keywords and your indentation is also consistent. Both of these help readability.

I recommend using the second query, it doesn't require any modification as far as I can see. You've got a pretty nice query there.

Code Snippets

CREATE TABLE TEST1 (id INT, a INT)

INSERT INTO TEST1
SELECT 1,23 UNION ALL
SELECT 1,26 UNION ALL
SELECT 1,47 UNION ALL
SELECT 2,13 UNION ALL
SELECT 2,31

CREATE TABLE TEST2 (id INT, a INT)

INSERT INTO TEST2
SELECT 1,18 UNION ALL
SELECT 1,45 UNION ALL
SELECT 2,24 UNION ALL
SELECT 2,42 UNION ALL
SELECT 2,64

/*
    Single Group
*/
SELECT 
    id,
    MIN(a) AS minA
FROM 
(
    SELECT id,a
    FROM TEST1

    UNION ALL

    SELECT id,a
    FROM TEST2
)Data
GROUP BY id

/*
    Multiple Groups
*/
SELECT 
    id,
    MIN(a) AS minA
FROM 
(
    SELECT id,MIN(a) AS a
    FROM TEST1
    GROUP BY id

    UNION ALL

    SELECT id,MIN(a) AS a
    FROM TEST2
    GROUP BY id
)Data
GROUP BY id


DROP TABLE TEST1
DROP TABLE TEST2

Context

StackExchange Code Review Q#68357, answer score: 6

Revisions (0)

No revisions yet.