patternsqlMinor
For a query over multiple databases, is it faster to do GROUP BY first or UNION first?
Viewed 0 times
databasesgroupqueryunionfasterfirstformultipleover
Problem
Having several databases with the same table, I need to group the union of the records.
The
I have tried two approaches but I do not know which is better. I need to calculate the MIN and SUM of the grouped values. The first approach calculates the values for the years (databases) first, and then calculates it again from the agregated values.
The second approach unions all the record first and groups the flatted records later:
The execution plans differ and they show only percentage of the calculation time. I am not that good to read them. By intuition, the first approach may take advantage of statistic
The
xTables are stored in separate databases named by the year (this is by third-party design and I cannot change it). There are 3 databases in my case but it could be say up to 10. Each of the tables contains about 2500 records with about 300 distinct xID values.I have tried two approaches but I do not know which is better. I need to calculate the MIN and SUM of the grouped values. The first approach calculates the values for the years (databases) first, and then calculates it again from the agregated values.
CREATE VIEW dbo.view1 AS
SELECT
id,
MIN(minA) AS a,
CONVERT(numeric(19, 2), SUM(sumB)) AS b
FROM (
SELECT
xID AS id,
MIN(xA) AS minA,
SUM(xB) AS sumB
FROM x2014.dbo.xTable
GROUP BY xID
UNION
SELECT
xID AS id,
MIN(xA) AS minA,
SUM(xB) AS sumB
FROM x2013.dbo.xTable
GROUP BY xID
UNION
SELECT
xID AS id,
MIN(xA) AS minA,
SUM(xB) AS sumB
FROM x2012.dbo.xTable
GROUP BY xID
) AS u
GROUP BY idThe second approach unions all the record first and groups the flatted records later:
CREATE VIEW dbo.view2 AS
SELECT
xID AS id,
MIN(xA) AS a,
CONVERT(numeric(19, 2), SUM(xB)) AS b
FROM (
SELECT
xID,
xA,
xB
FROM x2014.dbo.xTable
UNION
SELECT
xID,
xA,
xB
FROM x2013.dbo.xTable
UNION
SELECT
xID,
xA,
xB
FROM x2012.dbo.xTable
) AS u
GROUP BY idThe execution plans differ and they show only percentage of the calculation time. I am not that good to read them. By intuition, the first approach may take advantage of statistic
Solution
The Second is Faster
Having just tested a slightly simpler version of this, using only 1 column to be aggregated, I found that the second version is faster. I would also argue that the second is much easier to read.
Here is the code I used to test it:
General Feedback
You've consistently capitalised your keywords and your indentation is also consistent. Both of these help readability.
I recommend using the second query, it doesn't require any modification as far as I can see. You've got a pretty nice query there.
Having just tested a slightly simpler version of this, using only 1 column to be aggregated, I found that the second version is faster. I would also argue that the second is much easier to read.
Here is the code I used to test it:
CREATE TABLE TEST1 (id INT, a INT)
INSERT INTO TEST1
SELECT 1,23 UNION ALL
SELECT 1,26 UNION ALL
SELECT 1,47 UNION ALL
SELECT 2,13 UNION ALL
SELECT 2,31
CREATE TABLE TEST2 (id INT, a INT)
INSERT INTO TEST2
SELECT 1,18 UNION ALL
SELECT 1,45 UNION ALL
SELECT 2,24 UNION ALL
SELECT 2,42 UNION ALL
SELECT 2,64
/*
Single Group
*/
SELECT
id,
MIN(a) AS minA
FROM
(
SELECT id,a
FROM TEST1
UNION ALL
SELECT id,a
FROM TEST2
)Data
GROUP BY id
/*
Multiple Groups
*/
SELECT
id,
MIN(a) AS minA
FROM
(
SELECT id,MIN(a) AS a
FROM TEST1
GROUP BY id
UNION ALL
SELECT id,MIN(a) AS a
FROM TEST2
GROUP BY id
)Data
GROUP BY id
DROP TABLE TEST1
DROP TABLE TEST2General Feedback
You've consistently capitalised your keywords and your indentation is also consistent. Both of these help readability.
I recommend using the second query, it doesn't require any modification as far as I can see. You've got a pretty nice query there.
Code Snippets
CREATE TABLE TEST1 (id INT, a INT)
INSERT INTO TEST1
SELECT 1,23 UNION ALL
SELECT 1,26 UNION ALL
SELECT 1,47 UNION ALL
SELECT 2,13 UNION ALL
SELECT 2,31
CREATE TABLE TEST2 (id INT, a INT)
INSERT INTO TEST2
SELECT 1,18 UNION ALL
SELECT 1,45 UNION ALL
SELECT 2,24 UNION ALL
SELECT 2,42 UNION ALL
SELECT 2,64
/*
Single Group
*/
SELECT
id,
MIN(a) AS minA
FROM
(
SELECT id,a
FROM TEST1
UNION ALL
SELECT id,a
FROM TEST2
)Data
GROUP BY id
/*
Multiple Groups
*/
SELECT
id,
MIN(a) AS minA
FROM
(
SELECT id,MIN(a) AS a
FROM TEST1
GROUP BY id
UNION ALL
SELECT id,MIN(a) AS a
FROM TEST2
GROUP BY id
)Data
GROUP BY id
DROP TABLE TEST1
DROP TABLE TEST2Context
StackExchange Code Review Q#68357, answer score: 6
Revisions (0)
No revisions yet.