patternsqlMinor
Efficient way to handle multiple CASE statements in SELECT
Viewed 0 times
casehandlestatementsefficientwaymultipleselect
Problem
I run a report in which I have a situation where based on a column value which is basically a Key or Id, I need to fetch corresponding value from the mapping Id table. Something like below:
More precisely:
In Tables B,C,D..so on, we maintain data for all clients. Each client has a specific
Is there an efficient way to handle this in SQL Server?
SELECT
(case when [column1='A'] then (select value from Table1)
when [column1='B'] then (select value from Table2)
when [column1='C'] then (select value from Table3)
and so on uptil 35 more 'when' conditions ...
ELSE column1 end) Value
from Table1More precisely:
SELECT
(case when [A.column1='1']
then (select value from B where B.clientId=100 and A.column1=B.Id)
when [A.column1='2']
then (select value from C where C.clientId=100 and A.column1=C.Id)
when [A.column1='3']
then (select value from D where D.clientId=100 and A.column1=D.Id)
...
and so on uptil 30 more 'when' conditions
...
ELSE column1 end)
FROM AIn Tables B,C,D..so on, we maintain data for all clients. Each client has a specific
ClientId and these tables B,C,D etc have index in place on Id and ClientId columns.Is there an efficient way to handle this in SQL Server?
Solution
I'm assuming that you have appropriate indexes on the tables in the subqueries. I mocked up some quick test data and put 10 million rows in table
I disabled result sets and ran the following query in SSMS:
It took around 0.723 seconds. I'm doing pretty unscientific tests because I don't know anything your data. In any case, with serial queries we can't expect a better result than 0.7 seconds. That's our baseline.
The most efficient way to write this query is without joins at all. The key is that the
The plan is very simple:
For another option you can write the query with joins (where we can rewrite the
Here's the plan:
You can get a nearly identical runtime and query plan by writing the query like this:
The original query in the question has an issue: SQL Server is doing a useless sort before the nested loop join. That query finishes in around 5.838 seconds on my machine.
Trace flag 8690 eliminates the sort as well as the table spools. The query runs in about 7.479 seconds with trace flag 8690, so I suppose that the spools are helpful for this query.
A. I wasn't game to create 30 tables so I just created 3 for the CASE expression. I think that 3 is enough to show the general principles.DROP TABLE IF EXISTS dbo.B;
CREATE TABLE dbo.B (
ClientID INT NOT NULL,
Id VARCHAR(20) NOT NULL,
[Value] VARCHAR(100),
PRIMARY KEY (ClientID, Id)
);
INSERT INTO B VALUES (100, '1', 'TABLE B');
DROP TABLE IF EXISTS dbo.C;
CREATE TABLE dbo.C (
ClientID INT NOT NULL,
Id VARCHAR(20) NOT NULL,
[Value] VARCHAR(100),
PRIMARY KEY (ClientID, Id)
);
INSERT INTO C VALUES (100, '2', 'TABLE C');
DROP TABLE IF EXISTS dbo.D;
CREATE TABLE dbo.D (
ClientID INT NOT NULL,
Id VARCHAR(20) NOT NULL,
[Value] VARCHAR(100),
PRIMARY KEY (ClientID, Id)
);
INSERT INTO D VALUES (100, '3', 'TABLE D');
DROP TABLE IF EXISTS dbo.A;
CREATE TABLE dbo.A (
column1 VARCHAR(20) NOT NULL
);
INSERT INTO dbo.A WITH (TABLOCK)
SELECT CAST(1 + t.RN % 3 AS VARCHAR(20))
FROM
(
SELECT TOP (5000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
) t;I disabled result sets and ran the following query in SSMS:
SELECT A.column1
FROM A;It took around 0.723 seconds. I'm doing pretty unscientific tests because I don't know anything your data. In any case, with serial queries we can't expect a better result than 0.7 seconds. That's our baseline.
The most efficient way to write this query is without joins at all. The key is that the
CASE expression is only ever going to return 3 (or 30) unique values if it finds a match. You can save off the results into local variables and just use those in the query. The query below finishes in around 1.044 seconds:DECLARE @B_VALUE VARCHAR(100) = (select value from B where B.clientId=100 and B.Id = '1');
DECLARE @C_VALUE VARCHAR(100) = (select value from C where C.clientId=100 and C.Id = '2');
DECLARE @D_VALUE VARCHAR(100) = (select value from D where D.clientId=100 and D.Id = '3');
SELECT
(case when A.column1='1' then @B_VALUE
when A.column1='2' then @C_VALUE
when A.column1='3' then @D_VALUE
-- omitted other columns
else column1 end)
FROM A;The plan is very simple:
For another option you can write the query with joins (where we can rewrite the
CASE expression in a more compact form, using COALESCE(). This finished in around 2.314 seconds:SELECT
COALESCE(B.column1, C.column1, D.column1, -- omitted other columns
A.column1)
-- (case A.column1
-- when '1' then B.value
-- when '2' then C.value
-- when '3' then D.value
-- -- omitted other columns
-- else A.column1 end)
FROM A
LEFT JOIN B ON B.clientId=100 and B.Id = '1'
LEFT JOIN C ON C.clientId=100 and C.Id = '2'
LEFT JOIN D ON D.clientId=100 and D.Id = '3';Here's the plan:
You can get a nearly identical runtime and query plan by writing the query like this:
SELECT
(case A.column1
when '1' then (select value from B where B.clientId=100 and '1'=B.Id)
when '2' then (select value from C where C.clientId=100 and '2'=C.Id)
when '3' then (select value from D where D.clientId=100 and '3'=D.Id)
-- omitted other columns
else column1 end)
FROM A;The original query in the question has an issue: SQL Server is doing a useless sort before the nested loop join. That query finishes in around 5.838 seconds on my machine.
Trace flag 8690 eliminates the sort as well as the table spools. The query runs in about 7.479 seconds with trace flag 8690, so I suppose that the spools are helpful for this query.
Code Snippets
DROP TABLE IF EXISTS dbo.B;
CREATE TABLE dbo.B (
ClientID INT NOT NULL,
Id VARCHAR(20) NOT NULL,
[Value] VARCHAR(100),
PRIMARY KEY (ClientID, Id)
);
INSERT INTO B VALUES (100, '1', 'TABLE B');
DROP TABLE IF EXISTS dbo.C;
CREATE TABLE dbo.C (
ClientID INT NOT NULL,
Id VARCHAR(20) NOT NULL,
[Value] VARCHAR(100),
PRIMARY KEY (ClientID, Id)
);
INSERT INTO C VALUES (100, '2', 'TABLE C');
DROP TABLE IF EXISTS dbo.D;
CREATE TABLE dbo.D (
ClientID INT NOT NULL,
Id VARCHAR(20) NOT NULL,
[Value] VARCHAR(100),
PRIMARY KEY (ClientID, Id)
);
INSERT INTO D VALUES (100, '3', 'TABLE D');
DROP TABLE IF EXISTS dbo.A;
CREATE TABLE dbo.A (
column1 VARCHAR(20) NOT NULL
);
INSERT INTO dbo.A WITH (TABLOCK)
SELECT CAST(1 + t.RN % 3 AS VARCHAR(20))
FROM
(
SELECT TOP (5000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
) t;SELECT A.column1
FROM A;DECLARE @B_VALUE VARCHAR(100) = (select value from B where B.clientId=100 and B.Id = '1');
DECLARE @C_VALUE VARCHAR(100) = (select value from C where C.clientId=100 and C.Id = '2');
DECLARE @D_VALUE VARCHAR(100) = (select value from D where D.clientId=100 and D.Id = '3');
SELECT
(case when A.column1='1' then @B_VALUE
when A.column1='2' then @C_VALUE
when A.column1='3' then @D_VALUE
-- omitted other columns
else column1 end)
FROM A;SELECT
COALESCE(B.column1, C.column1, D.column1, -- omitted other columns
A.column1)
-- (case A.column1
-- when '1' then B.value
-- when '2' then C.value
-- when '3' then D.value
-- -- omitted other columns
-- else A.column1 end)
FROM A
LEFT JOIN B ON B.clientId=100 and B.Id = '1'
LEFT JOIN C ON C.clientId=100 and C.Id = '2'
LEFT JOIN D ON D.clientId=100 and D.Id = '3';SELECT
(case A.column1
when '1' then (select value from B where B.clientId=100 and '1'=B.Id)
when '2' then (select value from C where C.clientId=100 and '2'=C.Id)
when '3' then (select value from D where D.clientId=100 and '3'=D.Id)
-- omitted other columns
else column1 end)
FROM A;Context
StackExchange Database Administrators Q#188881, answer score: 7
Revisions (0)
No revisions yet.