patternsqlMajor
Is "+" slower than "CONCAT" for large strings?
Viewed 0 times
thanslowerlargeforconcatstrings
Problem
I have always thought that CONCAT function is actually a wrapper over the + (String Concatenation) with some additional checks in order to make our life more easier.
I have not found any internal details about how the functions are implemented. As to the performance, it seems that there is overhead for calling
Few days ago, a dev modified some string concatenation code (from
In order to check the case, I have used the following code:
where this is variant one:
and this is variant two:
For smaller strings, there is no differences. At some point, the
I wonder if someone can share any internals or explains the behavior as it seems that there might be a rule that it's better to concatenate large strings using
Version:
Microsoft SQL Server 2022 (RTM-CU8) (KB5029666) - 16.0.4075.1 (X64)
Aug 23 2023 14:04:50
Copyright (C) 2022 Microsoft Corporation
Standard Edition (64-bit) on Windows Server 2019 Standard 10.0 (Build 17763: ) (Hypervisor)
The exact script looks like the following:
```
DECLARE @V1 NVARCHAR(MAX)
,@V2 NVARCHAR(MAX)
,@V3 NVARCHAR(MAX);
DECLARE @R NVARCHAR(MAX);
SELECT @V1 = REPLICATE(CAST('V1' AS NVARCHAR(MAX)),
I have not found any internal details about how the functions are implemented. As to the performance, it seems that there is overhead for calling
CONCAT when data is concatenating in a loop (which seems normal as there are additional handles for NULLs).Few days ago, a dev modified some string concatenation code (from
+ to CONCAT) because did not like the syntax and told me it became faster.In order to check the case, I have used the following code:
DECLARE @V1 NVARCHAR(MAX)
,@V2 NVARCHAR(MAX)
,@V3 NVARCHAR(MAX);
DECLARE @R NVARCHAR(MAX);
SELECT @V1 = REPLICATE(CAST('V1' AS NVARCHAR(MAX)), 50000000)
,@V2 = REPLICATE(CAST('V2' AS NVARCHAR(MAX)), 50000000)
,@V3 = REPLICATE(CAST('V3' AS NVARCHAR(MAX)), 50000000);where this is variant one:
SELECT @R = CAST('' AS NVARCHAR(MAX)) + '{some small text}' + ISNULL(@V1, '{}') + ISNULL(@V2, '{}') + ISNULL(@V3, '{}');
SELECT LEN(@R); -- 1200000017and this is variant two:
SELECT @R = CONCAT('{some small text}',ISNULL(@V1, '{}'), ISNULL(@V2, '{}'), ISNULL(@V3, '{}'))
SELECT LEN(@R); -- 1200000017For smaller strings, there is no differences. At some point, the
CONCAT variant becomes faster:I wonder if someone can share any internals or explains the behavior as it seems that there might be a rule that it's better to concatenate large strings using
CONCAT.Version:
Microsoft SQL Server 2022 (RTM-CU8) (KB5029666) - 16.0.4075.1 (X64)
Aug 23 2023 14:04:50
Copyright (C) 2022 Microsoft Corporation
Standard Edition (64-bit) on Windows Server 2019 Standard 10.0 (Build 17763: ) (Hypervisor)
The exact script looks like the following:
```
DECLARE @V1 NVARCHAR(MAX)
,@V2 NVARCHAR(MAX)
,@V3 NVARCHAR(MAX);
DECLARE @R NVARCHAR(MAX);
SELECT @V1 = REPLICATE(CAST('V1' AS NVARCHAR(MAX)),
Solution
Is "+" slower than "CONCAT" for large strings?
Looks that way. From looking at the pages allocated and deallocated in
Whereas
Method
newly_allocated_pages
newly_deallocated_pages
new_in_use_pages
+
249280
249280
0
Concat
99696
99696
0
For my code below - which includes an extra
For
Expression
Datalength
pages_allocated
Datalength/pages_allocated
Diff
DataLength/Diff
CAST('' AS NVARCHAR(MAX)) + '{some small text}'
34
0
0
CAST('' AS NVARCHAR(MAX)) + '{some small text}' + ISNULL(@V1, '{}')
200000034
24920
8025.683547
24920
8025.683547
CAST('' AS NVARCHAR(MAX)) + '{some small text}' + ISNULL(@V1, '{}') + ISNULL(@V2, '{}')
400000034
74768
5349.882757
49848
8024.39484
CAST('' AS NVARCHAR(MAX)) + '{some small text}' + ISNULL(@V1, '{}') + ISNULL(@V2, '{}') + ISNULL(@V3, '{}')
600000034
149552
4011.982682
74784
8023.107001
CAST('' AS NVARCHAR(MAX)) + '{some small text}' + ISNULL(@V1, '{}') + ISNULL(@V2, '{}') + ISNULL(@V3, '{}') + ISNULL(@V4, '{}')
800000034
249280
3209.242755
99728
8021.81969
As David Browne - Microsoft says in the comments...
This makes sense as
as
more easily just build one new string
This distinction also shows up in truncation behaviour.
For the below in the first case the first two strings concatenated are non max data types and the result is truncated to
Script used for seeing page allocations
In the script below I don't assign to the
```
DECLARE @TestConcat BIT = 0 /Set to 0 for "+" and 1 for "Concat"/
,@internal_objects_alloc_page_count1 BIGINT
,@internal_objects_alloc_page_count2 BIGINT
,@internal_objects_dealloc_page_count1 BIGINT
,@internal_objects_dealloc_page_count2 BIGINT
,@V1 NVARCHAR(MAX) = REPLICATE(CAST('V1' AS NVARCHAR(MAX)), 50000000)
,@V2 NVARCHAR(MAX) = REPLICATE(CAST('V2' AS NVARCHAR(MAX)), 50000000)
,@V3 NVARCHAR(MAX) = REPLICATE(CAST('V3' AS NVARCHAR(MAX)), 50000000)
,@V4 NVARCHAR(MAX) = REPLICATE(CAST('V4' AS NVARCHAR(MAX)), 50000000);
--Initial state after above variables assigned and before concatenation
SELECT @internal_objects_alloc_page_count1 = internal_objects_alloc_page_count,
@internal_objects_dealloc_page_count1 = internal_objects_dealloc_page_count
from sys.dm_db_task_space_usage
WHERE session_id = @@spid;
IF @TestConcat = 1
SELECT DATALENGTH(CONCAT('{some small text}',ISNULL(@V1, '{}'), ISNULL(@V2, '{}'), ISNULL(@V3, '{}'), ISNULL(@V4, '{}')));
ELSE
SELECT DATALENGTH(CAST('' AS NVARCHAR(MAX)) + '{some small text}' + ISNULL(@V1, '{}') + ISNULL(@V2, '{}') + ISNULL(@V3, '{}') + ISNULL(@V4, '{}'));
--State after concatenation completed
SELECT @internal_objects_alloc_page_count2 = internal_objects_alloc_page_count,
@internal_objects_dealloc_page_count2 = internal_objects_dealloc_page_count
from sys.dm_db_task_space_usage
WHERE session_id = @@spid;
-- Diff between the two states
SELECT newly_allocated_pages = @internal_objects_alloc_page_count2-@internal_objects_alloc_page_count1 ,
newly_deallocated_pages = @internal_objects_dealloc_page_count2-@internal_objects_dealloc_page_count1 ,
new_in_use_pages = (@intern
Looks that way. From looking at the pages allocated and deallocated in
tempdb for storage of this LOB data it looks like each instance of the + ends up creating a new string and allocating and deallocating pages for it in tempdb (of which only the final one is needed).Whereas
CONCAT just uses the pages required for the size of the final result and does not degrade as more + are added by allocating pages for (and writing) ultimately unneeded intermediate strings.Method
newly_allocated_pages
newly_deallocated_pages
new_in_use_pages
+
249280
249280
0
Concat
99696
99696
0
For my code below - which includes an extra
@V4 concatenation vs the question - the data length of the final result is 800,000,034 bytes (762.94 MiB). The CONCAT method allocated 99,696 pages. An average of 8024.4 bytes per page. SQL Server uses 8KiB pages but some space is needed for page header (and other) overhead.For
+ it allocates 249,280 pages (~1.9 GiB) you can see how it builds up to this number by trying the intermediate expressions as below (each step uses a reasonable number of additional pages for the data length of its result but this is as well as the page usage of the preceding steps not instead of) so the ratio of "extra" pages allocated increases as more + are added.Expression
Datalength
pages_allocated
Datalength/pages_allocated
Diff
DataLength/Diff
CAST('' AS NVARCHAR(MAX)) + '{some small text}'
34
0
0
CAST('' AS NVARCHAR(MAX)) + '{some small text}' + ISNULL(@V1, '{}')
200000034
24920
8025.683547
24920
8025.683547
CAST('' AS NVARCHAR(MAX)) + '{some small text}' + ISNULL(@V1, '{}') + ISNULL(@V2, '{}')
400000034
74768
5349.882757
49848
8024.39484
CAST('' AS NVARCHAR(MAX)) + '{some small text}' + ISNULL(@V1, '{}') + ISNULL(@V2, '{}') + ISNULL(@V3, '{}')
600000034
149552
4011.982682
74784
8023.107001
CAST('' AS NVARCHAR(MAX)) + '{some small text}' + ISNULL(@V1, '{}') + ISNULL(@V2, '{}') + ISNULL(@V3, '{}') + ISNULL(@V4, '{}')
800000034
249280
3209.242755
99728
8021.81969
As David Browne - Microsoft says in the comments...
This makes sense as
+ is a binary operator, and CONCAT is anN-ary function. So without special optimization a+b+c is evaluatedas
(a+b)+c, requiring two new strings. While CONCAT(A,B,C) canmore easily just build one new string
This distinction also shows up in truncation behaviour.
For the below in the first case the first two strings concatenated are non max data types and the result is truncated to
8000 bytes. Then it is concatenated to a max type and the resultant string is max and the final concatenation succeeds with no truncation.CONCAT can see that it has a MAX type passed in as a parameter and avoids this initial truncation.DECLARE @X VARCHAR(MAX) = ''
--14000
SELECT DATALENGTH(REPLICATE('A', 6000) + REPLICATE('B', 6000) + @X + REPLICATE('C', 6000))
--18000
SELECT DATALENGTH(CONCAT(REPLICATE('A', 6000), REPLICATE('A', 6000) , @X, REPLICATE('C', 6000)))Script used for seeing page allocations
In the script below I don't assign to the
@R variable as this itself requires pages to be allocated and I am just focussing on the pages used by the concatenation methods.```
DECLARE @TestConcat BIT = 0 /Set to 0 for "+" and 1 for "Concat"/
,@internal_objects_alloc_page_count1 BIGINT
,@internal_objects_alloc_page_count2 BIGINT
,@internal_objects_dealloc_page_count1 BIGINT
,@internal_objects_dealloc_page_count2 BIGINT
,@V1 NVARCHAR(MAX) = REPLICATE(CAST('V1' AS NVARCHAR(MAX)), 50000000)
,@V2 NVARCHAR(MAX) = REPLICATE(CAST('V2' AS NVARCHAR(MAX)), 50000000)
,@V3 NVARCHAR(MAX) = REPLICATE(CAST('V3' AS NVARCHAR(MAX)), 50000000)
,@V4 NVARCHAR(MAX) = REPLICATE(CAST('V4' AS NVARCHAR(MAX)), 50000000);
--Initial state after above variables assigned and before concatenation
SELECT @internal_objects_alloc_page_count1 = internal_objects_alloc_page_count,
@internal_objects_dealloc_page_count1 = internal_objects_dealloc_page_count
from sys.dm_db_task_space_usage
WHERE session_id = @@spid;
IF @TestConcat = 1
SELECT DATALENGTH(CONCAT('{some small text}',ISNULL(@V1, '{}'), ISNULL(@V2, '{}'), ISNULL(@V3, '{}'), ISNULL(@V4, '{}')));
ELSE
SELECT DATALENGTH(CAST('' AS NVARCHAR(MAX)) + '{some small text}' + ISNULL(@V1, '{}') + ISNULL(@V2, '{}') + ISNULL(@V3, '{}') + ISNULL(@V4, '{}'));
--State after concatenation completed
SELECT @internal_objects_alloc_page_count2 = internal_objects_alloc_page_count,
@internal_objects_dealloc_page_count2 = internal_objects_dealloc_page_count
from sys.dm_db_task_space_usage
WHERE session_id = @@spid;
-- Diff between the two states
SELECT newly_allocated_pages = @internal_objects_alloc_page_count2-@internal_objects_alloc_page_count1 ,
newly_deallocated_pages = @internal_objects_dealloc_page_count2-@internal_objects_dealloc_page_count1 ,
new_in_use_pages = (@intern
Code Snippets
DECLARE @X VARCHAR(MAX) = ''
--14000
SELECT DATALENGTH(REPLICATE('A', 6000) + REPLICATE('B', 6000) + @X + REPLICATE('C', 6000))
--18000
SELECT DATALENGTH(CONCAT(REPLICATE('A', 6000), REPLICATE('A', 6000) , @X, REPLICATE('C', 6000)))DECLARE @TestConcat BIT = 0 /*Set to 0 for "+" and 1 for "Concat"*/
,@internal_objects_alloc_page_count1 BIGINT
,@internal_objects_alloc_page_count2 BIGINT
,@internal_objects_dealloc_page_count1 BIGINT
,@internal_objects_dealloc_page_count2 BIGINT
,@V1 NVARCHAR(MAX) = REPLICATE(CAST('V1' AS NVARCHAR(MAX)), 50000000)
,@V2 NVARCHAR(MAX) = REPLICATE(CAST('V2' AS NVARCHAR(MAX)), 50000000)
,@V3 NVARCHAR(MAX) = REPLICATE(CAST('V3' AS NVARCHAR(MAX)), 50000000)
,@V4 NVARCHAR(MAX) = REPLICATE(CAST('V4' AS NVARCHAR(MAX)), 50000000);
--Initial state after above variables assigned and before concatenation
SELECT @internal_objects_alloc_page_count1 = internal_objects_alloc_page_count,
@internal_objects_dealloc_page_count1 = internal_objects_dealloc_page_count
from sys.dm_db_task_space_usage
WHERE session_id = @@spid;
IF @TestConcat = 1
SELECT DATALENGTH(CONCAT('{some small text}',ISNULL(@V1, '{}'), ISNULL(@V2, '{}'), ISNULL(@V3, '{}'), ISNULL(@V4, '{}')));
ELSE
SELECT DATALENGTH(CAST('' AS NVARCHAR(MAX)) + '{some small text}' + ISNULL(@V1, '{}') + ISNULL(@V2, '{}') + ISNULL(@V3, '{}') + ISNULL(@V4, '{}'));
--State after concatenation completed
SELECT @internal_objects_alloc_page_count2 = internal_objects_alloc_page_count,
@internal_objects_dealloc_page_count2 = internal_objects_dealloc_page_count
from sys.dm_db_task_space_usage
WHERE session_id = @@spid;
-- Diff between the two states
SELECT newly_allocated_pages = @internal_objects_alloc_page_count2-@internal_objects_alloc_page_count1 ,
newly_deallocated_pages = @internal_objects_dealloc_page_count2-@internal_objects_dealloc_page_count1 ,
new_in_use_pages = (@internal_objects_alloc_page_count2 - @internal_objects_dealloc_page_count2) - (@internal_objects_alloc_page_count1 - @internal_objects_dealloc_page_count1);Context
StackExchange Database Administrators Q#334386, answer score: 28
Revisions (0)
No revisions yet.