HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMajor

Is "+" slower than "CONCAT" for large strings?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
thanslowerlargeforconcatstrings

Problem

I have always thought that CONCAT function is actually a wrapper over the + (String Concatenation) with some additional checks in order to make our life more easier.

I have not found any internal details about how the functions are implemented. As to the performance, it seems that there is overhead for calling CONCAT when data is concatenating in a loop (which seems normal as there are additional handles for NULLs).

Few days ago, a dev modified some string concatenation code (from + to CONCAT) because did not like the syntax and told me it became faster.

In order to check the case, I have used the following code:

DECLARE @V1 NVARCHAR(MAX)
       ,@V2 NVARCHAR(MAX)
       ,@V3 NVARCHAR(MAX);

DECLARE @R NVARCHAR(MAX);

SELECT  @V1 = REPLICATE(CAST('V1' AS NVARCHAR(MAX)), 50000000)
       ,@V2 = REPLICATE(CAST('V2' AS NVARCHAR(MAX)), 50000000)
       ,@V3 = REPLICATE(CAST('V3' AS NVARCHAR(MAX)), 50000000);


where this is variant one:

SELECT @R = CAST('' AS NVARCHAR(MAX)) + '{some small text}' + ISNULL(@V1, '{}') + ISNULL(@V2, '{}') + ISNULL(@V3, '{}'); 
SELECT LEN(@R); -- 1200000017


and this is variant two:

SELECT @R = CONCAT('{some small text}',ISNULL(@V1, '{}'), ISNULL(@V2, '{}'), ISNULL(@V3, '{}'))
SELECT LEN(@R); -- 1200000017


For smaller strings, there is no differences. At some point, the CONCAT variant becomes faster:

I wonder if someone can share any internals or explains the behavior as it seems that there might be a rule that it's better to concatenate large strings using CONCAT.

Version:

Microsoft SQL Server 2022 (RTM-CU8) (KB5029666) - 16.0.4075.1 (X64)
Aug 23 2023 14:04:50
Copyright (C) 2022 Microsoft Corporation
Standard Edition (64-bit) on Windows Server 2019 Standard 10.0 (Build 17763: ) (Hypervisor)

The exact script looks like the following:

```
DECLARE @V1 NVARCHAR(MAX)
,@V2 NVARCHAR(MAX)
,@V3 NVARCHAR(MAX);

DECLARE @R NVARCHAR(MAX);

SELECT @V1 = REPLICATE(CAST('V1' AS NVARCHAR(MAX)),

Solution

Is "+" slower than "CONCAT" for large strings?

Looks that way. From looking at the pages allocated and deallocated in tempdb for storage of this LOB data it looks like each instance of the + ends up creating a new string and allocating and deallocating pages for it in tempdb (of which only the final one is needed).

Whereas CONCAT just uses the pages required for the size of the final result and does not degrade as more + are added by allocating pages for (and writing) ultimately unneeded intermediate strings.

Method
newly_allocated_pages
newly_deallocated_pages
new_in_use_pages

+
249280
249280
0

Concat
99696
99696
0

For my code below - which includes an extra @V4 concatenation vs the question - the data length of the final result is 800,000,034 bytes (762.94 MiB). The CONCAT method allocated 99,696 pages. An average of 8024.4 bytes per page. SQL Server uses 8KiB pages but some space is needed for page header (and other) overhead.

For + it allocates 249,280 pages (~1.9 GiB) you can see how it builds up to this number by trying the intermediate expressions as below (each step uses a reasonable number of additional pages for the data length of its result but this is as well as the page usage of the preceding steps not instead of) so the ratio of "extra" pages allocated increases as more + are added.

Expression
Datalength
pages_allocated
Datalength/pages_allocated
Diff
DataLength/Diff

CAST('' AS NVARCHAR(MAX)) + '{some small text}'
34
0

0

CAST('' AS NVARCHAR(MAX)) + '{some small text}' + ISNULL(@V1, '{}')
200000034
24920
8025.683547
24920
8025.683547

CAST('' AS NVARCHAR(MAX)) + '{some small text}' + ISNULL(@V1, '{}') + ISNULL(@V2, '{}')
400000034
74768
5349.882757
49848
8024.39484

CAST('' AS NVARCHAR(MAX)) + '{some small text}' + ISNULL(@V1, '{}') + ISNULL(@V2, '{}') + ISNULL(@V3, '{}')
600000034
149552
4011.982682
74784
8023.107001

CAST('' AS NVARCHAR(MAX)) + '{some small text}' + ISNULL(@V1, '{}') + ISNULL(@V2, '{}') + ISNULL(@V3, '{}') + ISNULL(@V4, '{}')
800000034
249280
3209.242755
99728
8021.81969

As David Browne - Microsoft says in the comments...

This makes sense as + is a binary operator, and CONCAT is an
N-ary function. So without special optimization a+b+c is evaluated
as (a+b)+c, requiring two new strings. While CONCAT(A,B,C) can
more easily just build one new string

This distinction also shows up in truncation behaviour.

For the below in the first case the first two strings concatenated are non max data types and the result is truncated to 8000 bytes. Then it is concatenated to a max type and the resultant string is max and the final concatenation succeeds with no truncation.

CONCAT can see that it has a MAX type passed in as a parameter and avoids this initial truncation.

DECLARE @X VARCHAR(MAX) = ''

--14000
SELECT  DATALENGTH(REPLICATE('A', 6000) + REPLICATE('B', 6000) + @X + REPLICATE('C', 6000)) 

--18000
SELECT  DATALENGTH(CONCAT(REPLICATE('A', 6000), REPLICATE('A', 6000) , @X, REPLICATE('C', 6000)))


Script used for seeing page allocations

In the script below I don't assign to the @R variable as this itself requires pages to be allocated and I am just focussing on the pages used by the concatenation methods.

```
DECLARE @TestConcat BIT = 0 /Set to 0 for "+" and 1 for "Concat"/
,@internal_objects_alloc_page_count1 BIGINT
,@internal_objects_alloc_page_count2 BIGINT
,@internal_objects_dealloc_page_count1 BIGINT
,@internal_objects_dealloc_page_count2 BIGINT
,@V1 NVARCHAR(MAX) = REPLICATE(CAST('V1' AS NVARCHAR(MAX)), 50000000)
,@V2 NVARCHAR(MAX) = REPLICATE(CAST('V2' AS NVARCHAR(MAX)), 50000000)
,@V3 NVARCHAR(MAX) = REPLICATE(CAST('V3' AS NVARCHAR(MAX)), 50000000)
,@V4 NVARCHAR(MAX) = REPLICATE(CAST('V4' AS NVARCHAR(MAX)), 50000000);

--Initial state after above variables assigned and before concatenation
SELECT @internal_objects_alloc_page_count1 = internal_objects_alloc_page_count,
@internal_objects_dealloc_page_count1 = internal_objects_dealloc_page_count
from sys.dm_db_task_space_usage
WHERE session_id = @@spid;

IF @TestConcat = 1
SELECT DATALENGTH(CONCAT('{some small text}',ISNULL(@V1, '{}'), ISNULL(@V2, '{}'), ISNULL(@V3, '{}'), ISNULL(@V4, '{}')));
ELSE
SELECT DATALENGTH(CAST('' AS NVARCHAR(MAX)) + '{some small text}' + ISNULL(@V1, '{}') + ISNULL(@V2, '{}') + ISNULL(@V3, '{}') + ISNULL(@V4, '{}'));

--State after concatenation completed
SELECT @internal_objects_alloc_page_count2 = internal_objects_alloc_page_count,
@internal_objects_dealloc_page_count2 = internal_objects_dealloc_page_count
from sys.dm_db_task_space_usage
WHERE session_id = @@spid;

-- Diff between the two states
SELECT newly_allocated_pages = @internal_objects_alloc_page_count2-@internal_objects_alloc_page_count1 ,
newly_deallocated_pages = @internal_objects_dealloc_page_count2-@internal_objects_dealloc_page_count1 ,
new_in_use_pages = (@intern

Code Snippets

DECLARE @X VARCHAR(MAX) = ''

--14000
SELECT  DATALENGTH(REPLICATE('A', 6000) + REPLICATE('B', 6000) + @X + REPLICATE('C', 6000)) 

--18000
SELECT  DATALENGTH(CONCAT(REPLICATE('A', 6000), REPLICATE('A', 6000) , @X, REPLICATE('C', 6000)))
DECLARE @TestConcat BIT = 0 /*Set to 0 for "+" and 1 for "Concat"*/
       ,@internal_objects_alloc_page_count1 BIGINT
       ,@internal_objects_alloc_page_count2 BIGINT
       ,@internal_objects_dealloc_page_count1 BIGINT
       ,@internal_objects_dealloc_page_count2 BIGINT
       ,@V1 NVARCHAR(MAX) = REPLICATE(CAST('V1' AS NVARCHAR(MAX)), 50000000)
       ,@V2 NVARCHAR(MAX) = REPLICATE(CAST('V2' AS NVARCHAR(MAX)), 50000000)
       ,@V3 NVARCHAR(MAX) = REPLICATE(CAST('V3' AS NVARCHAR(MAX)), 50000000)
       ,@V4 NVARCHAR(MAX) = REPLICATE(CAST('V4' AS NVARCHAR(MAX)), 50000000);

--Initial state after above variables assigned and before concatenation
SELECT @internal_objects_alloc_page_count1 = internal_objects_alloc_page_count, 
       @internal_objects_dealloc_page_count1 = internal_objects_dealloc_page_count
from sys.dm_db_task_space_usage
WHERE session_id = @@spid;

IF @TestConcat = 1
    SELECT DATALENGTH(CONCAT('{some small text}',ISNULL(@V1, '{}'), ISNULL(@V2, '{}'), ISNULL(@V3, '{}'), ISNULL(@V4, '{}')));
ELSE
    SELECT DATALENGTH(CAST('' AS NVARCHAR(MAX)) + '{some small text}' + ISNULL(@V1, '{}') + ISNULL(@V2, '{}') + ISNULL(@V3, '{}') + ISNULL(@V4, '{}'));

--State after concatenation completed
SELECT @internal_objects_alloc_page_count2 = internal_objects_alloc_page_count, 
       @internal_objects_dealloc_page_count2 = internal_objects_dealloc_page_count
from sys.dm_db_task_space_usage
WHERE session_id = @@spid;

-- Diff between the two states 
SELECT newly_allocated_pages = @internal_objects_alloc_page_count2-@internal_objects_alloc_page_count1 ,
       newly_deallocated_pages = @internal_objects_dealloc_page_count2-@internal_objects_dealloc_page_count1 ,
       new_in_use_pages = (@internal_objects_alloc_page_count2 - @internal_objects_dealloc_page_count2) - (@internal_objects_alloc_page_count1 - @internal_objects_dealloc_page_count1);

Context

StackExchange Database Administrators Q#334386, answer score: 28

Revisions (0)

No revisions yet.