patternsqlModerate
Are the results of the STRING_SPLIT() function returned in a deterministic order?
Viewed 0 times
returnedtheorderarestring_splitfunctiondeterministicresults
Problem
I need to split a comma delimited string, manipulate it, and then concatenate it back into a single string retaining the original order of the data (if possible).
For example, take a column definition list of a
Without specifying an
Once I have the result set above, I want to apply some additional string manipulations to each row (such as append some constant text), and then concatenate every row back with a function like
Ultimately my question is: Are the results of the
As I type this out, I have a hunch the answer is no, the ordering is not deterministic therefor I'm not guaranteed the order of the results. Furthermore, I'm betting there's possible additional nondeterminism added for each function I run on top of the results, especially when I combine them back together with
For example, take a column definition list of a
CREATE TABLE statement (as a string) like so 'BrentOzarColumn INTEGER, PaulWhiteColumn DATETIME, ErikDarlingColumn VARCHAR(100)'. I'd like to split the comma delimited list to a result set, such as using SQL Server's built in function STRING_SPLIT() like so: SELECT TRIM([Value]) AS CoolDataPeople FROM STRING_SPLIT('BrentOzarColumn INTEGER, PaulWhiteColumn DATETIME, ErikDarlingColumn VARCHAR(100)', ',').Without specifying an
ORDER BY clause, this repeatedly yields (by coincedence?) the following results that appear to be ordered by their same ordering as they are in the string:Once I have the result set above, I want to apply some additional string manipulations to each row (such as append some constant text), and then concatenate every row back with a function like
STRING_AGG() (goodbye days of STUFF ... FOR XML PATH :) in the same order as the original string. So an example of my final result could be 'BrentOzarColumn INTEGER SQLROX, PaulWhiteColumn DATETIME SQLROX, ErikDarlingColumn VARCHAR(100) SQLROX'.Ultimately my question is: Are the results of the
STRING_SPLIT() function returned in a deterministic order? I know without an ORDER BY clause, ordering is not guaranteed when selecting from a dataset like a Table or View, but was wondering if there's a difference with functions?As I type this out, I have a hunch the answer is no, the ordering is not deterministic therefor I'm not guaranteed the order of the results. Furthermore, I'm betting there's possible additional nondeterminism added for each function I run on top of the results, especially when I combine them back together with
STRING_AGG(). (Regardless of the answer, I appreciate your help, and you're all cool data people. ;)Solution
No, they are not returned in a deterministic order.
While you are unlikely to see them returned in a different order, that doesn't make the current behavior deterministic or reliable. Tricks like applying
It's one of the features notably missing from
And so has Andy Mallon:
Vote and comment here:
(This item specifically asks for an additional column to be returned to indicate position within the original string but, due to backward compatibility issues, would likely need to be delivered via a new function, similar to how
As an aside, the documentation originally stated:
The sort order of the output rows matches the order of the substrings in the input string.
That was deliberately changed in this commit to remove any notion that the return order is promised. Now the documentation states:
The output rows might be in any order. The order is not guaranteed to match the order of the substrings in the input string.
Why would they need to remove that promise?
I am not familiar with the internal implementation, but I believe the original version of the documentation was written by observers, not the function authors. That statement was likely written that way because that's what behavior they observed when they used the function. Many of us do the same thing when we tell people we don't need
My guess is they are covering their butts for future behavior. Think about all the changes they've been making to how functions work, and also ongoing changes to the optimizer.
Currently, with the simple case:
There isn't much opportunity or reason for that to be anything but sequential. But what about when you
What changes are they making?
The documentation was recently updated to show a new argument to the function,
enable_ordinal
An int or bit expression that serves as a flag to enable or disable the ordinal output column. A value of 1 enables the ordinal column. If enable_ordinal is omitted, NULL, or has a value of 0, the ordinal column is disabled.
Note
The enable_ordinal argument and ordinal output column are currently only supported in Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics (serverless SQL pool only).
To be clear, adding this argument only adds an
While you are unlikely to see them returned in a different order, that doesn't make the current behavior deterministic or reliable. Tricks like applying
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) to the output in an intermediate place like a CTE are similarly not guaranteed to work.It's one of the features notably missing from
STRING_SPLIT(), which I've blogged about:- A way to improve STRING_SPLIT in SQL Server - and you can help
- Please help with STRING_SPLIT improvements
And so has Andy Mallon:
- When STRING_SPLIT() falls short
Vote and comment here:
- Add row position column to STRING_SPLIT
(This item specifically asks for an additional column to be returned to indicate position within the original string but, due to backward compatibility issues, would likely need to be delivered via a new function, similar to how
CONCAT_WS came to be.)As an aside, the documentation originally stated:
The sort order of the output rows matches the order of the substrings in the input string.
That was deliberately changed in this commit to remove any notion that the return order is promised. Now the documentation states:
The output rows might be in any order. The order is not guaranteed to match the order of the substrings in the input string.
Why would they need to remove that promise?
I am not familiar with the internal implementation, but I believe the original version of the documentation was written by observers, not the function authors. That statement was likely written that way because that's what behavior they observed when they used the function. Many of us do the same thing when we tell people we don't need
ORDER BY when selecting rows from a clustered table: "they will always come out in this order." Pretty reliable, until the optimizer chooses a different index.My guess is they are covering their butts for future behavior. Think about all the changes they've been making to how functions work, and also ongoing changes to the optimizer.
Currently, with the simple case:
SELECT * FROM STRING_SPLIT('cow,dog,dinosaur','');There isn't much opportunity or reason for that to be anything but sequential. But what about when you
OUTER APPLY against an nvarchar(max) column between two huge, partitioned tables, and the data starts getting processed in batch mode and/or in parallel? Would they have to write additional logic in the code to ensure all that output came back in the right order? Would it be worth it? Would they have to revisit it for every future function processing / optimizer change to maintain that promise?What changes are they making?
The documentation was recently updated to show a new argument to the function,
enable_ordinal, available in Azure SQL Database and coming soon in SQL Server 2022:enable_ordinal
An int or bit expression that serves as a flag to enable or disable the ordinal output column. A value of 1 enables the ordinal column. If enable_ordinal is omitted, NULL, or has a value of 0, the ordinal column is disabled.
Note
The enable_ordinal argument and ordinal output column are currently only supported in Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics (serverless SQL pool only).
To be clear, adding this argument only adds an
ordinal column to the output indicating the sequence of value within the original string; it still doesn't guarantee the output will be ordered by the ordinal column. For that, you still need to add ORDER BY to the output.Code Snippets
SELECT * FROM STRING_SPLIT('cow,dog,dinosaur','');Context
StackExchange Database Administrators Q#282790, answer score: 12
Revisions (0)
No revisions yet.