HiveBrain v1.2.0

Select a CSV string as multiple columns

Submitted by: @import:stackexchange-dba
Tags: columns, csv, multiple, select, string

Problem

I'm using SQL Server 2014 and I have a table with one column containing a CSV string:

110,200,310,130,null


I want to select the second (CSV) column as multiple columns, putting each item of the CSV string into a separate column, one item per column.

So I created a function for splitting a string:

CREATE FUNCTION [dbo].[fn_splitstring]
(
    @List nvarchar(2000),
    @SplitOn nvarchar(5)
)  
RETURNS @RtnValue table 
(
    Id int identity(1,1),
    Value nvarchar(100)
) 
AS  
BEGIN 
    while (Charindex(@SplitOn,@List)>0)
    begin
        insert into @RtnValue (value)
        select 
            Value = ltrim(rtrim(Substring(@List,1,Charindex(@SplitOn,@List)-1)))

        set @List = Substring(@List,Charindex(@SplitOn,@List)+len(@SplitOn),len(@List))
    end
    insert Into @RtnValue (Value)
    select Value = ltrim(rtrim(@List))

    return
END


I would like to use it in a way similar to this:

select Val, (select value from dbo.fn_splitstring(cchar1,',')) from table1


But the above code obviously won't work: the function returns more than one row, so the subquery returns more than one value and the query fails with an error.

I can use something like:

select Val,
(select value from dbo.fn_splitstring(cchar1,',') order by id offset 0 rows fetch next 1 rows only) as col1,
(select value from dbo.fn_splitstring(cchar1,',') order by id offset 1 rows fetch next 1 rows only) as col2,
................
 from table1


but I don't think it's a good approach.

What is the correct way to do it?

Solution

A Word on the Performance of Different Types of Functions

Generally speaking, Scalar Functions and mTVFs (Multi-Statement Table Valued Functions) are a bit of a built-in performance problem. It's better, if you can, to use iTVFs (Inline Table Valued Functions), because their code is actually included in the execution plan (much like a VIEW, but parameterized) instead of being executed separately.

Rather than replicate an entire article here, please see the following link for some proof, even when an iTVF is used as an iSF (Inline Scalar Function). In many cases, Scalar Functions can be 7 times slower than iTVFs even when the code is identical. You can now get to the link without having to sign up to be a member. To summarize: if your function has the word "BEGIN" in it, it's NOT going to be a high-performance iTVF.

How to Make Scalar UDFs Run Faster (SQL Spackle)
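To make the distinction concrete, here is a minimal sketch of the same trivial calculation written both ways (the function names are hypothetical, not from the article above): the scalar version hides a BEGIN...END block that executes once per row, while the iTVF version is a single RETURN SELECT that the optimizer can expand inline like a parameterized VIEW.

```sql
--===== A Scalar Function: the BEGIN...END body runs row by row.
CREATE FUNCTION dbo.TimesTwo_Scalar (@n INT)
RETURNS INT
AS
BEGIN
    RETURN @n * 2;
END
GO
--===== The iTVF equivalent: no BEGIN, just a single RETURN SELECT,
     -- so its code is included in the calling query's execution plan.
CREATE FUNCTION dbo.TimesTwo_iTVF (@n INT)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN SELECT Doubled = @n * 2;
GO
--===== The iTVF is consumed with CROSS APPLY, for example:
-- SELECT t.SomeColumn, f.Doubled
--   FROM dbo.SomeTable t
--  CROSS APPLY dbo.TimesTwo_iTVF(t.SomeColumn) f;
```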

Test Data for SQL Server 2008 or Later:

Shifting gears to the original problem posted by @Reza, and borrowing heavily from the code that @stefan provided to generate some sample rows of data, here's the test data we'll use. I do this in a Temp Table so that we can easily drop the test table without having to worry about accidentally dropping a real table.

--===== If the test table exists, drop it to make reruns in SSMS easier
     IF OBJECT_ID('tempdb..#CSV','U') IS NOT NULL
   DROP TABLE #CSV
;
--===== Create the test table.
 CREATE TABLE #CSV --Using a Temp Table just for demo purposes
        (
         ID         INT IDENTITY(1,1) --or whatever your PK is
        ,AString    VARCHAR(8000)
        )
        ;
--===== Insert some test data
 INSERT INTO #CSV
        (AString)
 SELECT AString
   FROM (
        VALUES   ('123,456,88789,null,null')
                ,('123,456,99789,1234,null')
                ,('123,456,00789,1234,null')
                ,('1,2222,77789,null,null')
                ,('11,222,88789,null,')
                ,('111,22,99789,,')
                ,('1111,2,00789,oooo,null')
        ) v (AString)
;


Test Data for SQL Server 2005 or Later:

If you're still using SQL Server 2005, don't despair. All of the code will still work there. We just need to change the way we generate the test data, because the table-value constructor form of the VALUES clause used above wasn't available until 2008. Instead, you need to use a series of SELECT/UNION ALL statements. There's virtually no performance difference between the two even if you need a larger volume of data, and the SELECT/UNION ALL method still works just fine through 2016.

Here's the test table generation code modified to work for 2005.

--===== If the test table exists, drop it to make reruns in SSMS easier
     IF OBJECT_ID('tempdb..#CSV','U') IS NOT NULL
   DROP TABLE #CSV
;
--===== Create the test table.
 CREATE TABLE #CSV --Using a Temp Table just for demo purposes
        (
         ID         INT IDENTITY(1,1) --or whatever your PK is
        ,AString    VARCHAR(8000)
        )
        ;
--===== Insert some test data
 INSERT INTO #CSV
        (AString)
 SELECT AString
   FROM (
         SELECT '123,456,88789,null,null' UNION ALL
         SELECT '123,456,99789,1234,null' UNION ALL
         SELECT '123,456,00789,1234,null' UNION ALL
         SELECT '1,2222,77789,null,null'  UNION ALL
         SELECT '11,222,88789,null,'      UNION ALL
         SELECT '111,22,99789,,'          UNION ALL
         SELECT '1111,2,00789,oooo,null'
        ) v (AString)
;


Picking a Splitter

First of all, SQL Server isn't the greatest tool to use for string manipulation. It's been proven many times that a properly written SQLCLR splitter will smoke just about any pure T-SQL attempt at string splitting. The problems associated with SQLCLR solutions are 1) finding one that is properly written (properly returns the expected result set) and 2) it's sometimes difficult to convince the resident DBAs to allow SQLCLR to be used. If you do find one that's properly written to return the expected results, it's definitely worth spending the time trying to convince the DBA, because it will usually handle MAX datatypes and cares little whether you pass it VARCHAR or NVARCHAR data, not to mention being faster than any T-SQL solution I've ever seen.

If you don't have to worry about using NVARCHAR, you don't have to worry about multi-character delimiters (or can change the delimiter "on the way in"), and you don't have to worry about MAX datatypes (limited to 8K strings), then the DelimitedSplit8K function may be for you. You can find the article for it, along with the proper "high cardinality" testing for it, at the following URL.

Tally OH! An Improved SQL 8K “CSV Splitter” Function

I'll also tell you that the high cardinality test data generation code in that article broke when SQL Server 2012 came out because of changes to how XML can be used to concatenate. Wayne Sheffield did some separate testing to compare it to the new splitter function in 2016 and made a repair to the test data generation code.
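For reference, the built-in splitter that arrived in SQL Server 2016 is STRING_SPLIT. A minimal sketch of a call follows; be aware that STRING_SPLIT does not guarantee the order of the rows it returns (an optional ordinal argument was only added in much later versions), which matters for this repivot-to-columns problem where element position is everything.

```sql
--===== STRING_SPLIT (SQL Server 2016+) returns one row per element
     -- in a single "value" column, with no guaranteed row order.
SELECT value
  FROM STRING_SPLIT('123,456,88789,null,null', ',');
```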

Code Snippets

CREATE FUNCTION [dbo].[DelimitedSplit8K]
/**********************************************************************************************************************
 Purpose:
 Given a string containing multiple elements separated by a single character delimiter and that single character
 delimiter, this function will split the string and return a table of the single elements (Item) and the element
 position within the string (ItemNumber). 

 Notes:
 1. Performance of this function approaches that of a CLR.
 2. Note that this code implicitly converts NVARCHAR to VARCHAR and that conversion may NOT be faithful.

 Revision History:
 Note that this code is a modification of a well proven function created as a community effort and initially documented
 at the following URL (http://www.sqlservercentral.com/articles/Tally+Table/72993/). This code is still undergoing 
 tests. Although every care has certainly been taken to ensure its accuracy, you are reminded to do your own tests to 
 ensure that this function is suitable for whatever application you might use it for.
 --Jeff Moden, 01 Sep 2013 
**********************************************************************************************************************/
--===== Define I/O parameters
        (@pString VARCHAR(8000) , @pDelimiter CHAR(1)) --DO NOT USE MAX DATA-TYPES HERE!  IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
     -- enough to cover VARCHAR(8000).
  WITH E1(N) AS (--==== Itzik Ben-Gan style of a cCTE (Cascading CTE) and
                     -- should not be confused with a much slower rCTE (Recursive CTE).
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                ),                                      --10E+1 or 10 rows
       E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d), --10E+4 or 10,000 rows max
 cteTally(N) AS ( --=== This provides the "base" CTE and limits the number of rows right up front
                     -- for both a performance gain and prevention of accidental "overruns"
                 SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                ),
cteStart(N1) AS ( --=== This returns N+1 (starting position of each "element" just once for each delimiter)
                 SELECT 1 UNION ALL
                 SELECT CASE WHEN SUBSTRING(@pString,t.N,1) = @pDelimiter COLLATE Latin1_General_BIN THEN t.N+1 END --added short circuit for casting
                   FROM cteTally t 
                  WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter COLLATE Latin1_General_BIN
                ),
cteLen(N1,L1)AS ( --=== Return start position and length (for use in SUBSTRING).
                     -- The ISNULL/NULLIF combo handles the length for the
                     -- final element when no trailing delimiter is found.
                 SELECT s.N1
                       ,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString COLLATE Latin1_General_BIN,s.N1),0)-s.N1,8000)
                   FROM cteStart s
                )
--===== Do the actual split.
 SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY l.N1)
        ,Item      = SUBSTRING(@pString,l.N1,l.L1)
   FROM cteLen l
;
GO
--===== Do the split for each row and repivot to columns using a 
     -- high performance CROSS TAB. It uses the function only once
     -- for each row, which is another advantage iTVFs have over
     -- Scalar Functions.
 SELECT  csv.ID
        ,Col1 = MAX(CASE WHEN ca.ItemNumber = 1 THEN Item ELSE '' END)
        ,Col2 = MAX(CASE WHEN ca.ItemNumber = 2 THEN Item ELSE '' END)
        ,Col3 = MAX(CASE WHEN ca.ItemNumber = 3 THEN Item ELSE '' END)
        ,Col4 = MAX(CASE WHEN ca.ItemNumber = 4 THEN Item ELSE '' END)
        ,Col5 = MAX(CASE WHEN ca.ItemNumber = 5 THEN Item ELSE '' END)
   FROM #CSV csv
  CROSS APPLY dbo.DelimitedSplit8K(csv.AString,',') ca
  GROUP BY csv.ID
;
GO
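One caveat worth noting: the test strings contain the literal text 'null' as well as some empty elements, and the cross tab above returns those as-is. If they should surface as real NULLs instead, each MAX(CASE...) expression can be wrapped in nested NULLIF calls; here is a sketch of that variation, reusing the test table and function above.

```sql
--===== Same cross tab, but converting the literal string 'null' and
     -- empty elements into real NULLs with nested NULLIF calls.
 SELECT  csv.ID
        ,Col1 = NULLIF(NULLIF(MAX(CASE WHEN ca.ItemNumber = 1 THEN Item ELSE '' END),'null'),'')
        ,Col2 = NULLIF(NULLIF(MAX(CASE WHEN ca.ItemNumber = 2 THEN Item ELSE '' END),'null'),'')
        ,Col3 = NULLIF(NULLIF(MAX(CASE WHEN ca.ItemNumber = 3 THEN Item ELSE '' END),'null'),'')
        ,Col4 = NULLIF(NULLIF(MAX(CASE WHEN ca.ItemNumber = 4 THEN Item ELSE '' END),'null'),'')
        ,Col5 = NULLIF(NULLIF(MAX(CASE WHEN ca.ItemNumber = 5 THEN Item ELSE '' END),'null'),'')
   FROM #CSV csv
  CROSS APPLY dbo.DelimitedSplit8K(csv.AString,',') ca
  GROUP BY csv.ID
;
```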

Context

StackExchange Database Administrators Q#172401, answer score: 7

Revisions (0)

No revisions yet.