patternsqlMinor
Split string in chunks preferable at spaces
Viewed 0 times
chunkspreferablespacessplitstring
Problem
I'm working on a tool to import data from one database to another. One requirement is that I have to split a string from one source field into three (shorter) fields at the target. If possible the string should be split at a space character. If the string doesn't fit completely into the target fields, the rest can be omitted.
Usually I would solve this using an UDF but unfortunately neither UDF's nor Stored Procedures are allowed in my scenario.
My source database has the following table:
This table contains company names, for example:
During the import the records of this table should be inserted into a stating table in the target database. The staging table looks like this:
If I would simply use
But I don't want to split in the middle of a word so I would like to have the result like this:
`OrganisationID | Name1 | Name2
---------------+---------------------------------------------------+-------
Usually I would solve this using an UDF but unfortunately neither UDF's nor Stored Procedures are allowed in my scenario.
My source database has the following table:
CREATE TABLE dbo.Organisations
(
OrganisationID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
OrganisationName nvarchar(180) NOT NULL
/ More columns omitted for brevity /
)
This table contains company names, for example:
OrganisationID | OrganisationName
---------------+--------------------------------------------------------------------
1 | Microsoft Corporation
2 | S&T System Integration & Technology Distribution Aktiengesellschaft
During the import the records of this table should be inserted into a stating table in the target database. The staging table looks like this:
CREATE TABLE dbo.OrgStaging
(
OrganisationID int NOT NULL,
Name1 nvarchar(50) NOT NULL,
Name2 nvarchar(50) NOT NULL,
Name3 nvarchar(50) NOT NULL
/ More columns omitted for brevity /
)
If I would simply use
SUBSTRING to split the name I would end up in the staging table like this:OrganisationID | Name1 | Name2
---------------+---------------------------------------------------+-----------------
1 | Microsoft Corporation |
2 | S&T System Integration & Technology Distribution A|ktiengesellschaft
But I don't want to split in the middle of a word so I would like to have the result like this:
`OrganisationID | Name1 | Name2
---------------+---------------------------------------------------+-------
Solution
A nitpick
This first point will not really change the query, and it is probably
only added for testing purposes, but an
into a table with an
Us a more compact style
It looks like you either have a very strong policy on style and
layout, or used an auto-formatter, because in some places,
is written in two lines. That is good, because the query is at least
well formatted and readable. If you want it to be more compact,
though, you might want to skimp a bit on the newlines.
When formatting code more compactly, there is still "breathing space",
but the scroll-factor is tuned down a bit, so you have a bit higher
view of the code. I find that it can help.
I like to use a combination of indentation to recognize the query
parts (my indentation is by no means the default in SQL) and a thing I
call "one concept per line", where each line tells me something that
can stand on its own for logic.
Not indenting
the
```
DECLARE @MaxLen int = 50; -- Maximum length of a target column
WITH SpacePositions AS
( SELECT O.OrganisationID
, CHARINDEX(' ', O.OrganisationName, 0) AS Position
FROM dbo.Organisations O
UNION ALL
SELECT O.OrganisationID
, CHARINDEX(' ', O.OrganisationName, S.Position + 1) AS Position
FROM dbo.Organisations O
INNER JOIN SpacePositions S
ON CHARINDEX(' ', O.OrganisationName, S.Position + 1) > S.Position
AND S.OrganisationID = O.OrganisationID
)
, SplitPositions AS
( SELECT S.OrganisationID
, S.Position - 1 AS Position
FROM SpacePositions S
WHERE S.Position != 0
UNION
SELECT O.OrganisationID
, LEN(O.OrganisationName) AS Position
FROM dbo.Organisations O
)
, FirstChunk AS
( SELECT D.OrganisationID
, 1 AS ChunkStart
, MAX(D.Position) AS ChunkEnd
FROM ( SELECT S.OrganisationID
, S.Position + 1 AS Position
FROM SplitPositions S
WHERE Position BETWEEN 1 AND @MaxLen
UNION
SELECT S.OrganisationID
, @MaxLen
FROM SplitPositions S
WHERE NOT EXISTS ( SELECT *
FROM SplitPositions SI
WHERE SI.Position BETWEEN 1 AND @MaxLen
AND SI.OrganisationID = S.OrganisationID
)
) D
GROUP BY D.OrganisationID
)
, SecondChunk AS
( SELECT C.OrganisationID
, C.ChunkEnd + 1 AS ChunkStart
, MAX(D.Position) AS ChunkEnd
FROM FirstChunk C
INNER JOIN ( SELECT S.OrganisationID
, S.Position + 1 AS Position
FROM SplitPositions S
INNER JOIN FirstChunk C ON C.OrganisationID = S.OrganisationID
WHERE S.Position BETWEEN C.ChunkEnd + 1 AND C.ChunkEnd + @MaxLen
UNION
SELECT S.OrganisationID
, C.ChunkEnd + @MaxLen AS Position
FROM SplitPositions S
INNER JOIN FirstChunk C ON C.OrganisationID = S.OrganisationID
WHERE NOT EXISTS ( SELECT *
FROM SplitPositions SI
WHERE SI.Position BETWEEN C.ChunkEnd + 1 AND C.ChunkEnd + @MaxLen
AND OrganisationID = C.OrganisationID
)
) D ON D.OrganisationID = C.OrganisationID
GROUP BY C.OrganisationID, C.ChunkEnd
)
, ThirdChunk AS
( SELECT C.OrganisationID
, C.ChunkEnd + 1 AS ChunkStart
, MAX(D.Position) AS ChunkEnd
FROM SecondChunk C
INNER JOIN ( SELECT S.OrganisationID
, S.Position + 1 AS Position
FROM SplitPositions S
INNER JOIN SecondChunk C ON C.OrganisationID = S.OrganisationID
WHERE S.Position BETWEEN C.ChunkEnd + 1 AND C.ChunkEnd + @MaxLen
UNION
SELECT S.OrganisationID
, C.ChunkEnd + @MaxLen AS Position
FROM SplitPositions S
INNER JOIN SecondChunk C ON C.OrganisationID = S.OrganisationID
WHERE NOT EXISTS ( SELECT *
FROM SplitPositions SI
WHERE SI.Position BETWEEN C.ChunkEnd + 1 AND C.ChunkEnd +
This first point will not really change the query, and it is probably
only added for testing purposes, but an
ORDER BY with an INSERT INTO statement does not really do anything useful (unless you insertinto a table with an
IDENTITY column).Us a more compact style
It looks like you either have a very strong policy on style and
layout, or used an auto-formatter, because in some places,
SELECT *is written in two lines. That is good, because the query is at least
well formatted and readable. If you want it to be more compact,
though, you might want to skimp a bit on the newlines.
When formatting code more compactly, there is still "breathing space",
but the scroll-factor is tuned down a bit, so you have a bit higher
view of the code. I find that it can help.
I like to use a combination of indentation to recognize the query
parts (my indentation is by no means the default in SQL) and a thing I
call "one concept per line", where each line tells me something that
can stand on its own for logic.
Not indenting
UNION s helps to see the "equal level" of both sides ofthe
UNION, and prevents the lines from getting too long.```
DECLARE @MaxLen int = 50; -- Maximum length of a target column
WITH SpacePositions AS
( SELECT O.OrganisationID
, CHARINDEX(' ', O.OrganisationName, 0) AS Position
FROM dbo.Organisations O
UNION ALL
SELECT O.OrganisationID
, CHARINDEX(' ', O.OrganisationName, S.Position + 1) AS Position
FROM dbo.Organisations O
INNER JOIN SpacePositions S
ON CHARINDEX(' ', O.OrganisationName, S.Position + 1) > S.Position
AND S.OrganisationID = O.OrganisationID
)
, SplitPositions AS
( SELECT S.OrganisationID
, S.Position - 1 AS Position
FROM SpacePositions S
WHERE S.Position != 0
UNION
SELECT O.OrganisationID
, LEN(O.OrganisationName) AS Position
FROM dbo.Organisations O
)
, FirstChunk AS
( SELECT D.OrganisationID
, 1 AS ChunkStart
, MAX(D.Position) AS ChunkEnd
FROM ( SELECT S.OrganisationID
, S.Position + 1 AS Position
FROM SplitPositions S
WHERE Position BETWEEN 1 AND @MaxLen
UNION
SELECT S.OrganisationID
, @MaxLen
FROM SplitPositions S
WHERE NOT EXISTS ( SELECT *
FROM SplitPositions SI
WHERE SI.Position BETWEEN 1 AND @MaxLen
AND SI.OrganisationID = S.OrganisationID
)
) D
GROUP BY D.OrganisationID
)
, SecondChunk AS
( SELECT C.OrganisationID
, C.ChunkEnd + 1 AS ChunkStart
, MAX(D.Position) AS ChunkEnd
FROM FirstChunk C
INNER JOIN ( SELECT S.OrganisationID
, S.Position + 1 AS Position
FROM SplitPositions S
INNER JOIN FirstChunk C ON C.OrganisationID = S.OrganisationID
WHERE S.Position BETWEEN C.ChunkEnd + 1 AND C.ChunkEnd + @MaxLen
UNION
SELECT S.OrganisationID
, C.ChunkEnd + @MaxLen AS Position
FROM SplitPositions S
INNER JOIN FirstChunk C ON C.OrganisationID = S.OrganisationID
WHERE NOT EXISTS ( SELECT *
FROM SplitPositions SI
WHERE SI.Position BETWEEN C.ChunkEnd + 1 AND C.ChunkEnd + @MaxLen
AND OrganisationID = C.OrganisationID
)
) D ON D.OrganisationID = C.OrganisationID
GROUP BY C.OrganisationID, C.ChunkEnd
)
, ThirdChunk AS
( SELECT C.OrganisationID
, C.ChunkEnd + 1 AS ChunkStart
, MAX(D.Position) AS ChunkEnd
FROM SecondChunk C
INNER JOIN ( SELECT S.OrganisationID
, S.Position + 1 AS Position
FROM SplitPositions S
INNER JOIN SecondChunk C ON C.OrganisationID = S.OrganisationID
WHERE S.Position BETWEEN C.ChunkEnd + 1 AND C.ChunkEnd + @MaxLen
UNION
SELECT S.OrganisationID
, C.ChunkEnd + @MaxLen AS Position
FROM SplitPositions S
INNER JOIN SecondChunk C ON C.OrganisationID = S.OrganisationID
WHERE NOT EXISTS ( SELECT *
FROM SplitPositions SI
WHERE SI.Position BETWEEN C.ChunkEnd + 1 AND C.ChunkEnd +
Code Snippets
DECLARE @MaxLen int = 50; -- Maximum length of a target column
WITH SpacePositions AS
( SELECT O.OrganisationID
, CHARINDEX(' ', O.OrganisationName, 0) AS Position
FROM dbo.Organisations O
UNION ALL
SELECT O.OrganisationID
, CHARINDEX(' ', O.OrganisationName, S.Position + 1) AS Position
FROM dbo.Organisations O
INNER JOIN SpacePositions S
ON CHARINDEX(' ', O.OrganisationName, S.Position + 1) > S.Position
AND S.OrganisationID = O.OrganisationID
)
, SplitPositions AS
( SELECT S.OrganisationID
, S.Position - 1 AS Position
FROM SpacePositions S
WHERE S.Position != 0
UNION
SELECT O.OrganisationID
, LEN(O.OrganisationName) AS Position
FROM dbo.Organisations O
)
, FirstChunk AS
( SELECT D.OrganisationID
, 1 AS ChunkStart
, MAX(D.Position) AS ChunkEnd
FROM ( SELECT S.OrganisationID
, S.Position + 1 AS Position
FROM SplitPositions S
WHERE Position BETWEEN 1 AND @MaxLen
UNION
SELECT S.OrganisationID
, @MaxLen
FROM SplitPositions S
WHERE NOT EXISTS ( SELECT *
FROM SplitPositions SI
WHERE SI.Position BETWEEN 1 AND @MaxLen
AND SI.OrganisationID = S.OrganisationID
)
) D
GROUP BY D.OrganisationID
)
, SecondChunk AS
( SELECT C.OrganisationID
, C.ChunkEnd + 1 AS ChunkStart
, MAX(D.Position) AS ChunkEnd
FROM FirstChunk C
INNER JOIN ( SELECT S.OrganisationID
, S.Position + 1 AS Position
FROM SplitPositions S
INNER JOIN FirstChunk C ON C.OrganisationID = S.OrganisationID
WHERE S.Position BETWEEN C.ChunkEnd + 1 AND C.ChunkEnd + @MaxLen
UNION
SELECT S.OrganisationID
, C.ChunkEnd + @MaxLen AS Position
FROM SplitPositions S
INNER JOIN FirstChunk C ON C.OrganisationID = S.OrganisationID
WHERE NOT EXISTS ( SELECT *
FROM SplitPositions SI
WHERE SI.Position BETWEEN C.ChunkEnd + 1 AND C.ChunkEnd + @MaxLen
AND OrganisationID = C.OrganisationID
)
) D ON D.OrganisationID = C.OrganisationID
GROUP BY C.OrganisationID, C.ChunkEnd
)
, ThirdChunk AS
( SELECT C.OrganisationID
, C.ChunkEnd + 1 AS ChunkStart
, MAX(D.Position) AS Chunk, FirstChunk AS
( SELECT O.OrganisationID
, 1 AS ChunkStart
, COALESCE(MAX(D.Position), @MaxLen) AS ChunkEnd
FROM dbo.Organisations O
LEFT JOIN ( SELECT S.OrganisationID
, S.Position + 1 AS Position
FROM SplitPositions S
WHERE Position BETWEEN 1 AND @MaxLen
) D ON D.OrganisationID = O.OrganisationID
GROUP BY O.OrganisationID
), SecondChunk AS
( SELECT C.OrganisationID
, C.ChunkEnd + 1 AS ChunkStart
, COALESCE(MAX(D.Position), C.ChunkEnd + @MaxLen) AS ChunkEnd
FROM FirstChunk C
LEFT JOIN ( SELECT S.OrganisationID
, S.Position + 1 AS Position
FROM SplitPositions S
INNER JOIN FirstChunk C ON C.OrganisationID = S.OrganisationID
WHERE S.Position BETWEEN C.ChunkEnd + 1 AND C.ChunkEnd + @MaxLen
) D ON D.OrganisationID = C.OrganisationID
GROUP BY C.OrganisationID, C.ChunkEnd
)
, ThirdChunk AS
( SELECT C.OrganisationID
, C.ChunkEnd + 1 AS ChunkStart
, COALESCE(MAX(D.Position), C.ChunkEnd + @MaxLen) AS ChunkEnd
FROM SecondChunk C
LEFT JOIN ( SELECT S.OrganisationID
, S.Position + 1 AS Position
FROM SplitPositions S
INNER JOIN SecondChunk C ON C.OrganisationID = S.OrganisationID
WHERE S.Position BETWEEN C.ChunkEnd + 1 AND C.ChunkEnd + @MaxLen
) D ON D.OrganisationID = C.OrganisationID
GROUP BY C.OrganisationID, C.ChunkEnd
)INSERT INTO Organisations ([OrganisationName])
SELECT SUBSTRING(OrganisationName, 1, 180)
FROM ( SELECT 'Microsoft Corporation'
UNION ALL SELECT 'S&T System Integration & Technology Distribution Aktiengesellschaft'
UNION ALL SELECT 'VeryLongOrganisationNameThatWillHaveToBeSplitWithoutASpace Because It Really Is A Long Name, But In The Second Column We Can Split It'
UNION ALL SELECT 'Another VeryLongOrganisationNameThatWillHaveToBeSplitWithoutASpaceButOnlyInTheSecondColumn, Because It Really Is A Long Name'
UNION ALL SELECT 'AnotherVeryLongOrganisationNameThatWillHaveToBeSplitWithoutASpaceBecauseItReallyIsALongNameButNowItEvenExceedsTheLimitOfAllThreeColumnsWithAMaximumLenghtOf50Characters(WhichIsACombinedTotalOf150Characters)AndNowWeDon''tHaveAnythingToPutInTheLastBox'
UNION ALL SELECT 'OneWordOnly'
UNION ALL SELECT 'A' -- Single letter edge case
UNION ALL SELECT '' -- Empty string edge case
) Data(OrganisationName);SELECT *, LEN(Name1), LEN(Name2), LEN(Name3) FROM dbo.OrgStaging;Context
StackExchange Code Review Q#82468, answer score: 2
Revisions (0)
No revisions yet.