patternsqlModerate
Trim whitespace (spaces, tabs, newlines)
Viewed 0 times
trimwhitespacenewlinesspacestabs
Problem
I'm on SQL Server 2014 and I need to clean whitespace from start and end of a column's content, where whitespace could be simple spaces, tabs or newlines (both
and so on.
I was able to only achieve the first case with
but for the other cases it doesn't work.
\n and \r\n); e.g.' this content ' should become 'this content'
' \r\n \t\t\t this \r\n content \t \r\n ' should become 'this \r\n content'and so on.
I was able to only achieve the first case with
UPDATE table t SET t.column = LTRIM(RTRIM(t.column))but for the other cases it doesn't work.
Solution
For anyone using SQL Server 2017 or newer
you can use the TRIM built-in function. For example:
Please note that the default behavior of
Also, I used
For anyone using SQL Server 2016 or older
You can create a function, either as a SQLCLR Scalar UDF or a T-SQL Inline TVF (iTVF). The T-SQL Inline TVF would be as follows:
Please note:
And use it as follows:
Returns:
And you can use that in an
As mentioned at the beginning, this is also really easy via SQLCLR since .NET includes a
And it returns the exact same string as shown above in the iTVF example output. But being a scalar UDF, you would use it as follows in an
Either one of the above should be efficient for using across millions of rows. Inline TVFs are optimizible unlike Multi-statement TVFs and T-SQL scalar UDFs. And, SQLCLR Scalar UDFs have the potential to be used in parallel plans, as long as they are marked as
you can use the TRIM built-in function. For example:
DECLARE @Test NVARCHAR(4000);
SET @Test = N'
' + NCHAR(0x09) + N' ' + NCHAR(0x09) + N' this
' + NCHAR(0x09) + NCHAR(0x09) + N' content' + NCHAR(0x09) + NCHAR(0x09) + N'
' + NCHAR(0x09) + N' ' + NCHAR(0x09) + NCHAR(0x09) + N' ';
PRINT N'~' + @Test + N'~';
PRINT N'------------------------';
PRINT N'~'
+ TRIM(NCHAR(0x09) + NCHAR(0x20) + NCHAR(0x0D) + NCHAR(0x0A) FROM @Test)
+ N'~';
Please note that the default behavior of
TRIM is to remove only spaces, so in order to also remove the tabs and newlines (CR + LFs), you need to specify the characters FROM clause.Also, I used
NCHAR(0x09) for the tab characters in the @Test variable so that the example code can be copied-and-pasted and retain the correct characters. Otherwise, tabs get converted into spaces when this page is rendered.For anyone using SQL Server 2016 or older
You can create a function, either as a SQLCLR Scalar UDF or a T-SQL Inline TVF (iTVF). The T-SQL Inline TVF would be as follows:
CREATE
--ALTER
FUNCTION dbo.TrimChars(@OriginalString NVARCHAR(4000), @CharsToTrim NVARCHAR(50))
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN
WITH frst AS
(
SELECT PATINDEX(N'%[^' + @CharsToTrim + N']%' COLLATE Latin1_General_BIN2, @OriginalString) AS [FirstChar],
DATALENGTH(@OriginalString) / 2 AS [ActualLength]
), lst AS
(
SELECT f.[FirstChar],
CASE f.[FirstChar]
WHEN 0 THEN (f.[ActualLength] + 2)
ELSE PATINDEX(N'%[^' + @CharsToTrim + N']%' COLLATE Latin1_General_BIN2, REVERSE(@OriginalString))
END AS [LastChar],
f.[ActualLength]
FROM frst f
)
SELECT l.[ActualLength],
l.[FirstChar],
CASE l.[FirstChar]
WHEN 0 THEN 0
ELSE ((l.[ActualLength] - l.[LastChar]) + 1)
END AS [LastChar],
((l.[ActualLength] - l.[LastChar]) - l.[FirstChar] + 2) AS [NonTrimmableLength],
SUBSTRING(@OriginalString COLLATE Latin1_General_BIN2, l.[FirstChar],
((l.[ActualLength] - l.[LastChar]) - l.[FirstChar] + 2)) AS [FixedString]
FROM lst l;
GO
Please note:
- Non-100-level collation used to ensure compatibility back to SQL Server 2005. Even if that's an infrequent requirement, there's no harm since for the purposes here, there's no difference between the 100 and non-100 level binary collations
- The
COLLATEclause is intentionally left off of theREVERSE()
And use it as follows:
DECLARE @Test NVARCHAR(4000);
SET @Test = N'
' + NCHAR(0x09) + N' ' + NCHAR(0x09) + N' this
' + NCHAR(0x09) + NCHAR(0x09) + N' content' + NCHAR(0x09) + NCHAR(0x09) + N'
' + NCHAR(0x09) + N' ' + NCHAR(0x09) + NCHAR(0x09) + N' ';
SELECT N'~' + tc.[FixedString] + N'~' AS [proof]
FROM dbo.TrimChars(@Test, NCHAR(0x09) + NCHAR(0x20) + NCHAR(0x0D) + NCHAR(0x0A)) tc;
Returns:
proof
----
~this
content~
And you can use that in an
UPDATE using CROSS APPLY:UPDATE tbl
SET tbl.[Column] = itvf.[FixedString]
FROM SchemaName.TableName tbl
CROSS APPLY dbo.TrimChars(tbl.[Column],
NCHAR(0x09) + NCHAR(0x20) + NCHAR(0x0D) + NCHAR(0x0A)) itvf
As mentioned at the beginning, this is also really easy via SQLCLR since .NET includes a
Trim() method that does exactly the operation you are wanting. You can either code your own to call SqlString.Value.Trim(), or you can just install the Free version of the SQL# library (which I created, but this function is in the Free version) and use either String_Trim (which does just white space) or String_TrimChars where you pass in the characters to trim from both sides (just like the iTVF shown above).DECLARE @Test NVARCHAR(4000);
SET @Test = N'
' + NCHAR(0x09) + N' ' + NCHAR(0x09) + N' this
' + NCHAR(0x09) + NCHAR(0x09) + N' content' + NCHAR(0x09) + NCHAR(0x09) + N'
' + NCHAR(0x09) + N' ' + NCHAR(0x09) + NCHAR(0x09) + N' ';
SELECT N'~' + SQL#.String_Trim(@Test) + N'~' AS [proof];
And it returns the exact same string as shown above in the iTVF example output. But being a scalar UDF, you would use it as follows in an
UPDATE:UPDATE tbl
SET tbl.[Column] = SQL#.String_Trim(itvf.[Column])
FROM SchemaName.TableName tbl
Either one of the above should be efficient for using across millions of rows. Inline TVFs are optimizible unlike Multi-statement TVFs and T-SQL scalar UDFs. And, SQLCLR Scalar UDFs have the potential to be used in parallel plans, as long as they are marked as
IsDeterministic=true and do not set either type of DataAccess to Read (the default for both User and System data access is None), and both of those conditions are true for both of the SQLCLR functions noted above.Context
StackExchange Database Administrators Q#132966, answer score: 13
Revisions (0)
No revisions yet.