patternsqlMinor
Filtering characters to just return numeric characters in a string?
Viewed 0 times
numericjustreturncharactersstringfiltering
Problem
I want to filter an
I have some SQL to do this but it seems way more complicated than it needs to be. I am interested in finding out if anyone has a better way to filter out any non-numeric characters in a string?
Output
I have seen the Q & A T-SQL select query to remove non-numeric characters on Stack Overflow, but that answer has a similar solution to what I have found, using a CTE table and recursion. I am searching for something simpler. Hoping there is somet
nvarchar field to return just the numeric values.I have some SQL to do this but it seems way more complicated than it needs to be. I am interested in finding out if anyone has a better way to filter out any non-numeric characters in a string?
IF OBJECT_ID('tempdb..#MOB') IS NOT NULL
BEGIN
DROP Table #MOB
END
SELECT [mob]
INTO #MOB
FROM (
SELECT '(00) 1234 5678' AS [mob]
UNION
SELECT '1234 5678' AS [mob]
UNION
SELECT '+61 012 345 678' AS [mob]
) AS temp
;WITH [fill] ([Num], [Index], [MOBILEPHONE])
AS
(
SELECT
CASE
WHEN [MOBILEPHONE] IS NOT NULL
THEN SUBSTRING([MOBILEPHONE], 1, 1)
ELSE NULL
END AS [Num]
, 1 AS [INDEX], [MOBILEPHONE]
FROM (
SELECT DISTINCT [mob] AS [MOBILEPHONE]
FROM #MOB as t
) AS temp
UNION ALL
SELECT
SUBSTRING([F].[MOBILEPHONE], [F].[Index] + 1, 1) AS [Num]
,[F].[Index] + 1 AS [Index]
, [MOBILEPHONE]
FROM [fill] AS [F]
WHERE ([F].[Index] + 1) < LEN([F].[MOBILEPHONE]) + 1
)
SELECT [E].[MOBILEPHONE] AS [old_MOBILEPHONE],
STUFF((SELECT N'' + [F].[Num]
FROM [fill] AS [F]
WHERE (PATINDEX('%[^0-9]%', [F].[Num]) = 0 OR PATINDEX('%[^0-9]%', [F].[Num]) IS NULL) AND
([F].[MOBILEPHONE] = [E].[MOBILEPHONE])
ORDER BY [F].[MOBILEPHONE], [F].[Index]
FOR XML PATH('')), 1, 0, '')
AS [MOBILEPHONE]
FROM (
SELECT DISTINCT [t].[MOBILEPHONE]
FROM (SELECT [mob] AS [MOBILEPHONE] FROM #MOB) as t
) AS [E]Output
╔═════════════════╦═════════════╗
║ old_MOBILEPHONE ║ MOBILEPHONE ║
╠═════════════════╬═════════════╣
║ (00) 1234 5678 ║ 0012345678 ║
║ +61 012 345 678 ║ 61012345678 ║
║ 1234 5678 ║ 12345678 ║
╚═════════════════╩═════════════╝
I have seen the Q & A T-SQL select query to remove non-numeric characters on Stack Overflow, but that answer has a similar solution to what I have found, using a CTE table and recursion. I am searching for something simpler. Hoping there is somet
Solution
To use regular expressions, you would need to use a SQLCLR function. Solomon Rutzky has created a library of useful CLR functions called SQLsharp. The free edition includes several regular expression functions including
This produces the output shown below:
Regular expressions are a bit overkill for this simple requirement, so writing your own CLR implementation to just remove non-numerics would likely be even faster. Nevertheless, I have found the library function above to be as fast, if not faster, than the best T-SQL implementations (T-SQL string manipulation is rather slow).
For a T-SQL implementation, see Splitting Strings Based on Patterns by Dwain Camps.
RegEx_Replace4k as used below:SELECT
M.mob,
numeric_only =
SQL#.RegEx_Replace4k
(
M.mob, -- Source
N'\D', -- Regular expression
N'', -- Replace matches with empty string
-1, -- Unlimited replacements
1, -- Start at character position
NULL -- Options (see documentation)
)
FROM #MOB AS M;This produces the output shown below:
╔═════════════════╦══════════════╗
║ mob ║ numeric_only ║
╠═════════════════╬══════════════╣
║ (00) 1234 5678 ║ 0012345678 ║
║ +61 012 345 678 ║ 61012345678 ║
║ 1234 5678 ║ 12345678 ║
╚═════════════════╩══════════════╝
Regular expressions are a bit overkill for this simple requirement, so writing your own CLR implementation to just remove non-numerics would likely be even faster. Nevertheless, I have found the library function above to be as fast, if not faster, than the best T-SQL implementations (T-SQL string manipulation is rather slow).
For a T-SQL implementation, see Splitting Strings Based on Patterns by Dwain Camps.
Code Snippets
SELECT
M.mob,
numeric_only =
SQL#.RegEx_Replace4k
(
M.mob, -- Source
N'\D', -- Regular expression
N'', -- Replace matches with empty string
-1, -- Unlimited replacements
1, -- Start at character position
NULL -- Options (see documentation)
)
FROM #MOB AS M;Context
StackExchange Database Administrators Q#184456, answer score: 8
Revisions (0)
No revisions yet.