HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Filtering characters to just return numeric characters in a string?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
numericjustreturncharactersstringfiltering

Problem

I want to filter an nvarchar field to return just the numeric values.

I have some SQL to do this but it seems way more complicated than it needs to be. I am interested in finding out if anyone has a better way to filter out any non-numeric characters in a string?

IF OBJECT_ID('tempdb..#MOB') IS NOT NULL
BEGIN
    DROP Table #MOB
END

SELECT [mob]
INTO #MOB
FROM (
SELECT '(00) 1234 5678' AS [mob]
UNION
SELECT '1234 5678' AS [mob]
UNION
SELECT '+61 012 345 678' AS [mob]
) AS temp

;WITH [fill] ([Num], [Index], [MOBILEPHONE])
AS
(
    SELECT 
    CASE 
        WHEN [MOBILEPHONE] IS NOT NULL
        THEN SUBSTRING([MOBILEPHONE], 1, 1) 
        ELSE NULL 
    END AS [Num]
    , 1 AS [INDEX], [MOBILEPHONE]
    FROM (
        SELECT DISTINCT [mob] AS [MOBILEPHONE]
        FROM #MOB as t
    ) AS temp
    UNION ALL
    SELECT 
    SUBSTRING([F].[MOBILEPHONE], [F].[Index] + 1, 1) AS [Num]
    ,[F].[Index] + 1 AS [Index]
    , [MOBILEPHONE]
    FROM [fill] AS [F]
    WHERE ([F].[Index] + 1) < LEN([F].[MOBILEPHONE]) + 1
)

SELECT [E].[MOBILEPHONE] AS [old_MOBILEPHONE],
    STUFF((SELECT N'' + [F].[Num]
    FROM [fill] AS [F]
    WHERE (PATINDEX('%[^0-9]%', [F].[Num]) = 0 OR PATINDEX('%[^0-9]%', [F].[Num]) IS NULL) AND
    ([F].[MOBILEPHONE] = [E].[MOBILEPHONE])
    ORDER BY [F].[MOBILEPHONE], [F].[Index]
    FOR XML PATH('')), 1, 0, '')
    AS [MOBILEPHONE]
FROM (
        SELECT DISTINCT [t].[MOBILEPHONE]
        FROM (SELECT [mob] AS [MOBILEPHONE] FROM #MOB) as t
    ) AS [E]


Output

╔═════════════════╦═════════════╗
║ old_MOBILEPHONE ║ MOBILEPHONE ║
╠═════════════════╬═════════════╣
║ (00) 1234 5678 ║ 0012345678 ║
║ +61 012 345 678 ║ 61012345678 ║
║ 1234 5678 ║ 12345678 ║
╚═════════════════╩═════════════╝


I have seen the Q & A T-SQL select query to remove non-numeric characters on Stack Overflow, but that answer has a similar solution to what I have found, using a CTE table and recursion. I am searching for something simpler. Hoping there is somet

Solution

To use regular expressions, you would need to use a SQLCLR function. Solomon Rutzky has created a library of useful CLR functions called SQLsharp. The free edition includes several regular expression functions including RegEx_Replace4k as used below:

SELECT 
    M.mob,
    numeric_only = 
        SQL#.RegEx_Replace4k
        (
            M.mob,  -- Source
            N'\D',  -- Regular expression
            N'',     -- Replace matches with empty string
            -1,     -- Unlimited replacements
            1,      -- Start at character position
            NULL    -- Options (see documentation)
        )
FROM #MOB AS M;


This produces the output shown below:

╔═════════════════╦══════════════╗
║ mob ║ numeric_only ║
╠═════════════════╬══════════════╣
║ (00) 1234 5678 ║ 0012345678 ║
║ +61 012 345 678 ║ 61012345678 ║
║ 1234 5678 ║ 12345678 ║
╚═════════════════╩══════════════╝


Regular expressions are a bit overkill for this simple requirement, so writing your own CLR implementation to just remove non-numerics would likely be even faster. Nevertheless, I have found the library function above to be as fast, if not faster, than the best T-SQL implementations (T-SQL string manipulation is rather slow).

For a T-SQL implementation, see Splitting Strings Based on Patterns by Dwain Camps.

Code Snippets

SELECT 
    M.mob,
    numeric_only = 
        SQL#.RegEx_Replace4k
        (
            M.mob,  -- Source
            N'\D',  -- Regular expression
            N'',     -- Replace matches with empty string
            -1,     -- Unlimited replacements
            1,      -- Start at character position
            NULL    -- Options (see documentation)
        )
FROM #MOB AS M;

Context

StackExchange Database Administrators Q#184456, answer score: 8

Revisions (0)

No revisions yet.