HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Trying to check if a string contains a number exclusively

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
exclusivelynumbertryingcontainscheckstring

Problem

I have been trying to write a function to check if a string contains a number without that number being part of a bigger number (in other words if the number being searched for is '6' and the string is '7+16+2' it should return false, because the '6' in this string is part of the number '16')

I wrote the function below (It is lengthy but I was going to test it first before refactoring it)

Upon testing I found a bug whereby it only runs the first instance of a found number through the logic. So running this function with '6' against '16+7+9+6' would return false because it determines that the firrst '6' is part of a bigger number and stops processing.

I figured that to get around this I have to implement a loop to shorten the 'haystack' string (so that, using the example '16+7+9+6' the function continues checking '+7+9+6' after eliminating the first '6') but before spending the time to make an already convoluted function even more convoluted I wanted to check if there is a simpler way to acheive the same goal?

```
drop function dbo.runners_contain_runner
go
create function dbo.runners_contain_runner(@runner varchar(max), @runners varchar(max))
returns int
as
begin

/*
eliminate the plus sign from @runners so that the
'isnumeric' function doesn't return false positives (it returns 1 for '+')
*/
set @runners = replace(@runners,'+','_' )

declare @ret int;
set @ret = 0;

-- if the runner is the only runner return 1
if @runners = @runner
set @ret = 1
else
begin
declare @charindex int;
set @charindex = charindex(@runner,@runners)
if @charindex > 0
begin

-- if it is at the beginning then check the char after it
if @charindex = 1
begin
if isnumeric(substring(@runners,@charindex + len(@runner),1)) = 0
set @ret = @charindex
end

-- if it is at the end then check the char before it

Solution

Perhaps you are over-complicating this by focusing too much on wanting a number. Take a step back for a moment. What you actually want is a substring without any digits on either side of it. The only way a number could be part of a larger number is to have at least 1 digit on either side of it, right? So as long as you only pass in numbers, then this definition should still produce numbers that do not have any digits on either side.

With that in mind, we just need 3 PATINDEX predicates to cover the passed-in value being on the far left, on the far right, or in the middle. Try the following as it seems to work:

GO
CREATE PROCEDURE #TestFindRunner
(
  @Runner VARCHAR(10)
)
AS
SET NOCOUNT ON;

DECLARE @Data TABLE
(
  [ID] INT NOT NULL PRIMARY KEY,
  [Runners] VARCHAR(50) NULL
 );

INSERT INTO @Data ([ID], [Runners]) VALUES (1, '16+7+9+6');
INSERT INTO @Data ([ID], [Runners]) VALUES (2, '16+7+9+5');
INSERT INTO @Data ([ID], [Runners]) VALUES (3, '26+77+9+5');
INSERT INTO @Data ([ID], [Runners]) VALUES (4, '6+3+45');
INSERT INTO @Data ([ID], [Runners]) VALUES (5, '63,808,111,92');
INSERT INTO @Data ([ID], [Runners]) VALUES (6, '1-7-9,6');
INSERT INTO @Data ([ID], [Runners]) VALUES (7, '1-6-9,7');
INSERT INTO @Data ([ID], [Runners]) VALUES (8, '1-7-9,63');
INSERT INTO @Data ([ID], [Runners]) VALUES (9, '1-63-9,7');
INSERT INTO @Data ([ID], [Runners]) VALUES (10, NULL);
INSERT INTO @Data ([ID], [Runners]) VALUES (11, '6');

SELECT tmp.*
FROM   @Data tmp
WHERE  @Runner COLLATE Latin1_General_100_BIN2 = tmp.[Runners]
OR     PATINDEX('%[^0123456789]' + @Runner COLLATE Latin1_General_100_BIN2,
                tmp.[Runners]) > 0
OR     PATINDEX(@Runner + '[^0123456789]%' COLLATE Latin1_General_100_BIN2,
                tmp.[Runners]) > 0
OR     PATINDEX('%[^0123456789]' + @Runner + '[^0123456789]%'
                COLLATE Latin1_General_100_BIN2, tmp.[Runners]) > 0
GO


And then test with:

EXEC #TestFindRunner 0;
EXEC #TestFindRunner 2;
EXEC #TestFindRunner 4;
EXEC #TestFindRunner 8;
EXEC #TestFindRunner 11;
-- 0 rows

EXEC #TestFindRunner 3;   -- 4
EXEC #TestFindRunner 77;  -- 3
EXEC #TestFindRunner 111; -- 5
-- 1 row

EXEC #TestFindRunner 5; -- 2 and 3
-- 2 rows

EXEC #TestFindRunner 1; -- 6, 7, 8, and 9
-- 4 rows

EXEC #TestFindRunner 6; -- 1, 4, 6, 7, and 11
-- 5 rows

EXEC #TestFindRunner 7; -- 1, 2, 6, 7, 8, and 9
-- 6 rows

EXEC #TestFindRunner 9; -- 1, 2, 3, 6, 7, 8, and 9
-- 7 rows


The reason for having 3 variations of the PATINDEX is that PATINDEX search patterns are not Regular Expressions (RegeEx), contrary to what many people say / think (same with LIKE patterns). PATINDEX and LIKE patterns do not have quantifiers, so it is not possible to specify that the [^0123456789] single character replacement should be "0 or more"; it is "one and only one; no more, no less".

Forcing the binary collation (i.e. the COLLATE Latin1_General_100_BIN2 after each @Runner reference) ensures that we are only dealing with these 10 decimal digits and not any other character that might be considered equivalent

To put the above logic into an Inline Table-Valued Function (TVF) so that it is easier to use (and more efficient than a similarly easy-to-use scalar UDF), try the following:

USE [tempdb];
GO
CREATE FUNCTION dbo.IsRunnerPresent
(
  @Runner VARCHAR(10),
  @Runners VARCHAR(8000)
)
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN

  SELECT CONVERT(BIT,
    CASE WHEN @Runner COLLATE Latin1_General_100_BIN2 = @Runners
           OR PATINDEX('%[^0123456789]' + @Runner
                       COLLATE Latin1_General_100_BIN2, @Runners) > 0
           OR PATINDEX(@Runner + '[^0123456789]%'
                       COLLATE Latin1_General_100_BIN2, @Runners) > 0
           OR PATINDEX('%[^0123456789]' + @Runner + '[^0123456789]%'
                       COLLATE Latin1_General_100_BIN2, @Runners) > 0
           THEN 1
         ELSE 0
      END) AS [RunnerFound];
GO


And then test with:

DECLARE @Runner VARCHAR(10);
SET @Runner = '6';

DECLARE @Data TABLE
(
  [ID] INT NOT NULL PRIMARY KEY,
  [Runners] VARCHAR(50) NULL
 );

INSERT INTO @Data ([ID], [Runners]) VALUES (1, '16+7+9+6');
INSERT INTO @Data ([ID], [Runners]) VALUES (2, '16+7+9+5');
INSERT INTO @Data ([ID], [Runners]) VALUES (3, '26+77+9+5');
INSERT INTO @Data ([ID], [Runners]) VALUES (4, '6+3+45');
INSERT INTO @Data ([ID], [Runners]) VALUES (5, '63,808,111,92');
INSERT INTO @Data ([ID], [Runners]) VALUES (6, '1-7-9,6');
INSERT INTO @Data ([ID], [Runners]) VALUES (7, '1-6-9,7');
INSERT INTO @Data ([ID], [Runners]) VALUES (8, '1-7-9,63');
INSERT INTO @Data ([ID], [Runners]) VALUES (9, '1-63-9,7');
INSERT INTO @Data ([ID], [Runners]) VALUES (10, NULL);
INSERT INTO @Data ([ID], [Runners]) VALUES (11, '6');

SELECT tmp.[ID],
       tmp.[Runners],
       fnd.[RunnerFound]
FROM   @Data tmp
CROSS APPLY dbo.IsRunnerPresentTVF(@Runner, tmp.[Runners]) fnd;


Which returns:

`ID Runners RunnerFound
1 16+7+9+6 1
2 16+7

Code Snippets

GO
CREATE PROCEDURE #TestFindRunner
(
  @Runner VARCHAR(10)
)
AS
SET NOCOUNT ON;

DECLARE @Data TABLE
(
  [ID] INT NOT NULL PRIMARY KEY,
  [Runners] VARCHAR(50) NULL
 );

INSERT INTO @Data ([ID], [Runners]) VALUES (1, '16+7+9+6');
INSERT INTO @Data ([ID], [Runners]) VALUES (2, '16+7+9+5');
INSERT INTO @Data ([ID], [Runners]) VALUES (3, '26+77+9+5');
INSERT INTO @Data ([ID], [Runners]) VALUES (4, '6+3+45');
INSERT INTO @Data ([ID], [Runners]) VALUES (5, '63,808,111,92');
INSERT INTO @Data ([ID], [Runners]) VALUES (6, '1-7-9,6');
INSERT INTO @Data ([ID], [Runners]) VALUES (7, '1-6-9,7');
INSERT INTO @Data ([ID], [Runners]) VALUES (8, '1-7-9,63');
INSERT INTO @Data ([ID], [Runners]) VALUES (9, '1-63-9,7');
INSERT INTO @Data ([ID], [Runners]) VALUES (10, NULL);
INSERT INTO @Data ([ID], [Runners]) VALUES (11, '6');

SELECT tmp.*
FROM   @Data tmp
WHERE  @Runner COLLATE Latin1_General_100_BIN2 = tmp.[Runners]
OR     PATINDEX('%[^0123456789]' + @Runner COLLATE Latin1_General_100_BIN2,
                tmp.[Runners]) > 0
OR     PATINDEX(@Runner + '[^0123456789]%' COLLATE Latin1_General_100_BIN2,
                tmp.[Runners]) > 0
OR     PATINDEX('%[^0123456789]' + @Runner + '[^0123456789]%'
                COLLATE Latin1_General_100_BIN2, tmp.[Runners]) > 0
GO
EXEC #TestFindRunner 0;
EXEC #TestFindRunner 2;
EXEC #TestFindRunner 4;
EXEC #TestFindRunner 8;
EXEC #TestFindRunner 11;
-- 0 rows

EXEC #TestFindRunner 3;   -- 4
EXEC #TestFindRunner 77;  -- 3
EXEC #TestFindRunner 111; -- 5
-- 1 row

EXEC #TestFindRunner 5; -- 2 and 3
-- 2 rows

EXEC #TestFindRunner 1; -- 6, 7, 8, and 9
-- 4 rows

EXEC #TestFindRunner 6; -- 1, 4, 6, 7, and 11
-- 5 rows

EXEC #TestFindRunner 7; -- 1, 2, 6, 7, 8, and 9
-- 6 rows

EXEC #TestFindRunner 9; -- 1, 2, 3, 6, 7, 8, and 9
-- 7 rows
USE [tempdb];
GO
CREATE FUNCTION dbo.IsRunnerPresent
(
  @Runner VARCHAR(10),
  @Runners VARCHAR(8000)
)
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN

  SELECT CONVERT(BIT,
    CASE WHEN @Runner COLLATE Latin1_General_100_BIN2 = @Runners
           OR PATINDEX('%[^0123456789]' + @Runner
                       COLLATE Latin1_General_100_BIN2, @Runners) > 0
           OR PATINDEX(@Runner + '[^0123456789]%'
                       COLLATE Latin1_General_100_BIN2, @Runners) > 0
           OR PATINDEX('%[^0123456789]' + @Runner + '[^0123456789]%'
                       COLLATE Latin1_General_100_BIN2, @Runners) > 0
           THEN 1
         ELSE 0
      END) AS [RunnerFound];
GO
DECLARE @Runner VARCHAR(10);
SET @Runner = '6';


DECLARE @Data TABLE
(
  [ID] INT NOT NULL PRIMARY KEY,
  [Runners] VARCHAR(50) NULL
 );

INSERT INTO @Data ([ID], [Runners]) VALUES (1, '16+7+9+6');
INSERT INTO @Data ([ID], [Runners]) VALUES (2, '16+7+9+5');
INSERT INTO @Data ([ID], [Runners]) VALUES (3, '26+77+9+5');
INSERT INTO @Data ([ID], [Runners]) VALUES (4, '6+3+45');
INSERT INTO @Data ([ID], [Runners]) VALUES (5, '63,808,111,92');
INSERT INTO @Data ([ID], [Runners]) VALUES (6, '1-7-9,6');
INSERT INTO @Data ([ID], [Runners]) VALUES (7, '1-6-9,7');
INSERT INTO @Data ([ID], [Runners]) VALUES (8, '1-7-9,63');
INSERT INTO @Data ([ID], [Runners]) VALUES (9, '1-63-9,7');
INSERT INTO @Data ([ID], [Runners]) VALUES (10, NULL);
INSERT INTO @Data ([ID], [Runners]) VALUES (11, '6');

SELECT tmp.[ID],
       tmp.[Runners],
       fnd.[RunnerFound]
FROM   @Data tmp
CROSS APPLY dbo.IsRunnerPresentTVF(@Runner, tmp.[Runners]) fnd;

Context

StackExchange Database Administrators Q#221642, answer score: 5

Revisions (0)

No revisions yet.