HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

TSQL Query to match different lengths of strings with each other

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
tsqleachwithquerymatchdifferentlengthsotherstrings

Problem

I'm writing a TVF to query a large table (tens of millions of rows) where a string (Postcode) in that table matches a string (a partial postcode (outcode/sector portion)) from another table.

I'm hitting an edge case that I can't resolve.

For those unfamiliar with UK Postcodes

Here's some sample data.

DECLARE @tab1 TABLE (Sk INT, Postcode VARCHAR(8))
DECLARE @tab2 TABLE (Sk INT, Coverage VARCHAR(8)) 

INSERT INTO @tab1 (Sk, Postcode) VALUES (1, 'E12 5HH'), (6, 'SW1X 6AA')
INSERT INTO @tab2 (Sk, Coverage) VALUES (1, 'E12'), (1, 'E12 5'), 
                                        (2, 'E1'), (2, 'E11'), (2, 'E13'), 
                                        (3, 'E12 6'),
                                        (4, 'E12 5') ,
                                        (5, 'E12') ,
                                        (7, 'SW1') ,
                                        (8, 'SW1X')


And my current query

SELECT  S.Sk, 
        S.Postcode, 
        CoverageSk = X.Sk, 
        X.Coverage 
FROM        @tab1   S 
OUTER APPLY (
    SELECT  Sk , 
            Coverage , 
            [Length] = LEN(Coverage) 
    FROM    @tab2
)   X 
WHERE   S.Sk <> X.Sk 
AND     LEFT(S.Postcode,X.[Length] ) = X.Coverage


These are my results.

The data on rows 1 and 4 shouldn't be in the results.
Row 1, The Coverage or outcode "E1" isn't the same as the outcode portion ("E12") of the Postcode "E12 5HH".

The same for Row 4, The Coverage or outcode "SW1" isn't the same as the outcode ("SW1X") portion of the Postcode "SW1X 6AA".

Solution

Ideally, the source tables would have the component parts broken out so you could match them directly.

If that's not possible, this works for the sample data:

SELECT 
    T1.Sk, 
    T1.Postcode, 
    CoverageSk = T2.Sk, 
    T2.Coverage 
FROM @tab1 AS T1
JOIN @tab2 AS T2
    ON T2.Sk <> T1.Sk
    AND T1.Postcode LIKE 
            CASE
                WHEN CHARINDEX(SPACE(1), T2.Coverage) > 0 
                THEN T2.Coverage
                ELSE T2.Coverage + SPACE(1)
            END + '%';


Sk
Postcode
CoverageSk
Coverage

1
E12 5HH
4
E12 5

1
E12 5HH
5
E12

6
SW1X 6AA
8
SW1X

SQL Server can use an index on Postcode via a dynamic seek:

Code Snippets

SELECT 
    T1.Sk, 
    T1.Postcode, 
    CoverageSk = T2.Sk, 
    T2.Coverage 
FROM @tab1 AS T1
JOIN @tab2 AS T2
    ON T2.Sk <> T1.Sk
    AND T1.Postcode LIKE 
            CASE
                WHEN CHARINDEX(SPACE(1), T2.Coverage) > 0 
                THEN T2.Coverage
                ELSE T2.Coverage + SPACE(1)
            END + '%';

Context

StackExchange Database Administrators Q#306934, answer score: 5

Revisions (0)

No revisions yet.