patternsqlMinor
Splitting an address into fields in SQL
Viewed 0 times
addresssqlintofieldssplitting
Problem
I have the following code to split an address string in T-SQL. Excepting the unit number, which is already in its own field, it would affect too much of the application to split the address into different database fields. It's a bit of a long story, but it's also not possible to make it a parameterized query, or to write the equivalent application code in C#. Still, I'm curious if anyone can think of any edge case that would put things into the wrong fields.
The only case I've come across is "East End Ave", but even the US Postal Service suggests incorrectly parsing 'East' as a directional (for consistency) (reference: very long pdf, search for 'directional', or go to page 233).
I'm only worried about Canadian and American addresses, but in Canada, French-style street types before the street name are fair game (e.g. 'Rue Montreal'). For a tie-breaker, I default to the English style (e.g. 'Rue Street' is parsed with the name 'Rue' and type 'Street').
```
DECLARE @CivicNum AS NVARCHAR(10) = ''
DECLARE @Predir AS NVARCHAR(10) = ''
DECLARE @Name AS NVARCHAR(100) = ''
DECLARE @Type AS NVARCHAR(15) = ''
DECLARE @FrenchType AS NVARCHAR(15) = ''
DECLARE @Postdir AS NVARCHAR(10) = ''
SET @Name = '123 E Fake St. S'
SET @CivicNum = SUBSTRING(@Name, 1, 1)
IF ISNUMERIC(@CivicNum) = 1
BEGIN
-- will assume an initial integer is a civic number
DECLARE @LastSpace AS INTEGER
DECLARE @NextSpace AS INTEGER
DECLARE @Temp as VARCHAR(15)
-- get the first token, and push it into @CivicNum
SET @LastSpace = 1
SET @NextSpace = CHAR
The only case I've come across is "East End Ave", but even the US Postal Service suggests incorrectly parsing 'East' as a directional (for consistency) (reference: very long pdf, search for 'directional', or go to page 233).
I'm only worried about Canadian and American addresses, but in Canada, French-style street types before the street name are fair game (e.g. 'Rue Montreal'). For a tie-breaker, I default to the English style (e.g. 'Rue Street' is parsed with the name 'Rue' and type 'Street').
StreetType and StreetDirection are tables which contain street types and street directions, as TypeName/TypeAbbr and DirectionName/DirectionType pairs respectively - I can post the values I've been using there if anyone's curious, but the StreetType is quite long. StreetDirection also includes French directions ('ouest', 'nord', 'sud', 'est', etc.), as well as their equivalents of 'northeast', 'northwest', etc.```
DECLARE @CivicNum AS NVARCHAR(10) = ''
DECLARE @Predir AS NVARCHAR(10) = ''
DECLARE @Name AS NVARCHAR(100) = ''
DECLARE @Type AS NVARCHAR(15) = ''
DECLARE @FrenchType AS NVARCHAR(15) = ''
DECLARE @Postdir AS NVARCHAR(10) = ''
SET @Name = '123 E Fake St. S'
SET @CivicNum = SUBSTRING(@Name, 1, 1)
IF ISNUMERIC(@CivicNum) = 1
BEGIN
-- will assume an initial integer is a civic number
DECLARE @LastSpace AS INTEGER
DECLARE @NextSpace AS INTEGER
DECLARE @Temp as VARCHAR(15)
-- get the first token, and push it into @CivicNum
SET @LastSpace = 1
SET @NextSpace = CHAR
Solution
Reviewing this code is essentially meaningless without a ream of test data to compare against.... and it would be really nice to have a debugger!
The REVERSE operations make keeping track of the data state quite complicated.
In general though, the code is formatted well, though there are a couple of issues:
-
Your first if block
-
you have a number of
-
The code is just too much for this area. You need to break it down some more, or find a better algorithm. Tracking the string manipulations is a nightmare, and, frankly, I gave up... which means the code is not well documented, or simply too cumbersome.
My recommendations to you (in a priority of sorts):
Conclusion:
I figure your code is close to right, It may not be worth breaking it. Adding a CDL would be a pain to do, but the code for the parse would be much easier to manage.
My money is on the system-change to store the parsed fields.... ;-)
The REVERSE operations make keeping track of the data state quite complicated.
In general though, the code is formatted well, though there are a couple of issues:
-
Your first if block
IF ISNUMERIC(@CivicNum) = 1 has a BEGIN, but I can't see the closing END, which I would have expected to find before the indent-matching ELSE in the second-to-last line. I would assume this is a typo?-
you have a number of
DECLARE instances that are issued inside various other blocks. It is traditional to move all your declare blocks to be the initial part of your t-sql source... I am not certain whether this code-style is relevant though, but DECLARE @LastSpace AS INTEGER and others should be declared at the top.-
The code is just too much for this area. You need to break it down some more, or find a better algorithm. Tracking the string manipulations is a nightmare, and, frankly, I gave up... which means the code is not well documented, or simply too cumbersome.
My recommendations to you (in a priority of sorts):
- replace the 1-column system with a pre-parsed data system
- create a separate table with the data columns split, and have a join that your can access to see the fields.
- change the user requirmenets to not need this breakdown.
- Create a CDL User-Defined-Function (C# or something) that does the parse using a more friendly format, perhaps returning the data in tab-separated values.... and then splitting them in the calling SQL.
- go with what you have, it basically works.
Conclusion:
I figure your code is close to right, It may not be worth breaking it. Adding a CDL would be a pain to do, but the code for the parse would be much easier to manage.
My money is on the system-change to store the parsed fields.... ;-)
Context
StackExchange Code Review Q#15012, answer score: 5
Revisions (0)
No revisions yet.