patternsqlMinor
Data extraction with multiple delimiters?
Viewed 0 times
withextractionmultipledatadelimiters
Problem
I have a legacy data source column that is delimitted by semicolons and commas. The first semicolon indicates the last name, the second indicates the first and middle name (or initials), and the last semicolon indicates the type of individual. The comma indicates that a new name has began. Here is a sample of this data.
```
+-------+---------------------------------------------------------------------------------------------------------------------+
| ID | SOURCE |
+-------+---------------------------------------------------------------------------------------------------------------------+
| 62963 | RENZ;MICHAEL;DECEASED,WANDER;MARIA;MINOR,WANDER;HENRY RUDOLPH;MINOR,WANDER;ROSA;MINOR,WANDER;PAUL EMIL;MINOR |
| 62964 | HERNDON;A C;ESTATE,BERRING;A F;DECEASED,BEIRING;A F;DECEASED,BEIRING;ANDREAS FREDERICK;DECEASED |
| 62965 | ZINCH;;ESTATE,ZINTZ;;ESTATE,HAYNES;HENRY;DECEASED |
| 62965 | ZINCH;;ESTATE,ZINTZ;;ESTATE,HAYNES;HENRY;DECEASED |
| 62966 | KRAUS;JOSEPHINE;MINOR,KENNEDY;GEORGE;DECEASED |
| 62967 | CAREY;JAMES;ESTATE,DE LA GARZA;REFUGIO;DECEASED |
| 62968 | LEWIS;FLORENCE;ESTATE,LOCKWOOD;ALBERT A;DECEASED |
| 62969 | GLAESER;EMMA;MINOR,GLAESER;HERMAN JR;MINOR,GLAESER;HERMAN;MINOR,RODRIGUEZ;HILARIO;DECEASED,RODRIGUEZ;MARIE;DECEASED |
| 62970 | STORY;BETTIE;ESTATE,EIGENDORFF;FRANZ;DECEASED |
| 62971 | HOWELL;MAMIE;MINOR,HOWELL;ETHEL;MINOR |
+-------+
```
+-------+---------------------------------------------------------------------------------------------------------------------+
| ID | SOURCE |
+-------+---------------------------------------------------------------------------------------------------------------------+
| 62963 | RENZ;MICHAEL;DECEASED,WANDER;MARIA;MINOR,WANDER;HENRY RUDOLPH;MINOR,WANDER;ROSA;MINOR,WANDER;PAUL EMIL;MINOR |
| 62964 | HERNDON;A C;ESTATE,BERRING;A F;DECEASED,BEIRING;A F;DECEASED,BEIRING;ANDREAS FREDERICK;DECEASED |
| 62965 | ZINCH;;ESTATE,ZINTZ;;ESTATE,HAYNES;HENRY;DECEASED |
| 62965 | ZINCH;;ESTATE,ZINTZ;;ESTATE,HAYNES;HENRY;DECEASED |
| 62966 | KRAUS;JOSEPHINE;MINOR,KENNEDY;GEORGE;DECEASED |
| 62967 | CAREY;JAMES;ESTATE,DE LA GARZA;REFUGIO;DECEASED |
| 62968 | LEWIS;FLORENCE;ESTATE,LOCKWOOD;ALBERT A;DECEASED |
| 62969 | GLAESER;EMMA;MINOR,GLAESER;HERMAN JR;MINOR,GLAESER;HERMAN;MINOR,RODRIGUEZ;HILARIO;DECEASED,RODRIGUEZ;MARIE;DECEASED |
| 62970 | STORY;BETTIE;ESTATE,EIGENDORFF;FRANZ;DECEASED |
| 62971 | HOWELL;MAMIE;MINOR,HOWELL;ETHEL;MINOR |
+-------+
Solution
As long as the legacy data is as described and does not have any odd variations, the following should work as it produces the desired output.
Notes:
Test setup:
Main query:
Notes:
- The following two code sections should be run together in SSMS as they are using a table variable and hence need to be in the same query batch. I just separated it into two pieces to make it easier to focus on just the main query.
- The splitter I used, String_Split, is SQLCLR-based and is available in the SQL# library (which I am the author of, but String_Split, and others, are available in the Free version). However, any splitter that returns the row/item # should work just the same (just don't use one that is based on a
WHILEloop).
Test setup:
DECLARE @SampleData TABLE (ID INT NOT NULL, SourceStuff VARCHAR(MAX) NOT NULL);
INSERT INTO @SampleData (ID, SourceStuff) VALUES (62963, 'RENZ;MICHAEL;DECEASED,WANDER;MARIA;MINOR,WANDER;HENRY RUDOLPH;MINOR,WANDER;ROSA;MINOR,WANDER;PAUL EMIL;MINOR');
INSERT INTO @SampleData (ID, SourceStuff) VALUES (62964, 'HERNDON;A C;ESTATE,BERRING;A F;DECEASED,BEIRING;A F;DECEASED,BEIRING;ANDREAS FREDERICK;DECEASED');
INSERT INTO @SampleData (ID, SourceStuff) VALUES (62965, 'ZINCH;;ESTATE,ZINTZ;;ESTATE,HAYNES;HENRY;DECEASED');
INSERT INTO @SampleData (ID, SourceStuff) VALUES (62966, 'KRAUS;JOSEPHINE;MINOR,KENNEDY;GEORGE;DECEASED');
INSERT INTO @SampleData (ID, SourceStuff) VALUES (62967, 'CAREY;JAMES;ESTATE,DE LA GARZA;REFUGIO;DECEASED');
INSERT INTO @SampleData (ID, SourceStuff) VALUES (62968, 'LEWIS;FLORENCE;ESTATE,LOCKWOOD;ALBERT A;DECEASED');
INSERT INTO @SampleData (ID, SourceStuff) VALUES (62969, 'GLAESER;EMMA;MINOR,GLAESER;HERMAN JR;MINOR,GLAESER;HERMAN;MINOR,RODRIGUEZ;HILARIO;DECEASED,RODRIGUEZ;MARIE;DECEASED');
INSERT INTO @SampleData (ID, SourceStuff) VALUES (62970, 'STORY;BETTIE;ESTATE,EIGENDORFF;FRANZ;DECEASED');
INSERT INTO @SampleData (ID, SourceStuff) VALUES (62971, 'HOWELL;MAMIE;MINOR,HOWELL;ETHEL;MINOR');
Main query:
;WITH cte AS
(
SELECT sd.ID,
split.SplitNum,
split.SplitVal,
CHARINDEX(N';', split.SplitVal) AS [FirstDelimeter],
CHARINDEX(N';', split.SplitVal,
(CHARINDEX(N';', split.SplitVal) + 1)) AS [SecondDelimeter]
FROM #SampleData sd
CROSS APPLY SQL#.String_Split(sd.SourceStuff, N',', 1) split
)
SELECT tmp.ID,
tmp.SplitNum AS [Sequence],
SUBSTRING(tmp.SplitVal,
1,
(tmp.FirstDelimeter - 1)) AS [Last],
SUBSTRING(tmp.SplitVal,
(tmp.FirstDelimeter + 1),
(tmp.SecondDelimeter - (tmp.FirstDelimeter + 1))) AS [FirstMiddle],
SUBSTRING(tmp.SplitVal,
(tmp.SecondDelimeter + 1),
LEN(tmp.SplitVal)) AS [Type]
FROM cte tmp;
Context
StackExchange Database Administrators Q#121915, answer score: 3
Revisions (0)
No revisions yet.