HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Put it in a bucket

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
putbucketstackoverflow

Problem

You're running a jeans company. Your system is collecting detailed orders' data, but it's a bit archaic and it's storing the number of units ordered per size in a delimited string with 20 "fields", each taking up 4 characters. You want to normalize this data to make it easier to eventually analyze orders at size level.

Given this string:

declare @input varchar(80);
set @input = '                       1   2   2   4       4       2   2   1                    ';


I want to extract the values, and store them in a way that enables me to correlate a field's index with the corresponding size in the size scale, which might look like this:

-- '                      28  30  32  34  35  36  37  38  40  42  44  46  48        '


Here's the t-sql I came up with:

create function dbo.GetSizeBreakdown(@input varchar(80))
returns @result table (SizeIndex int, Units int)
as
begin

    -- declare @result table (SizeIndex int, Units int);
    -- declare @input varchar(80);
    -- set @input = '                       1   2   2   4       4       2   2   1                    ';

    declare @fieldCount int = 20;
    declare @fieldLength int = 4;

    declare @index int = 0;
    declare @fieldValue varchar(4);

    while(@index < (@fieldCount))
    begin

        set @fieldValue = substring(@input, @index * @fieldLength + 1, @fieldLength)
        if (isnumeric(@fieldValue) != 0)
        begin
            insert into @result
            select @index + 1, cast(ltrim(@fieldValue) as int);
        end;

        set @index = @index + 1;
    end;

    -- select * from @result;
    return;
end;


select * from dbo.GetSizeBreakdown('                       1   2   2   4       4       2   2   1                    ');


Returns this:

SizeIndex    Units
 1|  7            1
 2|  8            2
 3|  9            2
 4|  10           4
 5|  12           4
 6|  14           2
 7|  15           2
 8|  16           1


Which is perfect because from there I can put e

Solution

The Good:

The code itself is pretty clean. You used clear and concise names and some of the best indentation I've ever seen in SQL. I would prefer that keywords like substring and select be ALLCAPS to differentiate them from fields and variables, but you were consistent.
The Bad:

  • @fieldvalue is declared as varchar(4). You know exactly how long it is, so char(4) is a better choice of datatype. Even better yet would be to declare it as an int and let the RDBMS implicitly cast it so you don't have to later.



  • This is a function that returns multiple fields for multiple records. Functions should only return a single field and record. This is better designed as a stored procedure. Scratch that. You used a table valued function.



  • "dbo" stands for "Database Owner". The dbo schema should be reserved for database maintenance tasks and data. Business data and logic should be kept separate in its own schema(s).



The Ugly:

You're using Structured Query Language to do procedural programming. t-sql just isn't designed to do that efficiently. The code has to loop twenty times for each "record" you pass into it. I'm sure this works great for 1 row and is probably still good for 20,000 rows, but what about 200,000? 2,000,000? This just won't scale well. A set based approach would be better.

You could take two different approaches to this. What you do may depend on what you're able to do.

  • If you have control over how the data is loaded from the text file, use an ETL (Extract-Transform-Load) tool to split the columns and unpivot them. A single field containing other fields shouldn't ever make it into the database to begin with. Assuming you have access to SSIS and BIDS (or whatever Microsoft named it for your version of SQL server), you would set up a Flat File fixed width datasource and a pivot transformation task.



  • If you don't, you'll need to transform the data with an Un-Pivot Query after it's been loaded from the text file. That would look something like this.



(Note that it contains some setup code for my testing.)

CREATE TABLE #OneBigBucket(
    FieldOfFields char(80)
);

INSERT INTO #OneBigBucket
VALUES('                       1   2   2   4       4       2   2   1                    ');

-- The interesting stuff
SELECT SizeIndex,Units
FROM(
    SELECT 
        SUBSTRING(FieldOfFields,1,4) AS [1],
        SUBSTRING(FieldOfFields,5,4) AS [2],
        SUBSTRING(FieldOfFields,9,4) AS [3],
        SUBSTRING(FieldOfFields,13,4) AS [4],
        SUBSTRING(FieldOfFields,17,4) AS [5],
        SUBSTRING(FieldOfFields,21,4) AS [6],
        SUBSTRING(FieldOfFields,25,4) AS [7],
        SUBSTRING(FieldOfFields,29,4) AS [8],
        SUBSTRING(FieldOfFields,33,4) AS [9],
        SUBSTRING(FieldOfFields,37,4) AS [10],
        SUBSTRING(FieldOfFields,41,4) AS [11],
        SUBSTRING(FieldOfFields,45,4) AS [12],
        SUBSTRING(FieldOfFields,49,4) AS [13],
        SUBSTRING(FieldOfFields,53,4) AS [14],
        SUBSTRING(FieldOfFields,57,4) AS [15],
        SUBSTRING(FieldOfFields,61,4) AS [16],
        SUBSTRING(FieldOfFields,65,4) AS [17],
        SUBSTRING(FieldOfFields,69,4) AS [18],
        SUBSTRING(FieldOfFields,73,4) AS [19],
        SUBSTRING(FieldOfFields,77,4) AS [20]
    FROM #OneBigBucket) buckets
UNPIVOT 
    (Units FOR SizeIndex IN
        ([1],[2],[3],[4],[5],
        [6],[7],[8],[9],[10],
        [11],[12],[13],[14],[15],
        [16],[17],[18],[19],[20])
    ) AS unpvt
WHERE Units <> '';

DROP TABLE #OneBigBucket;


Is it pretty? No. Is it set based? Yes.

Code Snippets

CREATE TABLE #OneBigBucket(
    FieldOfFields char(80)
);

INSERT INTO #OneBigBucket
VALUES('                       1   2   2   4       4       2   2   1                    ');

-- The interesting stuff
SELECT SizeIndex,Units
FROM(
    SELECT 
        SUBSTRING(FieldOfFields,1,4) AS [1],
        SUBSTRING(FieldOfFields,5,4) AS [2],
        SUBSTRING(FieldOfFields,9,4) AS [3],
        SUBSTRING(FieldOfFields,13,4) AS [4],
        SUBSTRING(FieldOfFields,17,4) AS [5],
        SUBSTRING(FieldOfFields,21,4) AS [6],
        SUBSTRING(FieldOfFields,25,4) AS [7],
        SUBSTRING(FieldOfFields,29,4) AS [8],
        SUBSTRING(FieldOfFields,33,4) AS [9],
        SUBSTRING(FieldOfFields,37,4) AS [10],
        SUBSTRING(FieldOfFields,41,4) AS [11],
        SUBSTRING(FieldOfFields,45,4) AS [12],
        SUBSTRING(FieldOfFields,49,4) AS [13],
        SUBSTRING(FieldOfFields,53,4) AS [14],
        SUBSTRING(FieldOfFields,57,4) AS [15],
        SUBSTRING(FieldOfFields,61,4) AS [16],
        SUBSTRING(FieldOfFields,65,4) AS [17],
        SUBSTRING(FieldOfFields,69,4) AS [18],
        SUBSTRING(FieldOfFields,73,4) AS [19],
        SUBSTRING(FieldOfFields,77,4) AS [20]
    FROM #OneBigBucket) buckets
UNPIVOT 
    (Units FOR SizeIndex IN
        ([1],[2],[3],[4],[5],
        [6],[7],[8],[9],[10],
        [11],[12],[13],[14],[15],
        [16],[17],[18],[19],[20])
    ) AS unpvt
WHERE Units <> '';

DROP TABLE #OneBigBucket;

Context

StackExchange Code Review Q#57061, answer score: 4

Revisions (0)

No revisions yet.