HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Normalizing Invoices

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
normalizinginvoicesstackoverflow

Problem

I need to move data from one database to another, and since I don't have SSIS I'm doing this ETL with T-SQL scripts.

One of the source tables contains invoice details, and features a column that contains the number of units invoiced per size (it's clothing); as I transfer the data over to my database, I'm normalizing this information. The query works fine, the execution plan looks exactly as I expected, doesn't recommend adding any indexes, ...but it takes about 10 minutes to process.

The source table Staging.dbo.[SourceTable] contains about 780K rows; the query inserts 1.5M rows into the destination table [DestinationDatabase].dbo.InvoiceDetailSizes.

The query is scheduled to run daily, as part of an overnight process - the 10 minutes don't really matter, but still I want to be sure everything is as efficient as it can be.

```
with cteSizedInvoices (
InvoiceNumber, InvoiceLine, SizeRangeCode, UnitsPerSize
)
as (select
src.f2,
cast(src.f5 as int),
src.f23,
src.f19
from Staging.dbo.[SourceTable] src
where src.f1 = '01'
)

--insert into [DestinationDatabase].dbo.InvoiceDetailSizes (InvoiceDetailId, SizeId, Units, DateInserted)
select
detail.Id InvoiceDetailId,
sz.Id SizeId,
buckets.Units,
getdate()
from
cteSizedInvoices src
inner join [DestinationDatabase].dbo.InvoiceHeaders header on src.InvoiceNumber = header.Number
inner join [DestinationDatabase].dbo.InvoiceDetails detail on src.InvoiceLine = detail.LineNumber
and detail.InvoiceHeaderId = header.Id
inner join [DestinationDatabase].dbo.SizeRanges ranges on src.SizeRangeCode = ranges.Code
cross apply (select id SizeIndex, cast(item as int) Units from [DestinationDatabase].dbo.BucketString(src.UnitsPerSize, 5, 1, 1)) buckets
inner join [DestinationDatabase].dbo.Sizes sz on buckets.SizeIndex = sz.SizeRangeIndex

Solution

You use the value of (@bucketSize + @bufferSize) inside of a where clause and to calculate a value in a select statement, I think if you took that and created it's own variable, that it might give you some performance increase. as well as some of the other arithmetic that could be done from the start and not calculated during the actual query

len(@values)/(@bucketSize+@bufferSize)+1


could be declared from the beginning

DECLARE maxID
SET maxID = LEN(@values) / (@bucketSize + @bufferSize) + 1


After you do that it is a toss up whether or not to Declare another Variable for the Addition of the @bucketSize + @bufferSize, but it might be worth a try.

Also please use some spaces in your equations, you use plenty of white space everywhere else.

I have a minute or two so I will also say this...

since this is going over so many records and that simple Variable freed up a good 10 seconds, I think that it might be worth it to move as much static calculation out of the query as possible, I know this is less about what SQL was meant for, but I also think this is more about being Dynamic as well.

Maybe I am going a little too far here, but this should still do the same thing as the original

create function [dbo].[BucketString] (
    @values varchar(max),
    @bucketSize int,
    @bufferSize int = 1,
    @offset int = 0)

returns @result table (id int, item varchar(max))

begin

    DECLARE @bucketAndBufferSize
    DECLARE @maxID
    DECLARE @bufferMinusBufferMinusOne
    DECLARE @bufferMinusOne

    SET @bucketAndBufferSize = @bucketSize+@bufferSize
    SET @maxID = LEN(@values) / (@bucketAndBufferSize) + 1
    SET @bufferMinusBufferMinusOne = @bufferSize - (@bufferSize - 1) 
    SET @bufferMinusOne = @bufferSize - 1

    with buckets as
    (   
        select 1 id

        union all

        select t.id + 1
        from buckets t
        where id = t.id 
            and t.id < @maxID
    )

    insert into @result
    select 
        id, 
        substring(@values, @offset + ((id - 1) * @bucketAndBufferSize + (case when @bufferMinusOne = 0 then 1 else @bufferMinusBufferMinusOne end)), @bucketSize) string
    from buckets    
    option (maxrecursion 0)

    return;

end


I apologize for the mismatched casing, it's habit to capitalized those words...

When bufferSize is unchanged you want it to be 1 so let's just do that and take out the extra call to the variable.

I took out all the arithmetic that wasn't reliant on information from the query and made them their own variable, this isn't going to be a huge difference, but I think it will be faster than doing the arithmetic inside the query itself, it's like being distracted by that red ball all the time, it distracts you momentarily but you get the job done.

Code Snippets

len(@values)/(@bucketSize+@bufferSize)+1
DECLARE maxID
SET maxID = LEN(@values) / (@bucketSize + @bufferSize) + 1
create function [dbo].[BucketString] (
    @values varchar(max),
    @bucketSize int,
    @bufferSize int = 1,
    @offset int = 0)

returns @result table (id int, item varchar(max))

begin

    DECLARE @bucketAndBufferSize
    DECLARE @maxID
    DECLARE @bufferMinusBufferMinusOne
    DECLARE @bufferMinusOne

    SET @bucketAndBufferSize = @bucketSize+@bufferSize
    SET @maxID = LEN(@values) / (@bucketAndBufferSize) + 1
    SET @bufferMinusBufferMinusOne = @bufferSize - (@bufferSize - 1) 
    SET @bufferMinusOne = @bufferSize - 1

    with buckets as
    (   
        select 1 id

        union all

        select t.id + 1
        from buckets t
        where id = t.id 
            and t.id < @maxID
    )

    insert into @result
    select 
        id, 
        substring(@values, @offset + ((id - 1) * @bucketAndBufferSize + (case when @bufferMinusOne = 0 then 1 else @bufferMinusBufferMinusOne end)), @bucketSize) string
    from buckets    
    option (maxrecursion 0)

    return;

end

Context

StackExchange Code Review Q#64165, answer score: 5

Revisions (0)

No revisions yet.