HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Optimize SSIS package for fewer queries

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
packagessisforoptimizequeriesfewer

Problem

I need to associate customers' order with their "level" (Silver, Gold, etc.) when they placed the order:

CRM server::CRM db::CRM table
----------
CustomerID   PreviousLevel    NewLevel    NewLevelGrantedOn

Order server::Order db::Order table
----------
OrderID     CustomerID    OrderPlacedOn


In an SSIS package I did this:

  • Execute SQL against Order db to extract the orders and put them in an object variable;



  • Loop through each order using "foreach container", in which I put a data flow task `select top 1 * where CustomerID = ? and LevelGrantedOn



There are more then twenty thousand records in the order db, which means the data flow task will be executed for more than twenty thousand times. The CRM db will be queried for more than twenty thousand times, too. It takes more than an hour to do these.

Can I utilize some built-in features to speed up these (or do it in a "smart" way)? And, is an hour a long time, in the context of ETL and / or SSIS?

Solution

You can achieve this without using Foreach enumerator, just use one data flow task to achieve this.

Building the Package

First of all, add a Data Flow Task to the control Flow

  1. OLEDB Sources



In the DataFlow task add an OLEDB Source that Read from Orders Table (the same command used in the Execute SQL Task (first step in your question)

Select * FROM Order


Also add a second OLEDB Source that read from customer table:

select * FROM CRM


  1. Sort



After each OLEDB Source add a source component:

  • The First (order table) you have to select CustomerID and OrderID columns for sorting and select the sort type as ascending



  • The Second (customer table) you have to select CustomerID (sort type


= ascending) and LevelGrantedOn (sort type
= descending) columns for sorting

  1. Merge Join



Add a merge Join component to join both sorted outputs. and select CustomerID column from both outputs as Join Key and select the output columns you need from both tables

  1. Conditional Split



After merge Join add a conditional split to filter rows only that match the following expression

[LevelGrantedOn] < [orderdate]


  1. Script Component



The final component we have to use is a script component to get only the first row for each customerID (because both source are well sorted just grabbing the first row is similar for Select top 1 ... ORDER BY LevelGrantedOn desc)

In the script component add an Output Column OutFlag of type DT_BOOL and Use the following script:

This script will set OutFlag to True when the customerID occur for the first time (similar to TOP 1)

Public Class ScriptMain
        Inherits UserComponent

        Dim lstCustomerID As New System.Collections.Generic.List(Of Integer)
        Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

            If lstCustomerID.Contains(Row.ID) Then

                Row.OutFlag = False
            Else

                lstCustomerID.Add(Row.ID)
                Row.OutFlag = True

            End If

        End Sub

    End Class


  1. Conditional Split



Add a second Conditional Split to filter Top 1 rows:

[OutFlag] == True


7.OLEDB Destination

Connect the Conditional Split Output top the OLEDB Destination and map the columns.

The Dataflow task must looks like

Code Snippets

Select * FROM Order
select * FROM CRM
[LevelGrantedOn] < [orderdate]
Public Class ScriptMain
        Inherits UserComponent


        Dim lstCustomerID As New System.Collections.Generic.List(Of Integer)
        Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

            If lstCustomerID.Contains(Row.ID) Then

                Row.OutFlag = False
            Else

                lstCustomerID.Add(Row.ID)
                Row.OutFlag = True


            End If



        End Sub

    End Class
[OutFlag] == True

Context

StackExchange Database Administrators Q#176886, answer score: 3

Revisions (0)

No revisions yet.