HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

SSIS Lookup Component Irregularities

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
lookupssisirregularitiescomponent

Problem

I'm seeing odd behaviour in a dataflow with a lookup operation. If I select 'full cache' mode, some portion of the lookups are failing and I have no explanation why. 'No Cache' works correctly, as does 'Partial Cache'. At this point, I cannot trust the 'Full Cache' in production if it fails mysteriously.

The expected result looks like this-



1 808 larry curly moe

2 808 larry curly moe

3 314 foo bar baz

4 314 foo bar baz

5 314 foo bar baz

6 314 foo bar baz



Full cache produces this instead-



1 808 larry curly moe

2 808 larry curly moe

3 314 foo bar baz

4 314 foo bar baz

5 314 null null null

6 314 null null null



The '314' example is an actual result. The first two records lookup correctly, the second two fail. If they were cache misses, they should fail for all four rows.

The failures are repeatable- the same items fail in the same sequence every run. The environment is an isolated test environment with three fixed databases that aren't changing between runs. Selecting Partial Cache is noticeably slower, implying that there's a significant number of cache misses. None of the tables are particularly large, a thousand rows or so.

What's going on? Should I just abandon the hope of ever using Full Cache?

Solution

I've seen this behavior as well. Mine was due to server memory, which I'll explain in the guidelines. This was pesky because it was intermittent.

With partial cache, the cache starts off empty then queries until it finds a match. If you have multiple matches, the first one wins. With full cache, if you have multiple matches, I'm not sure which would win. Probably the first in the cache order.

Partial cache has an option for miss cache, which will remember which records don't have matches and won't query them again. This will be a problem if you are inserting into the table that you are doing a lookup on. Also with full cache, if your source contains duplicates, the second won't get a match after the first is inserted which will be a problem if you want to suppress all but the first.

Here are a few guidelines that I try to follow when using lookups:

  • The lookup will never swap cache to disk. If it runs out of memory, the task will fail. If this is your issue, get more ram or try partial.



  • Use integers whenever possible. If the field is string but can be cast to int, do it. This remove case and white space issues all together



  • Trim strings. White space will give you fits.



  • If it is a string, upper case both sides before comparing (lookups are case sensitive and will fail if the cases don't match).



  • Null never equals Null. If the column is nullable, replace both sides with "UNKNOWN" if string and a irrational number like -999 if it is integer.



  • If the column is empty, replace both sides with the text "EMPTY"



These are a pain in the butt, but it's better than getting a call at 3am because a tier 1 customers ETL failed.

Context

StackExchange Database Administrators Q#20157, answer score: 4

Revisions (0)

No revisions yet.