snippetMinor
How to handle schema changes in source systems when running an ETL?
Viewed 0 times
handlesourcesystemsrunningchangeshowetlwhenschema
Problem
How do people typically handle schema changes in source systems when running an ETL as part of an EDW? For example when a column you were operating on disappears and your ETL breaks.
Solution
How do people typically handle schema changes in source systems when running an ETL as part of an EDW? For example when a column you were operating on disappears and your ETL breaks.
I would like to answer this question on the same abstraction level it was asked.
Using conceptual object model of Informatica
Table represented by Source object is used in a Mapping, which enumerates all required Source columns (Ports). When the Mapping is executed by Session (as a part of a Workflow), the Mapping will throw runtime error about missing source Port.
Using the ETL subsystem model of Kimball
Take a look at The 38 Subsysems of ETL by Kimball. Whatever ETL tools you use, you will have following subsystems which play a role in adressing your "variating source" issue:
I would like to answer this question on the same abstraction level it was asked.
Using conceptual object model of Informatica
Table represented by Source object is used in a Mapping, which enumerates all required Source columns (Ports). When the Mapping is executed by Session (as a part of a Workflow), the Mapping will throw runtime error about missing source Port.
Using the ETL subsystem model of Kimball
Take a look at The 38 Subsysems of ETL by Kimball. Whatever ETL tools you use, you will have following subsystems which play a role in adressing your "variating source" issue:
- Extract Subsystem should catch runtime errors.
SELECT *is almost never a good idea. If data source changed in a way that caused some columns/tables to disappear, the error / warning should be triggered - and you do need to log/track all errors. See next one.
- Error Event Tracking. - aka Logging system. It's crucial, but sometimes ignored by in-house ETL solutions. All the components of ETL should be able to log errors/warnings in uniform way, on different levels of severity. And the messages should be propagated to the responsible ETL manager person. Bare minimum: flat, syslog-like logfiles plus some script to classify & email errors.
- Job Scheduler and/or Workflow Monitor. Obviously you will run automated and/or manual ETL jobs, and you should be able to get notified about errors - as well as see current and historical job results including status, rowcounts, warnings of recent jobs.
- Problem Escalation.. You need to have an effective path of problem escalation and it's not a technical, but organizational issue. Technically, it's often enough to send an email titled "EDW workflow 001 failed with FATAL error (log attached)". The key is a responsible person and a business process which will push this up.
Context
StackExchange Database Administrators Q#232772, answer score: 3
Revisions (0)
No revisions yet.