patternsqlMinor
Chance of skipping extended events when using an offset for the file target
Viewed 0 times
thefiletargeteventsoffsetchanceextendedskippingusingfor
Problem
I am exploring extended events and came across a question I can't solve using the Microsoft documentation:
When reading events from the file target we can specify and initial offset so we can read the logs in in a differential manner. For example:
Now all events up to and including 984576 will be ignored. Events after the offset will be returned.
From the documentation:
initial_offset
Used to specify last offset read previously and skips all events up to the offset (inclusive). Event enumeration starts after the offset specified. initial_offset is bigint. If null is specified as the argument the entire file will be read.
My question is how can we be sure that we don't skip events given the fact that file_offset is not unique and shared:
If we use an offset there is a change that new events with a share offset are not returned right?
Does the function fn_xe_file_target_read_file bump the offset so that this possibility cannot occur?
When reading events from the file target we can specify and initial offset so we can read the logs in in a differential manner. For example:
sys.fn_xe_file_target_read_file('log*.xel', 'log*.xem', 'specific.xel', 984576)Now all events up to and including 984576 will be ignored. Events after the offset will be returned.
From the documentation:
initial_offset
Used to specify last offset read previously and skips all events up to the offset (inclusive). Event enumeration starts after the offset specified. initial_offset is bigint. If null is specified as the argument the entire file will be read.
My question is how can we be sure that we don't skip events given the fact that file_offset is not unique and shared:
If we use an offset there is a change that new events with a share offset are not returned right?
Does the function fn_xe_file_target_read_file bump the offset so that this possibility cannot occur?
Solution
I couldn't find a definitive answer about how the offset is generated, only one unreliable reference that they're a block number. A little disappointing, I guess the engineers don't use stack exchange.
However I was listening to the SQL Down Under podcast this week and they were discussing the issues with doing ETL based on timestamps which are a little unreliable and result in skipped records.
The advice was that it's a good practice to read 30s back from where you left off and then eliminate duplicates.
So unless someone comes up with something better you can at least address your missed record concerns by starting your reads one block prior to the last block you received, and then eliminating duplicates.
However I was listening to the SQL Down Under podcast this week and they were discussing the issues with doing ETL based on timestamps which are a little unreliable and result in skipped records.
The advice was that it's a good practice to read 30s back from where you left off and then eliminate duplicates.
So unless someone comes up with something better you can at least address your missed record concerns by starting your reads one block prior to the last block you received, and then eliminating duplicates.
Context
StackExchange Database Administrators Q#136729, answer score: 2
Revisions (0)
No revisions yet.