AWS DMS task fails after full load completes with error "AlwaysOn BACKUP-ed data is not available"

Submitted by: @import:stackexchange-dba

Problem

I created a DMS task to migrate data from one RDS MS SQL Server instance to another RDS MS SQL Server instance using "Full load + ongoing replication". The data was copied to the target instance, but the task status shows as failed with the error below:

Last Error AlwaysOn BACKUP-ed data is not available Task error
notification received from subtask 0, thread 0
[reptask/replicationtask.c:2822] [1020465] Error executing source
loop; Stream component failed at subtask 0, component
st_0_NJZW5VSVPASASA4E4N2SJGTVEZ3UIDLJHX3NDY; Stream component
'st_0_NJZW5VSVASASAZ4E4N2SJGTVEZ3UIDLJHX3NDY' terminated
[reptask/replicationtask.c:2829] [1020465] Stop Reason FATAL_ERROR
Error Level FATAL

CloudWatch logs:

2021-01-04T15:24:02 [SOURCE_CAPTURE ]E: Failed to access LSN '0000009a:00000087:0010' in the backup log sets since BACKUP/LOG-s are not available. [1020465] (sqlserver_endpoint_capture.c:717)
2021-01-04T15:24:02 [TASK_MANAGER ]I: Task - PHTLF2WXLFVWESX4GLZ2CKAW3MY76HLRXHIYYYY is in ERROR state, updating starting status to AR_NOT_APPLICABLE (repository.c:5103)
2021-01-04T15:24:02 [SOURCE_CAPTURE ]E: Error executing source loop [1020465] (streamcomponent.c:1867)
2021-01-04T15:24:02 [TASK_MANAGER ]E: Stream component failed at subtask 0, component st_0_NJZW5VSVPJVW47Z4E4N2SJGTVEZ3UIDLJHX3YYY [1020465] (subtask.c:1409)
2021-01-04T15:24:02 [SOURCE_CAPTURE ]E: Stream component 'st_0_NJZW5VSVPJVW47Z4E4N2SJGTVEZ3UIDLJHX3YYY' terminated [1020465] (subtask.c:1578)
2021-01-04T15:24:02 [TASK_MANAGER ]E: Task error notification received from subtask 0, thread 0 [1020465] (replicationtask.c:2822)
2021-01-04T15:24:02 [TASK_MANAGER ]E: Error executing source loop; Stream component failed at subtask 0, component st_0_NJZW5VSVPJVW47Z4E4N2SJGTVEZ3UIDLJHX3YYY; Stream component 'st_0_NJZW5VSVPJVW47Z4E4N2SJGTVEZ3UIDLJHX3YYY' terminated [1020465] (replicationtask.c:2829)
2021-01-04T15:24:02 [TASK_MANAGER ]E: Task 'PHTLF2WXLFVWESX4GLZ2CKAW3MY76HLRXHIYYYY' encountered a fatal
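When the CloudWatch export is longer than the excerpt above, a small filter can help pick out the line that matters. The following is a minimal sketch: the function name `find_missing_lsns` and the regular expression are illustrative assumptions derived only from the log lines shown in this post, not from any official description of the DMS log format.

```python
import re

# Matches the SOURCE_CAPTURE error that reports an LSN DMS could not read.
# Accepts "SOURCE_CAPTURE" or "SOURCE CAPTURE", since both spellings appear in this post.
LSN_ERROR = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+"
    r"\[SOURCE[_ ]CAPTURE\s*\]E:\s+"
    r"Failed to access LSN '(?P<lsn>[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+)'"
)


def find_missing_lsns(log_lines):
    """Return (timestamp, lsn) pairs for each 'Failed to access LSN' error line."""
    hits = []
    for line in log_lines:
        match = LSN_ERROR.search(line)
        if match:
            hits.append((match.group("timestamp"), match.group("lsn")))
    return hits


# Two lines copied from the log excerpt in the question.
sample = [
    "2021-01-04T15:24:02 [SOURCE_CAPTURE ]E: Failed to access LSN "
    "'0000009a:00000087:0010' in the backup log sets since BACKUP/LOG-s "
    "are not available. [1020465] (sqlserver_endpoint_capture.c:717)",
    "2021-01-04T15:24:02 [SOURCE_CAPTURE ]E: Error executing source loop "
    "[1020465] (streamcomponent.c:1867)",
]

print(find_missing_lsns(sample))
```

Only the first sample line contains the LSN error, so only that line is reported; the remaining errors in the log are downstream consequences of it.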

Solution

I don't know if you're still working through this, but I wanted to post the problem & solution for anyone else.

There is a bug in AWS DMS replication engine version 3.4.3 when RDS SQL Server is the source endpoint: the 5-minute transaction log backup that RDS takes for point-in-time recovery disrupts the CDC process. The workaround is to run the task on a replication instance that uses a different replication engine version, such as 3.4.2.
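To see whether a replication instance is on the affected version, the engine version can be read from the DMS API. The sketch below assumes boto3 is installed and AWS credentials and a region are configured; the helper names `affected_instances` and `list_replication_instances` are hypothetical, and pagination of the API response is ignored for brevity.

```python
AFFECTED_VERSION = "3.4.3"  # engine version named in this answer


def affected_instances(instances, affected_version=AFFECTED_VERSION):
    """Return identifiers of replication instances running the affected engine version."""
    return [
        inst["ReplicationInstanceIdentifier"]
        for inst in instances
        if inst.get("EngineVersion") == affected_version
    ]


def list_replication_instances():
    """Fetch replication instances via boto3 (first page of results only)."""
    import boto3  # imported lazily so the pure helper above works without boto3

    client = boto3.client("dms")
    return client.describe_replication_instances()["ReplicationInstances"]


# Offline example with hand-written records shaped like the API response;
# the identifiers are made up for illustration.
example = [
    {"ReplicationInstanceIdentifier": "ri-old", "EngineVersion": "3.4.2"},
    {"ReplicationInstanceIdentifier": "ri-new", "EngineVersion": "3.4.3"},
]
print(affected_instances(example))
```

Against a real account, `affected_instances(list_replication_instances())` would perform the same check on live data.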

From AWS support:

Root Cause:

Researching internally, I observed this error to be a known issue with
DMS replication instances running version 3.4.3,
where RDS SQL Server has truncated active transaction logs (TLOG) or
there is no activity in the database. This is a known situation
when using RDS SQL Server as a source.

Why task failed:

DMS 3.4.3 tasks are failing due to missing LSNs in source RDS SQL
Server databases. When the MS-CDC enabled tables on the source SQL
Server remain idle for a long period of time, the DMS task still fails
even when you increase the MS-CDC polling interval on the source database.

This is because the MS-CDC polling interval prevents TLOG truncation only
when there are changes on the source tables. If the source tables remain
idle, the LSNs do not increment, and the RDS t-log backup job then
removes from the online t-log the LSNs that DMS requires,
causing this issue.

Additionally, DMS task is failing with misleading errors such as:

00016458: 2020-12-28T09:30:08 [SOURCE_CAPTURE ]E: Failed to access LSN
'00001002:0001c389:0003' in the backup log sets since BACKUP/LOG-s are
not available. [1020465] (sqlserver_endpoint_capture.c:717)

This error can still occur even when the MS-CDC polling interval is set to
a sufficient value. It happens when the RDS t-log backup job removes an
LSN from the online t-log. This issue is especially seen in databases
with less traffic, since the LSNs are not incrementing and DMS is
looking at the last committed LSN.

Workaround:

I would recommend trying to migrate the same tables by creating a
similar task using DMS RI version 3.4.2 or 3.3.4
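As a rough illustration of the workaround, a replication instance pinned to a specific engine version can be requested through boto3's `create_replication_instance`. This is a sketch under assumptions, not the procedure AWS support prescribed: the helper names, the identifier, and the instance class are hypothetical, and the older engine versions named here may no longer be offered by AWS.

```python
def build_replication_instance_request(identifier, instance_class, engine_version="3.4.2"):
    """Build keyword arguments for the boto3 DMS create_replication_instance call."""
    return {
        "ReplicationInstanceIdentifier": identifier,
        "ReplicationInstanceClass": instance_class,
        # Pin the engine version suggested as a workaround.
        "EngineVersion": engine_version,
        # Assumption: disable automatic minor upgrades so the instance
        # is not moved back onto the affected version.
        "AutoMinorVersionUpgrade": False,
    }


def create_replication_instance(request):
    """Send the request to AWS (needs boto3, credentials, and a region)."""
    import boto3  # imported lazily so the request builder works without boto3

    return boto3.client("dms").create_replication_instance(**request)


# Hypothetical identifier and instance class, for illustration only.
request = build_replication_instance_request("dms-ri-pinned", "dms.t3.medium")
print(request)
```

A new task equivalent to the failing one would then be created on this instance, using the same source and target endpoints.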

Context

StackExchange Database Administrators Q#282508, answer score: 3
