HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

transactional replication, failing with communication link failure half way through bcp

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
failingwithtransactionalcommunicationwayreplicationbcpthroughhalffailure

Problem

I am trying to replicate to a 4th subscriber on a remote site. 3 subscribers work fine locally. the other difference is this is a 2012 subscriber from a 2008 r2 publisher and remote distributor.

When delivering the bcp commands it fails with the following error.

The process could not bulk copy into table '"dbo"."FlightSchedule_tbl"'. (Source: MSSQL_REPL, Error number: MSSQL_REPL0)

I have tried raising the historyverbose level to 2, but this is the only error i get from the agent history.. I have attempted to recreate the snapshot, but this still fails at the same point...

nothing unusual about the table, and its 5 million rows, so not massive amounts of data...

Column_name Type    Computed    Length
FlightId    int no  4
OrigAirport char    no  3
DestAirport char    no  3
DeptDate    datetime    no  8
ArriveDate  datetime    no  8
Airline char    no  10
SupplierId  int no  4
UpdateRef   int no  4
HarvestUpdateKey    int no  4
ActiveFlag  bit no  1


Can anyone help me diagnose this to get a useful error or suggest what the problem might be?? the connection is very slow to this datacentre which is a issue we are attempt to resolve... could this simply be a timeout ???

UPDATE - I configured a seperated publication with just this table in and set the verbose logging to 2. I then get

Error messages:

The process could not bulk copy into table '"dbo"."FlightSchedule_tbl"'. (Source: MSSQL_REPL, Error number: MSSQL_REPL0)

An existing connection was forcibly closed by the remote host.

Unspecified error

Failed to send batch after max errors

TCP Provider: An existing connection was forcibly closed by the remote host. (Source: MSSQLServer, Error number: 10054)

Communication link failure (Source: MSSQLServer, Error number: 10054)

so looks like the network between datacentres... strange that it would always die at the same point, would there be a reason for this ?

The error log on the target server

Error: 4014, Severity: 20, State: 13

Solution

If the link is bad, you must overrride -QueryTimeOut and -KeepAliveMessageInterval parameters. Check the SQL Server Agent job, identify the replication Job, then right-click Properties, then Steps, Edit and at the end of Commend just add -LoginTimeOut 120 -QueryTimeOut 7200 -KeepAliveMessageInterval 600

Context

StackExchange Database Administrators Q#28117, answer score: 2

Revisions (0)

No revisions yet.