patternsqlMinor
Availability Group reporting disconnected replica
Viewed 0 times
reportinggroupreplicaavailabilitydisconnected
Problem
I'm trying to configure availability groups in a VM environment so I can run some tests.
I think I've got the group created correctly, I can see the Always on group on both servers. However when I look at the dashboard for the group it has the following error
Availability replica disconnected
This secondary replica is not connected to the primary replica. The connected state is DISCONNECTED.
I've checked the endpoints on both servers and they look correct. There are no firewalls running and both servers can see each other. What's the best way to debug this sort of error?
Below is the TSQL I used to set all this up
Primary Server
Secondary Server
Primary Server
Secondary Server
Obviously I also restored the primary database to the secondary server as well.
One thought, I didn't install the SQL Agent on either server, I'm guessing this is not needed for always on availability groups?
I think I've got the group created correctly, I can see the Always on group on both servers. However when I look at the dashboard for the group it has the following error
Availability replica disconnected
This secondary replica is not connected to the primary replica. The connected state is DISCONNECTED.
I've checked the endpoints on both servers and they look correct. There are no firewalls running and both servers can see each other. What's the best way to debug this sort of error?
Below is the TSQL I used to set all this up
Primary Server
CREATE ENDPOINT dbm_endpoint
STATE=STARTED
AS TCP (LISTENER_PORT=7022)
FOR DATABASE_MIRRORING (ROLE=ALL)
GOSecondary Server
CREATE ENDPOINT dbm_endpoint
STATE=STARTED
AS TCP (LISTENER_PORT=5022)
FOR DATABASE_MIRRORING (ROLE=ALL)
GOPrimary Server
CREATE AVAILABILITY GROUP AG1
FOR
DATABASE TestDb
REPLICA ON
'SQL1' WITH
(
ENDPOINT_URL = 'TCP://sql1.sql.sandbox.net:7022',
PRIMARY_ROLE ( ALLOW_CONNECTIONS = READ_WRITE),
SECONDARY_ROLE (ALLOW_CONNECTIONS=READ_ONLY),
AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
FAILOVER_MODE = MANUAL
),
'SQL2' WITH
(
ENDPOINT_URL = 'TCP://sql2.sql.sandbox.net:5022',
PRIMARY_ROLE ( ALLOW_CONNECTIONS = READ_WRITE),
SECONDARY_ROLE (ALLOW_CONNECTIONS=READ_ONLY),
AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
FAILOVER_MODE = MANUAL
);Secondary Server
ALTER AVAILABILITY GROUP AG1 JOIN;Obviously I also restored the primary database to the secondary server as well.
One thought, I didn't install the SQL Agent on either server, I'm guessing this is not needed for always on availability groups?
Solution
This question is very old and likely no more troubleshooting will be possible, however for those of you stumbling upon this question having the same issue the following may be of some help. This isn't meant to be a complete list of all issues that could occur, but it will hit the main ones that do.
When the secondary replica is marked as disconnected by the primary, this means the primary hasn't heard or can't contact the secondary replica over the database mirroring endpoint. There are various reasons for this:
Endpoint configuration
One of the most common causes are misconfigured endpoints for the Database Mirroring role. There are various options and configurations that can be used and sometimes, especially when created by hand, those configuration options do not match. Take the OPs example:
This creates an endpoint called dbm_endpoint on tcp port 5022 for database mirroring. However it leaves out quite a bit of configuration options to which we assume are defaulted but we really do not know. This means the default options such as
That is quite the assumption to make, especially when it isn't working. If you're not sure what the endpoint is set to, join
There are advanced configurations, for example, that use certificates since they can't use
When specifying the IP Address, the default is
Availability Group Replica Specification
When creating the availability group each replica will need to be specified and as such will need an endpoint url which will define the communications port (database mirroring endpoint).
Taking the OPs example:
This replica, called SQL1 has a TCP endpoint on sql1.sql.sandbox.net port 7022. the address seems to be a FQDN and should be checked that the proper DNS lookups, (if using) Kerberos SPNs, and firewall rules exist or are created. If there are multiple interfaces or addresses that are returned on a lookup of the address, then it may make sense to narrow it down to a specific address in the endpoint url for testing or to narrow it down in the mirroring endpoint specification.
SQL Server Permissions
One of the last surface areas for the disconnected issues to appear is the endpoint permissions on each replica. Since each endpoint the account will need the
Windows Clustering
Since the replica manager replies upon Windows Server Failover Clustering to be working (not counting Linux, Docker, or Read-Scale) in order to start up, do make sure the cluster service is running on each replica and each node shows an up status in the cluster (Powershell, Failover Cluster Manager, etc.). If the cluster service is not running, then the manager will stop until it comes back up - this will be logged in the errorlog and should be very evident.
Troubleshooting Methodology
When the secondary replica is marked as disconnected by the primary, this means the primary hasn't heard or can't contact the secondary replica over the database mirroring endpoint. There are various reasons for this:
Endpoint configuration
One of the most common causes are misconfigured endpoints for the Database Mirroring role. There are various options and configurations that can be used and sometimes, especially when created by hand, those configuration options do not match. Take the OPs example:
CREATE ENDPOINT dbm_endpoint
STATE=STARTED
AS TCP (LISTENER_PORT=5022)
FOR DATABASE_MIRRORING (ROLE=ALL)This creates an endpoint called dbm_endpoint on tcp port 5022 for database mirroring. However it leaves out quite a bit of configuration options to which we assume are defaulted but we really do not know. This means the default options such as
Windows authentication using either Kerberos or NTLM, it also doesn't specify which IP address (v4, v6, interface) is used, nor does it specify encryption options to which the default is REQUIRED and depending on the version of SQL Serve either RC4 or AES.That is quite the assumption to make, especially when it isn't working. If you're not sure what the endpoint is set to, join
sys.tcp_endpoints with sys.database_mirroring_endpoints to turn the resulting single row to individual configuration options.There are advanced configurations, for example, that use certificates since they can't use
Windows authentication. We don't know if the OP has a proper domain and if these servers are joined. Again, the assumption based off the endpoints is that they are, however we don't know for sure. When using certificates the largest issue is that the same certificate is used on multiple replicas yet the private key is not exported when copying the certificate to import to the replicas. Since the endpoints use the private/public key pair, using the same certificate will require that endpoint at some time to use the private key. Make sure the certificate is exported and imported with both keys if you're using it on multiple replicas.When specifying the IP Address, the default is
ANY which could be any of the IPv4 or IPv6 addresses. Though most places don't register or use IPv6 addresses, I do see it as an issue every once and a while. Checking to see what addresses are registered to a host in your DNS can help narrow the issue. You can also setup a network capture and filter based on the port. Every once and a while an IPv6 address is used and it's improperly setup, disabled, or firewalled on other servers but is returned via DNS.Availability Group Replica Specification
When creating the availability group each replica will need to be specified and as such will need an endpoint url which will define the communications port (database mirroring endpoint).
Taking the OPs example:
REPLICA ON
'SQL1' WITH
(
ENDPOINT_URL = 'TCP://sql1.sql.sandbox.net:7022',
PRIMARY_ROLE ( ALLOW_CONNECTIONS = READ_WRITE),
SECONDARY_ROLE (ALLOW_CONNECTIONS=READ_ONLY),
AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
FAILOVER_MODE = MANUAL
),This replica, called SQL1 has a TCP endpoint on sql1.sql.sandbox.net port 7022. the address seems to be a FQDN and should be checked that the proper DNS lookups, (if using) Kerberos SPNs, and firewall rules exist or are created. If there are multiple interfaces or addresses that are returned on a lookup of the address, then it may make sense to narrow it down to a specific address in the endpoint url for testing or to narrow it down in the mirroring endpoint specification.
SQL Server Permissions
One of the last surface areas for the disconnected issues to appear is the endpoint permissions on each replica. Since each endpoint the account will need the
CONNECT permission (similar if using certificates) in order to use the endpoints. This is a fairly common issue, so common that this is marked in the errorlog and gives you the fix in the error message which is to grant the connect permission to the endpoint.Windows Clustering
Since the replica manager replies upon Windows Server Failover Clustering to be working (not counting Linux, Docker, or Read-Scale) in order to start up, do make sure the cluster service is running on each replica and each node shows an up status in the cluster (Powershell, Failover Cluster Manager, etc.). If the cluster service is not running, then the manager will stop until it comes back up - this will be logged in the errorlog and should be very evident.
Troubleshooting Methodology
Code Snippets
CREATE ENDPOINT dbm_endpoint
STATE=STARTED
AS TCP (LISTENER_PORT=5022)
FOR DATABASE_MIRRORING (ROLE=ALL)REPLICA ON
'SQL1' WITH
(
ENDPOINT_URL = 'TCP://sql1.sql.sandbox.net:7022',
PRIMARY_ROLE ( ALLOW_CONNECTIONS = READ_WRITE),
SECONDARY_ROLE (ALLOW_CONNECTIONS=READ_ONLY),
AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
FAILOVER_MODE = MANUAL
),Context
StackExchange Database Administrators Q#27549, answer score: 5
Revisions (0)
No revisions yet.