
Connection Timeout Expired without apparent network issue

Submitted by: @import:stackexchange-dba

Problem

We have one particular SQL Server that intermittently times out when accepting connections. The failures are spread throughout the day but occur at a very low rate. How can I continue troubleshooting?


Connection Timeout Expired.  The timeout period elapsed while
attempting to consume the pre-login handshake acknowledgement.  This
could be because the pre-login handshake failed or the server was
unable to respond back in time.  The duration spent while attempting
to connect to this server was - [Pre-Login] initialization=0;
handshake=15002;  (Microsoft SQL Server, Error: -2)
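The per-phase durations at the end of this message are the most useful part: handshake=15002 is almost exactly 15 seconds, the default Connect Timeout of System.Data.SqlClient.SqlConnection. That suggests the client spent its entire timeout waiting for the server's pre-login acknowledgement, while initialization=0 shows nothing was slow before that. A minimal sketch for extracting those timings from logged error messages so failures can be grouped by phase. Get-PreLoginTiming is a hypothetical helper name, and the regex assumes the `name=milliseconds;` layout shown above.

```powershell
# Get-PreLoginTiming is a hypothetical helper, not part of SqlClient or PowerShell.
# It extracts the "[Pre-Login] name=ms;" phase timings from a SqlClient timeout message.
function Get-PreLoginTiming {
    param([Parameter(Mandatory)][string] $Message)

    $timings = [ordered]@{}
    $start = $Message.IndexOf('[Pre-Login]')
    if ($start -lt 0) { return $timings }   # no phase breakdown in this message

    # Each phase is reported as "<name>=<milliseconds>;"
    foreach ($m in [regex]::Matches($Message.Substring($start), '(\w+)=(\d+);')) {
        $timings[$m.Groups[1].Value] = [int]$m.Groups[2].Value
    }
    return $timings
}

$msg = 'The duration spent while attempting to connect to this server was - ' +
       '[Pre-Login] initialization=0; handshake=15002;  (Microsoft SQL Server, Error: -2)'
$timing = Get-PreLoginTiming -Message $msg

# SqlConnection's default Connect Timeout is 15 seconds
$connectTimeoutMs = 15 * 1000
if ($timing['handshake'] -ge $connectTimeoutMs) {
    "Handshake took $($timing['handshake']) ms, i.e. the whole Connect Timeout"
}
```

Grouping failures this way makes it easy to confirm that every timeout stalls in the same phase, which points at the network path or the server's listener rather than at login or query processing.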

Server Configuration:

  • SQL Server 2016 SP1 CU5 Enterprise (issue also occurred prior to SP1)
  • Windows Server 2012 R2 on both server and client
  • VMware ESXi 6.5.0 on HP ProLiant DL360 Gen9
  • VM has 8 vCPU, 64 GiB of memory (fully reserved)

Test Script (executed once per second):

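$failed = $false;
$loginDuration = (Measure-Command {
    # Pooling=false forces a new physical connection, so every run performs a full pre-login handshake
    $ncon = New-Object System.Data.SqlClient.SqlConnection `
        @( 'Data Source=1.2.3.4,16143;Database=Test;User=Test;Password=****;Pooling=false;' );
    try
    {
        $ncon.Open();

        $cmd = New-Object System.Data.SqlClient.SqlCommand `
            @( 'SELECT @@VERSION', $ncon );
        # The query returns a single value, so ExecuteScalar fits better than ExecuteNonQuery
        $null = $cmd.ExecuteScalar();
    }
    catch
    {
        # Record the failure for the metric below
        $failed = $true;
    }
    finally
    {
        # Release the connection even when Open() or the query throws
        $ncon.Dispose();
    }
}).TotalMilliseconds;
# Write-Metric is a site-specific cmdlet (not built into PowerShell); failures are
# recorded as a 120000 ms sentinel so they stand out from real login times
Write-Metric -metric 'itp.dbserver.logintime' -unit 'milliseconds' `
    -value (&{if ($failed) { 120000 } else { $loginDuration }});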


Observations:

  • The issue started after a series of changes: OS updates, SQL Server updates, a SAN move, and a move from Hyper-V to VMware
  • Most connections succeed (4 failures out of 1,440 attempts)
  • Failures always report a low value for "[Pre-Login] initialization" (e.g. "initialization=0;") and a high value for "handshake" (e.g. "handshake=15002"). We never get errors such as "Not found" or "No such host is known", only "Connection Timeout"
  • No encryption is enabled for the listener
  • Pings show no loss over
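Because the test script writes failed attempts as a 120000 ms sentinel, the failure rate can be recovered directly from the itp.dbserver.logintime samples. A minimal sketch, assuming the samples can be exported as plain numbers in milliseconds; Get-LoginFailureSummary is a hypothetical helper, and the 50 ms value for successful logins in the example input is made up (only the 4-of-1,440 split comes from the observations above).

```powershell
# Get-LoginFailureSummary is a hypothetical helper for the itp.dbserver.logintime samples.
# Assumption: each sample is a login time in ms, and failures were written as the
# 120000 ms sentinel used by the test script.
function Get-LoginFailureSummary {
    param(
        [Parameter(Mandatory)][double[]] $Samples,
        [double] $FailureSentinel = 120000
    )

    $failures = @($Samples | Where-Object { $_ -ge $FailureSentinel }).Count
    [pscustomobject]@{
        Attempts   = $Samples.Count
        Failures   = $failures
        FailurePct = [math]::Round(100.0 * $failures / $Samples.Count, 2)
    }
}

# Illustrative input: only the 4-of-1,440 split comes from the observations;
# the 50 ms value for successful logins is made up.
$samples = @(@(50) * 1436) + @(@(120000) * 4)
Get-LoginFailureSummary -Samples $samples
```

For 4 failures in 1,440 attempts this reports a failure rate of 0.28%, low enough that a short ping test can easily miss whatever is dropping or delaying the handshake packets.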

Solution

This was eventually identified as a side effect of VMware Large Receive Offload (LRO). Disabling host-based LRO resolved the issue. See:

  • Enable or Disable LRO on a VMXNET3 Adapter on a Windows Virtual Machine
  • Large Receive Offload
  • Poor network performance or high network latency on Windows virtual machines (2008925)
  • vmxnet3 adapter on windows server 2012 with MSSQL server bottleneck problem
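Disabling LRO on the host can be scripted with VMware PowerCLI. A minimal sketch, assuming PowerCLI is installed and a session is already open via Connect-VIServer; 'esx01.example.local' is a placeholder host name. Net.Vmxnet3HwLRO and Net.Vmxnet3SwLRO are the ESXi advanced settings for hardware and software LRO on VMXNET3 adapters; check the VMware articles above for details, such as when the change takes effect, before applying it to production hosts.

```powershell
# Assumes VMware PowerCLI and an existing Connect-VIServer session.
# 'esx01.example.local' is a placeholder; substitute the ESXi host running the SQL Server VM.
$vmhost   = Get-VMHost -Name 'esx01.example.local'
$settings = 'Net.Vmxnet3HwLRO', 'Net.Vmxnet3SwLRO'

# Show the current values (1 = LRO enabled, 0 = disabled)
Get-AdvancedSetting -Entity $vmhost -Name $settings | Select-Object Name, Value

# Disable hardware and software LRO for VMXNET3 adapters on this host
Get-AdvancedSetting -Entity $vmhost -Name $settings |
    Set-AdvancedSetting -Value 0 -Confirm:$false
```

These settings are host-wide, so the change applies to every VMXNET3 adapter on that host, not only to the SQL Server VM.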

Context

StackExchange Database Administrators Q#188255, answer score: 3
