HiveBrain v1.2.0
pattern · kubernetes · Minor

Debug BackendConnectionErrors from a Kubernetes LoadBalancer Service

Submitted by: @import:stackexchange-devops
debug · kubernetes · backendconnectionerrors · service · loadbalancer

Problem

We recently moved some of our production infrastructure to Kubernetes. Many pods are exposed through a LoadBalancer service on AWS. This creates an ELB, registers each node in the cluster with the ELB, and configures a node port to map ELB ports to pods. Our applications are able to connect via the load balancer, but the number of BackendConnectionErrors (as reported by CloudWatch) is 5-7x higher than the request count. I'm not sure how to debug this.

The number of reported backend connection errors does not correlate with any application-layer error metrics. This leads me to conclude that it is some sort of infrastructure problem, perhaps being amplified by retries. However, I do not know how to debug it.

My hypothesis is one or more of these:

  • Some AWS connection-management setting that is missing on the ELB

  • A sysctl or other networking setting on the cluster nodes that limits the number of connections coming in over the ELB

  • Some intermediate piece of networking infrastructure interfering with the connections

My question is: how can I debug/trace TCP/networking-related metrics on the instances in the cluster?
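A minimal sketch of what inspecting those node-level TCP metrics might look like. It assumes a Linux node and reads kernel counters straight from /proc, so it works even on minimal node images without net-tools installed; the specific counters worth watching vary by kernel version.

```shell
# Run on a cluster node (via SSH or a privileged "kubectl debug" pod).

# TCP stack counters: RetransSegs and AttemptFails growing much faster
# than usual while the ELB sends traffic points at a connection problem.
grep -E '^Tcp:' /proc/net/snmp

# Extended counters: ListenOverflows / ListenDrops indicate a full
# accept backlog on some listener.
grep -E '^TcpExt:' /proc/net/netstat

# Listen backlog ceiling -- a low value here caps concurrent
# connection attempts regardless of the application's own backlog.
cat /proc/sys/net/core/somaxconn

# conntrack table usage vs. limit: kube-proxy's iptables rules rely on
# conntrack, and a full table silently drops new connections.
# (Files are absent if the conntrack module is not loaded.)
cat /proc/sys/net/netfilter/nf_conntrack_count 2>/dev/null || true
cat /proc/sys/net/netfilter/nf_conntrack_max 2>/dev/null || true
```

Sampling these counters before and after a burst of load, and diffing, is usually more telling than a single snapshot.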

More info about the CloudWatch metrics in question.

Solution

My solution to this problem was to rework my Services. The setup in my question had one K8s Service with ~10 ports. I reworked the setup to use one port per Service, and the problem went away, though I don't know why. This makes me suspect something on the nodes themselves, or some complexity in routing connections to the correct node port. Because of this, I'm wary of exposing many ports per Service again.
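A hypothetical sketch of the rework described above: instead of one Service exposing ~10 ports, generate one single-port LoadBalancer Service per port. The name ("api") and ports are illustrative, not taken from the original post.

```shell
# Emit one single-port LoadBalancer Service manifest per port.
manifests=""
for port in 8080 8443 9090; do
  manifests="${manifests}$(cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: api-${port}
spec:
  type: LoadBalancer
  selector:
    app: api
  ports:
    - port: ${port}
      targetPort: ${port}
EOF
)
---
"
done
# Review the output, then: printf '%s' "$manifests" | kubectl apply -f -
printf '%s' "$manifests"
```

Note that with type LoadBalancer this creates one ELB per Service, so the trade-off is extra cost and more DNS names for clients.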

Context

StackExchange DevOps Q#1266, answer score: 5
