How to diagnose intermittent "error dialing backend: dial timeout" error in GKE
Problem
We have an issue in our GKE cluster, which started suddenly yesterday, where we intermittently get errors when interacting with the control plane. This affects our Jenkins builds, which use the Kubernetes plugin, and even simply running and attaching to pods. For example, if we do a
`kubectl run -i --tty --rm --image=busybox some-name`, about 20% of the time we get the following error:

```
Error attaching, falling back to logs: error dialing backend: dial timeout
```

We're hoping someone can suggest how to diagnose this further.
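One way to put a firmer number on "about 20% of the time" is to repeat a non-interactive variant of the command and count how often the error text shows up. The Python sketch below does that. `kubectl_run_cmd` and the `dial-test-N` pod names are hypothetical; `--tty` is dropped because a loop has no terminal, each attempt gets its own pod name so a pod that is still being deleted does not collide with the next one, and the container just sleeps briefly so there is something to attach to.

```python
import subprocess
from typing import Callable, Sequence


def kubectl_run_cmd(attempt: int) -> Sequence[str]:
    # Hypothetical non-interactive variant of the command in the question:
    # no --tty, a unique pod name per attempt, and a short-lived container.
    return ["kubectl", "run", f"dial-test-{attempt}", "-i", "--rm",
            "--restart=Never", "--image=busybox", "--", "sleep", "5"]


def failure_rate(make_cmd: Callable[[int], Sequence[str]], pattern: str,
                 attempts: int = 20, timeout: float = 120.0) -> float:
    """Run make_cmd(i) for i in range(attempts) and return the fraction of
    runs whose combined stdout/stderr contains `pattern`."""
    matches = 0
    for i in range(attempts):
        try:
            result = subprocess.run(list(make_cmd(i)), capture_output=True,
                                    text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            # A run that hangs past `timeout` is not counted as a match.
            continue
        if pattern in result.stdout + result.stderr:
            matches += 1
    return matches / attempts
```

Calling `failure_rate(kubectl_run_cmd, "error dialing backend: dial timeout")` before and after a change (for example a control-plane upgrade) gives a rough, comparable failure rate; with only 20 attempts the estimate is noisy, so a larger `attempts` value may be worth the extra time.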
Verbose logs from the `kubectl run` command:

```
I0421 09:38:47.720435 2483 round_trippers.go:420] POST https://35.189.86.41/api/v1/namespaces/default/pods
I0421 09:38:47.720448 2483 round_trippers.go:427] Request Headers:
I0421 09:38:47.720452 2483 round_trippers.go:431] User-Agent: kubectl/v1.18.3 (darwin/amd64) kubernetes/2e7996e
I0421 09:38:47.720456 2483 round_trippers.go:431] Content-Type: application/json
I0421 09:38:47.720460 2483 round_trippers.go:431] Accept: application/json, */*
I0421 09:38:47.852510 2483 round_trippers.go:446] Response Status: 201 Created in 132 milliseconds
I0421 09:38:47.860462 2483 reflector.go:175] Starting reflector *v1.Pod (0s) from k8s.io/client-go/tools/watch/informerwatcher.go:146
I0421 09:38:47.860489 2483 reflector.go:211] Listing and watching *v1.Pod from k8s.io/client-go/tools/watch/informerwatcher.go:146
I0421 09:38:47.860610 2483 round_trippers.go:420] GET https://35.189.86.41/api/v1/namespaces/default/pods?fieldSelector=metadata.name%3Dsome-name&limit=500&resourceVersion=0
I0421 09:38:47.860619 2483 round_trippers.go:427] Request Headers:
I0421 09:38:47.860628 2483 round_trippers.go:431] Accept: application/json, */*
I0421 09:38:47.860633 2483 round_trippers.go:431] User-Agent: kubectl/v1.18.3 (darwin/amd64) kubernetes/2e7996e
I0421 09:38:47.912301 2483 round_trippers.go:446] Response Status: 200 OK in 51 milliseconds
I0421 09:38:47.919196 24
```
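To compare a failing run against a successful one, it can help to reduce the verbose output to one line per API call. The Python sketch below pairs each `round_trippers.go` request line with the next `Response Status` line. The regular expressions are assumptions based on the excerpt above (the `round_trippers.go` line numbers differ between kubectl versions, so they are matched loosely), and the pairing assumes requests are logged one at a time, as they are here.

```python
import re
from typing import List, NamedTuple, Optional

# Request lines, e.g. "... round_trippers.go:420] POST https://host/api/v1/..."
REQUEST_RE = re.compile(r"round_trippers\.go:\d+\] (GET|POST|PUT|PATCH|DELETE) (\S+)")
# Response lines, e.g. "... Response Status: 201 Created in 132 milliseconds".
# The status code group is optional in case a line carries no code.
RESPONSE_RE = re.compile(r"round_trippers\.go:\d+\] Response Status: ?(\d{3})?.*? in (\d+) milliseconds")


class Call(NamedTuple):
    method: str
    url: str
    status: Optional[int]  # None if no status code (or no response line) was logged
    millis: Optional[int]  # None if no response line followed the request


def parse_calls(log_text: str) -> List[Call]:
    """Pair each logged request with the next 'Response Status' line."""
    calls: List[Call] = []
    pending = None  # (method, url) still waiting for its response line
    for line in log_text.splitlines():
        req = REQUEST_RE.search(line)
        if req:
            if pending:
                # The previous request never logged a response.
                calls.append(Call(*pending, None, None))
            pending = (req.group(1), req.group(2))
            continue
        resp = RESPONSE_RE.search(line)
        if resp and pending:
            status = int(resp.group(1)) if resp.group(1) else None
            calls.append(Call(*pending, status, int(resp.group(2))))
            pending = None
    if pending:
        calls.append(Call(*pending, None, None))
    return calls
```

A call whose `millis` is `None` never logged a response, which may point to the request that was in flight when the timeout occurred.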
Solution
Google Support tracked the issue down for us:

This problem appears to have been caused by a known issue affecting Konnectivity: under some circumstances a memory leak could occur. In the case of your cluster, this appears to end up causing connection timeouts after some time. The issue has already been addressed in the Konnectivity network proxy [1], and the fix is already available in GKE version 1.19.9 or higher.
[1] https://github.com/kubernetes-sigs/apiserver-network-proxy/pull/168
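To check whether a cluster's control plane is already on a version that includes the fix, the server version string (for example the server `GitVersion` reported by `kubectl version`) can be compared against 1.19.9. This is a minimal Python sketch: it only encodes the 1.19.9 threshold quoted above, assumes every later release also contains the fix, and knows nothing about backports to other minor versions.

```python
import re
from typing import Tuple

# Minimum version quoted by Google Support for the Konnectivity fix.
FIXED_IN: Tuple[int, int, int] = (1, 19, 9)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings such as 'v1.19.9-gke.1900'
    or '1.18.16-gke.502'; anything after the patch number is ignored."""
    m = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if not m:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(m.group(1)), int(m.group(2)), int(m.group(3))


def has_konnectivity_fix(server_version: str) -> bool:
    # Tuple comparison orders by major, then minor, then patch.
    return parse_version(server_version) >= FIXED_IN
```

Comparing tuples of integers rather than raw strings avoids the usual trap where `"1.19.10" < "1.19.9"` lexically.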
Context
StackExchange DevOps Q#13751, answer score: 1