HiveBrain v1.2.0

Service-to-pod communication is broken in Kubernetes

Submitted by: @import:stackexchange-devops
Tags: kubernetes, communication, broken, service, pod

Problem

I have an in-house 5-node cluster running on bare metal, using Calico. The cluster was working for 22 days but suddenly stopped. After investigating, I found that service-to-pod communication is broken, while all the components are up and kubectl is working without a problem.

From within the cluster (component A), if I curl another component (bridge) by its pod IP, it works:

$ curl -vvv http://10.4.130.184:9998
* Rebuilt URL to: http://10.4.130.184:9998/
*   Trying 10.4.130.184...
* TCP_NODELAY set
* Connected to 10.4.130.184 (10.4.130.184) port 9998 (#0)
> GET / HTTP/1.1
> Host: 10.4.130.184:9998
> User-Agent: curl/7.58.0
> Accept: */*
> 
Bridge

Bridge
* Connection #0 to host 10.4.130.184 left intact


An nslookup for the service also works (it resolves to the service IP):

$ nslookup bridge
Server:    10.5.0.10
Address 1: 10.5.0.10 kube-dns.kube-system.svc.k8s.local

Name:      bridge
Address 1: 10.5.160.50 bridge.170.svc.k8s.local
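
Since the short name resolves here, one thing worth checking is the pod's resolver configuration, and whether the fully qualified name (which bypasses the search-domain list) behaves differently. A hedged sketch — the pod name `component-a` is a placeholder, while the namespace `170` and cluster domain `k8s.local` are taken from the output above:

```shell
# Inspect the resolver config inside the affected pod
# ("component-a" is a hypothetical pod name).
kubectl exec -n 170 component-a -- cat /etc/resolv.conf

# Try the fully qualified service name, which skips the
# search-domain expansion that short-name lookups rely on.
kubectl exec -n 170 component-a -- nslookup bridge.170.svc.k8s.local
```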


But service-to-pod communication is broken, and when I curl the service name, most of the time (60-70%) it gets stuck:

$ curl -vvv http://bridge:9998
* Rebuilt URL to: http://bridge:9998/
* Could not resolve host: bridge
* Closing connection 0
curl: (6) Could not resolve host: bridge
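
The intermittent failure rate can be quantified with a small loop. This sketch uses `getent` so it runs anywhere; `localhost` is shown so the loop is runnable outside the cluster — substitute `bridge` when running it in an affected pod:

```shell
# Count how many of 20 name lookups fail for a given host.
# "localhost" is a stand-in; in the cluster you would use "bridge".
host=localhost
fail=0
for i in $(seq 1 20); do
  getent hosts "$host" >/dev/null || fail=$((fail + 1))
done
echo "failed $fail/20 lookups for $host"
```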


When I check the endpoints of that service, I can see that the pod's IP is there:

$ kubectl get ep -n 170 bridge
NAME     ENDPOINTS                                               AGE
bridge   10.4.130.184:9226,10.4.130.184:9998,10.4.130.184:9226   11d
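
Since the endpoints look correct, the next suspect is the ClusterIP-to-pod translation, which kube-proxy programs into iptables (or IPVS) on each node. A hedged sketch of how that could be checked — the `k8s-app=kube-proxy` label is the common default but may differ per install, and the ClusterIP `10.5.160.50` is taken from the nslookup output above:

```shell
# Is kube-proxy healthy on every node?
kubectl -n kube-system get pods -l k8s-app=kube-proxy -o wide

# Does curling the ClusterIP directly (skipping DNS) also hang?
curl -m 5 -vvv http://10.5.160.50:9998

# On a node: are there forwarding rules for this service's ClusterIP?
sudo iptables-save | grep 10.5.160.50
```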


But as I said, curl (and any other method) that uses the service name does not work. This is the service description:

```
$ kubectl describe svc -n 170 bridge
Name:         bridge
Namespace:    170
Labels:       io.kompose.service=bridge
Annotations:  Process: bridge
Selector:     io.kompose.service=bridge
Type:         ClusterIP
IP:           1
```

Solution

It sounds like the problem was resolved by deleting all of the kube-proxy pods, which is an aggressive solution, but if it works then good work!
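
For reference, that fix is a one-liner; the kube-proxy DaemonSet recreates the pods immediately, which rebuilds the service-to-pod forwarding rules. The `k8s-app=kube-proxy` label is the usual default but may differ per install:

```shell
# Delete all kube-proxy pods; their DaemonSet recreates them,
# rebuilding the ClusterIP -> pod forwarding rules in the process.
kubectl -n kube-system delete pods -l k8s-app=kube-proxy

# Watch them come back up.
kubectl -n kube-system get pods -l k8s-app=kube-proxy -w
```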

Best I can offer you is a collection of links about diagnosing Kubernetes problems:

  • Troubleshoot Applications

  • Troubleshooting Kubernetes Networking Issues

  • Troubleshooting GKE (GKE-specific but still useful)

  • Deploy Kubespy

Not a great answer, I'm afraid, and since the kube-proxy pods are already gone, we may have missed the opportunity to diagnose the issue.

Context

StackExchange DevOps Q#9535, answer score: 1
