Kubernetes fails to do garbage collection on images

Submitted by: @import:stackexchange-devops

Problem

For quite some time, every once in a while I have been getting an alert that the disk of one of my Kubernetes cluster nodes is filling up. I found out pretty quickly that the Docker images were the cause. As I did not have time to analyze the issue in depth, I just used docker system prune to get rid of everything not running (yeah, I know this is highly discouraged).

Now I have started to look into this issue, and it seems that the garbage collector has a problem with the images:
max@nb [~]
-> % k get events -owide
LAST SEEN TYPE REASON OBJECT SUBOBJECT SOURCE MESSAGE FIRST SEEN COUNT NAME
3m15s Warning ImageGCFailed node/cluster00 kubelet, cluster00 failed to get imageFs info: non-existent label "docker-images" 69d 20073 cluster00.16fa70fb01b64a69
4m15s Warning ImageGCFailed node/cluster02 kubelet, cluster02 failed to get imageFs info: non-existent label "docker-images" 69d 20085 cluster02.16fa6da5f5b30c31
54s Warning ImageGCFailed node/cluster04 kubelet, cluster04 failed to get imageFs info: non-existent label "docker-images" 69d 20082 cluster04.16fa6ea64df694c5
3m57s Warning ImageGCFailed node/cluster05 kubelet, cluster05 failed to get imageFs info: non-existent label "docker-images" 69d 20087 cluster05.16fa6d1f167fe7f4
48s Warning ImageGCFailed node/cluster06 kubelet, cluster06 failed to get imageFs info: non-existent label "docker-images" 69d 20077 cluster06.16fa700540142542
2m21s Warning ImageGCFailed node/cluster11 kubelet, cluster11 failed to get imageFs info: non-existent label "docker-images" 69d 20074 cluster11.16fa70c17857fef8


I have two other nodes in the cluster, that are newer than the others and do no

Solution

I think I fixed the issue.

It seems to occur when the kubelet service starts before the Docker service is up and running. So I added the directive After=docker.service to the [Unit] section of the kubelet service unit and restarted the service. The issue seems to be gone now.
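
One way to add that directive without editing the packaged unit file is a systemd drop-in. The following is only a minimal sketch and not necessarily how the change was made here: the helper name write_kubelet_dropin and the file name 10-after-docker.conf are made up for illustration, and the default directory assumes the unit is called kubelet.service.

```shell
# Hypothetical helper: writes a systemd drop-in that orders kubelet after Docker.
# Assumption: the file name 10-after-docker.conf is arbitrary; systemd reads
# any *.conf file in the unit's drop-in directory.
write_kubelet_dropin() {
    # Default drop-in directory for a unit named kubelet.service (assumed name).
    dropin_dir="${1:-/etc/systemd/system/kubelet.service.d}"
    mkdir -p "$dropin_dir"
    # Only the ordering directive from the answer goes into the [Unit] section.
    printf '[Unit]\nAfter=docker.service\n' > "$dropin_dir/10-after-docker.conf"
}
```

To use it, call write_kubelet_dropin as root, then run systemctl daemon-reload followed by systemctl restart kubelet so the new ordering is picked up. Keep in mind that After= only controls start ordering; on its own it does not cause docker.service to be started.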

Context

StackExchange DevOps Q#16520, answer score: 1
