How would you build Docker images in Kubernetes while sharing the layer cache among all pod builders at scale?
Problem
We are using Azure DevOps as our CI/CD platform.
We have our own self-hosted Linux agents, which we integrated into our CI/CD Azure Kubernetes cluster.
We use these agents to build our Docker images within the cluster, meaning that we have N agents represented by N pods.
During operation, we scale the number of concurrent agents up and down depending on various parameters.
Since we started with this approach, we have struggled to leverage Docker's layer cache.
We have tried to:
- Use Kubernetes StatefulSets. But each pod gets its own PV/PVC, which prevents sharing the cache between all pods.
- Use Google's Kaniko. Unfortunately, in many cases we found it slower than Docker's BuildKit without any cache at all.
- Use Azure Files over SMB and NFS, thinking that a PV with ReadWriteMany access mode would solve the problem. Unfortunately, the builds ended with a fatal error.
At this point we are a bit confused: we assume this is a common industry problem, so we may have approached it wrong. We want to understand how to leverage Docker's cache with our current setup, without moving to another platform.
How should we do that? What are the best practices for this issue?
Solution
Docker allows you to pull a cache from anywhere with the --cache-from flag. See the documentation here: https://docs.docker.com/engine/reference/commandline/build/#specifying-external-cache-sources
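As a minimal sketch of what this looks like on an ephemeral builder pod (the registry name and image tags below are placeholders, not from the question):

```shell
# Enable BuildKit so --cache-from can use external cache sources.
export DOCKER_BUILDKIT=1

# Pull the previously published image so its layers are available locally.
# "|| true" tolerates the very first build, when no cache image exists yet.
docker pull myregistry.azurecr.io/myapp:cache || true

# Reuse matching layers from the pulled image instead of rebuilding them.
docker build \
  --cache-from myregistry.azurecr.io/myapp:cache \
  -t myregistry.azurecr.io/myapp:latest .
```

Because the cache lives in the registry rather than on a node-local volume, it survives pod churn and is reachable from every agent, which is exactly the property the StatefulSet-per-pod PVC approach lacked.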
The easiest approach is to build your images with the --build-arg BUILDKIT_INLINE_CACHE=1 flag, which stores the metadata needed to build from cache in the image itself. You can store these images in two places: a file system or a registry you already have. If you want to keep the registry of images used for deployment separate from the ones carrying cache metadata, you can distinguish them with tags or use separate registries.

If you want to store the files somewhere closer, you can attach volumes directly to the nodes using PVCs or shared volumes. See this Stack Overflow answer for more details: https://stackoverflow.com/a/52564314/8954290.
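A sketch of the inline-cache round trip on each builder pod, assuming a dedicated cache tag in an existing registry (all names below are placeholders):

```shell
# Enable BuildKit; BUILDKIT_INLINE_CACHE is ignored by the legacy builder.
export DOCKER_BUILDKIT=1

# Build with inline cache metadata embedded in the image, while reusing
# layers from the previously pushed cache image if one exists.
docker build \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --cache-from myregistry.azurecr.io/myapp:buildcache \
  -t myregistry.azurecr.io/myapp:buildcache .

# Push the result so the next pod (on any node) can use it as a cache source.
docker push myregistry.azurecr.io/myapp:buildcache
```

Using a separate :buildcache tag keeps the deployment registry free of cache-metadata images, which is the tag-based separation suggested above.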
Here is a write-up from someone using the same approach: https://medium.com/swlh/fast-docker-build-in-kubernetes-f52088854f45
NOTE: The reason CI providers offer this as a managed feature is that doing it reliably is hard. You must handle race conditions around which images to store as cache and which to pull from when multiple images are being built at the same time. CI providers use workflow engines, queues, replicas, etc. to handle this, which is not easy to build yourself. This is why most people use services like CircleCI, Codefresh, etc.
Context
StackExchange DevOps Q#14104, answer score: 1