Understanding Docker layers
Problem
We have the following block in our Dockerfile:

RUN yum -y update
RUN yum -y install epel-release
RUN yum -y groupinstall "Development Tools"
RUN yum -y install python-pip git mysql-devel libxml2-devel libxslt-devel python-devel openldap-devel libffi-devel openssl-devel

I've been told that we should unite these RUN commands to cut down on created docker layers:

RUN yum -y update \
 && yum -y install epel-release \
 && yum -y groupinstall "Development Tools" \
 && yum -y install python-pip git mysql-devel libxml2-devel libxslt-devel python-devel openldap-devel libffi-devel openssl-devel

I'm very new to docker and not sure I completely understand the differences between these two versions of specifying multiple RUN commands. When would one unite RUN commands into a single one and when it makes sense to have multiple RUN commands?

Solution
A docker image is actually a linked list of filesystem layers. Each instruction in a Dockerfile creates a filesystem layer that describes the differences in the filesystem before and after execution of the corresponding instruction. The docker inspect subcommand can be used on a docker image to reveal its nature of being a linked list of filesystem layers.

The number of layers used in an image is important
- when pushing or pulling images, as it affects the number of concurrent uploads or downloads occurring.
- when starting a container, as the layers are combined together to produce the filesystem used in the container; the more layers are involved, the worse the performance is, but the different filesystem backends are affected differently by this.
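To look at the layer list of a concrete image, something like the following can be used. This is only a minimal sketch: layer_count and show_layers are hypothetical helper names, and it assumes a Docker CLI whose inspect output exposes the RootFS.Layers field.

```shell
# Hypothetical helpers; they only wrap standard docker subcommands.

# Print the number of filesystem layers recorded for an image.
layer_count() {
  docker inspect --format '{{len .RootFS.Layers}}' "$1"
}

# List the build steps of an image together with the size each one added.
show_layers() {
  docker history "$1"
}
```

Calling layer_count on the two variants of the image from the question should make the difference between separate and chained RUN commands visible.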
This has several consequences for how images should be built. The first and most important advice I can give is:
Advice #1 Make sure that the build steps where your source code is involved come as late as possible in the Dockerfile and are not tied to previous commands using a && or a ;.

The reason for this is that all the previous steps will be cached and the corresponding layers will not need to be downloaded over and over again. This means faster builds and faster releases, which is probably what you want. Interestingly enough, it is surprisingly hard to make optimal use of the docker cache.
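As an illustration of Advice #1, the script below writes a small Dockerfile in which the system packages come first and the source code comes last. It is a sketch under assumptions: the base image centos:7, the /app directory and requirements.txt are hypothetical and not taken from the question.

```shell
# Write a hypothetical Dockerfile into a scratch directory to show the ordering.
workdir=$(mktemp -d)
cat > "$workdir/Dockerfile" <<'EOF'
FROM centos:7

# Rarely changes: these layers stay cached when only the source changes.
RUN yum -y update \
 && yum -y install epel-release \
 && yum -y install python-pip git

# Changes on every commit: kept last and NOT chained to the steps above.
COPY . /app
RUN pip install -r /app/requirements.txt
EOF

echo "Dockerfile written to $workdir/Dockerfile"
```

With this ordering, editing a source file only invalidates the cache from the COPY step onwards, so the yum layers are reused.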
My second advice is less important but I find it very useful from a maintenance viewpoint:
Advice #2 Do not write complex commands in the Dockerfile but rather use scripts that are to be copied and executed.
A Dockerfile following this advice would look like
COPY apt_setup.sh /root/
RUN sh -x /root/apt_setup.sh
COPY install_packages.sh /root/
RUN sh -x /root/install_packages.sh

and so on. The advice of binding several commands with && has only a limited scope. It is much easier to write with scripts, where you can use functions, etc. to avoid redundancy or for documentation purposes.

People interested in pre-processors and willing to avoid the small overhead caused by the COPY steps can generate a Dockerfile on the fly, where the

COPY apt_setup.sh /root/
RUN sh -x /root/apt_setup.sh

sequences are replaced by

RUN base64 --decode … | sh -x

where the … is the base64-encoded version of apt_setup.sh.

My third advice is for people who want to limit the size and the number of layers at the possible cost of longer builds.
Advice #3 Use the with-idiom to avoid files present in intermediary layers but not in the resulting filesystem.

A file added by some docker instruction and removed by some later instruction is not present in the resulting filesystem, but it is mentioned twice in the docker layers constituting the image under construction: once, with name and full content, in the layer resulting from the instruction adding it, and once, as a deletion notice, in the layer resulting from the instruction removing it.

For instance, assume we temporarily need a C compiler in some image and consider the following Dockerfile snippet:
# !!! THIS DISPLAYS SOME PROBLEM --- DO NOT USE !!!
RUN apt-get install -y gcc
RUN gcc --version
RUN apt-get --purge autoremove -y gcc

(A more realistic example would build some software with the compiler instead of merely asserting its presence with the --version flag.)

The Dockerfile snippet creates three layers. The first one contains the full gcc suite, so even though gcc is not present in the final filesystem, the corresponding data is still part of the image in some manner and needs to be downloaded, uploaded and unpacked whenever the final image is.
The with-idiom is a common form in functional programming to isolate resource ownership and resource releasing from the logic using it. It is easy to transpose this idiom to shell-scripting, and we can rephrase the previous commands as the following script, to be used with COPY & RUN as in Advice #2.

# with_c_compiler SIMPLE-COMMAND
#  Execute SIMPLE-COMMAND in a sub-shell with gcc being available.
with_c_compiler()
(
  set -e
  trap 'apt-get --purge autoremove -y gcc' EXIT
  apt-get install -y gcc
  "$@"
)

with_c_compiler \
  gcc --version

Complex commands can be turned into functions so that they can be fed to with_c_compiler. It is also possible to chain calls of several with_whatever functions, but maybe not very desirable. (Using more esoteric features of the shell, it is certainly possible to make with_c_compiler accept complex commands, but it is in all aspects preferable to wrap these complex commands into functions.)

If we want to ignore Advice #2, the resulting Dockerfile snippet would be
RUN apt-get install -y gcc \
 && gcc --version \
 && apt-get --purge autoremove -y gcc

which is not so easy to read and maintain because of the obfuscation. See how the shell-script variant puts emphasis on the important part, gcc --version, while the chained-&& variant buries that part in the middle.

Code Snippets
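The with-idiom from Advice #3 is not tied to compilers. As a minimal sketch of the same pattern with a different resource, the script below uses a hypothetical with_tmpdir helper that owns a scratch directory, and a hypothetical build_in function standing in for a complex command wrapped into a function.

```shell
# with_tmpdir COMMAND [ARG...]
#  Run COMMAND with a scratch directory appended as its last argument,
#  and delete that directory when COMMAND finishes, even on failure.
with_tmpdir()
(
  set -e
  scratch=$(mktemp -d)
  trap 'rm -rf "$scratch"' EXIT
  "$@" "$scratch"
)

# A complex command wrapped in a function so it can be passed along.
build_in() {
  echo "building" > "$1/build.log"
  cat "$1/build.log"
}

with_tmpdir build_in
```

Because the body of with_tmpdir is a sub-shell, the EXIT trap fires as soon as the wrapped command returns, so the caller never has to remember the cleanup step.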
COPY apt_setup.sh /root/
RUN sh -x /root/apt_setup.sh
COPY install_packages.sh /root/
RUN sh -x /root/install_packages.sh

COPY apt_setup.sh /root/
RUN sh -x /root/apt_setup.sh

RUN base64 --decode … | sh -x

# !!! THIS DISPLAYS SOME PROBLEM --- DO NOT USE !!!
RUN apt-get install -y gcc
RUN gcc --version
RUN apt-get --purge autoremove -y gcc

# with_c_compiler SIMPLE-COMMAND
#  Execute SIMPLE-COMMAND in a sub-shell with gcc being available.
with_c_compiler()
(
  set -e
  trap 'apt-get --purge autoremove -y gcc' EXIT
  apt-get install -y gcc
  "$@"
)

with_c_compiler \
  gcc --version

Context
StackExchange DevOps Q#1750, answer score: 41