Running the docker daemon without root
Problem
Inspired by "What are best and comprehensive practices to consider when running docker in production?", I stumbled over "Why we don't let non-root users run Docker...".
They came up with

    docker run -ti --privileged -v /:/host fedora chroot /host

which puts your unprivileged (but docker-grouped) user in a root shell on the host (not in the container).

Needless to say, that is rather bad news for production systems.

What can be done to avoid this or similar things? I assume the docker daemon cannot be run as a non-root user (or else that would likely be the default way to start it)? One solution that comes to mind is not putting unprivileged users in the docker group and only allowing specific docker command lines via sudoers. But is that enough?

Do we know how hard it is to break out of the container and gain a host root shell like in the example given, if the image is run as usual (i.e. docker run --rm -it myimage)? I am not even talking about some attack via the service running in the container, but by, say, a specially prepared image.

EDIT: to clarify, I am specifically interested in this attack scenario:
- Untrusted software developer delivers a ready-to-run image.
- Trusted user runs that image on a production system in a normal fashion (i.e., docker run --rm -it myimage), without --privileged.
- Possibly with sub-scenarios of -u unprivileged_u or not.
- Does the untrusted person that created the image have a reasonable chance to break out of the docker container runtime? I.e., more than with, say, classical chroot or classical permission elevation?

EDIT2: My research shows only a single real breakout incident, which was fixed with Docker Engine 0.12, and then again in 1.0.1. Also, everything critical I found on this topic mentions the erstwhile TCP/IP interface of the docker daemon, which was changed to the Unix socket a long time ago. Is there anything else, or would you say that there is no problem as long as the person typ
Solution
Compared to just adding the user to the docker group, the second link's solution is not any more secure.
Note how that page still can launch with the --privileged flag, and that it is running unconfined.

    unconfined_u:unconfined_r:unconfined_t

This means that the container can access all resources including the host's disks and hardware. It is security through obscurity, which is not security. Anyone who knows how to use mknod or walk the /sys and /proc trees could easily compromise the host and all containers with zero logging.

The reality is that docker shifts both complexity and the security boundaries. All users who can launch containers or hit the API need to be restricted to trusted users in that security context.

--privileged disables all apparmor and selinux policies, which is actually far less secure than a native package.

Namespaces are not a security function and depend on apparmor and selinux to enforce reasonable constraints.
Docker notes this on this page.
https://docs.docker.com/engine/security/security/#control-groups
'First of all, only trusted users should be allowed to control your Docker daemon.'
The security functions of docker/kube are not administrative boundaries like classical unix permissions, they are tools to prevent non-privileged containers from breaking out.
In a docker world the administrative boundary becomes the host, and the selection and segmentation required to isolate applications or users within that context needs to be applied at that boundary.
The benefits of this shift in complexity and responsibility generally outweigh the risks if implemented with those changes in mind.
TLDR
Docker API user == Sudo ALL user
Running a container with the --privileged flag == running a web service as suid or as the root user.
Edited per the OP's request for additional information
The referenced issue with breakout in the OP's edit was a non-uid0 privilege escalation.
Unfortunately, due to the need to perform root only actions Docker needs to enable some capabilities so that apt/dnf can install packages etc...
This need does pose a risk if production workloads are run in this default configuration and one should adopt the security principle of least privilege for production workloads.
--privileged disables apparmor/selinux and opens up capabilities

I am using ubuntu but it may be useful to work through the following steps. First start a default container with

    docker run -i --rm -t debian bash

From the parent host find the PID for bash using ps and note that the process is owned by root. If you look in /proc/$PID/status you will see the contexts it is running under.

    # egrep '^Cap(Prm|Inh|Eff)' /proc/16026/status
    CapInh: 00000000a80425fb
    CapPrm: 00000000a80425fb
    CapEff: 00000000a80425fb

You will want to refer to man 7 capabilities and /usr/include/linux/capability.h for better info, but a summary is
CapInh = The inheritable capabilities (what docker provided)
CapPrm = The permitted capabilities (the upper bound on what the process inside the container may make effective)
CapEff = The effective capabilities (what the kernel actually checks)
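
The /proc lookup above can be wrapped in a small helper. This is only a sketch in POSIX sh: the function name show_caps is made up for illustration, and it takes the path of a status file rather than a PID so that it can also be pointed at a saved copy.

```shell
# Print the inheritable, permitted and effective capability masks
# from a /proc/<pid>/status style file.
# show_caps is a hypothetical helper name, not part of docker or procps.
show_caps() {
  # Fall back to the current shell's own status file when no path is given.
  status_file=${1:-/proc/$$/status}
  grep -E '^Cap(Inh|Prm|Eff):' "$status_file"
}
```

On the host it would be called as show_caps /proc/16026/status, using the PID found with ps. The PID can also be read with docker inspect --format '{{.State.Pid}}' followed by the container name or ID.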
You can decode these to human readable form by running:

    $ capsh --decode=00000000a80425fb

Now do the same with

    $ docker run -i --rm --privileged -t debian bash

and you will find that the effective capabilities are 0000003fffffffff

Also just do a dir /dev in both containers and you will see just how much access a privileged container has.

By looking at /etc/apparmor.d/docker for apparmor systems or the labels in SELinux you will see the implications.

Going back to the principle of least privilege, I would ensure that my docker container is running its process as an unprivileged user and with as few enabled capabilities as possible.
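
If capsh is not installed, the hex masks from /proc/$PID/status can be decoded by hand, since each bit is one capability number from /usr/include/linux/capability.h. The following is a rough POSIX sh sketch, not a replacement for capsh: decode_caps and CAP_NAMES are names invented here, and the table only covers bits 0 to 37, which is what the 0000003fffffffff mask above spans.

```shell
# Capability names in bit order (bit 0 first), as numbered in
# linux/capability.h. Only bits 0-37 are listed; newer kernels define more.
CAP_NAMES="chown dac_override dac_read_search fowner fsetid kill setgid setuid
setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod lease
audit_write audit_control setfcap mac_override mac_admin syslog wake_alarm
block_suspend audit_read"

# decode_caps HEXMASK: print the set capabilities as a comma-separated list.
# Hypothetical helper; note that POSIX sh has no local variables.
decode_caps() {
  mask=$(( 0x$1 ))
  bit=0
  out=""
  for name in $CAP_NAMES; do
    # Test whether this capability's bit is set in the mask.
    if [ $(( ($mask >> $bit) & 1 )) -eq 1 ]; then
      out="${out:+$out,}cap_$name"
    fi
    bit=$(( $bit + 1 ))
  done
  printf '%s\n' "$out"
}
```

Running decode_caps 00000000a80425fb lists 14 capabilities, among them cap_net_raw, cap_mknod and cap_sys_chroot, while decode_caps 0000003fffffffff lists all 38 names in the table.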
As an example, you can test this by first running it with all caps dropped.

    $ docker run -i --rm -t --cap-drop=all debian bash

This can be validated through the /proc/$PID/status method above.

    CapInh: 0000000000000000
    CapPrm: 0000000000000000
    CapEff: 0000000000000000

Note it is all zeros, from CapInh. Then if I had to enable features for the application I would use --cap-add after the --cap-drop=all

Example, --cap-drop=all will break ping:

    # ping -c 1 www.google.com
    ping: Lacking privilege for raw socket.

So we can add the cap_net_raw cap. Docker expects the arguments with the cap_ prefix removed so the command is

    docker run -i --rm -t --cap-drop=all --cap-add=net_raw debian bash

    # ping -c 1 www.google.com
    PING www.google.com (216.58.216.164): 56 data bytes
    64 bytes from 216.58.216.164: icmp_seq=0 ttl=54 time=4.779 ms

But be very careful when adding them. As an example:
cap_sys_module will allow a container to add or remove kernel modules from the parent host.
cap_sys_rawio will open memory and all block devices to attack
cap_sys_admin is super dangerous.
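
A quick way to spot these three capabilities in a running container is to test their bits in the CapEff mask from /proc/$PID/status. This is a minimal POSIX sh sketch; risky_caps is a hypothetical helper name, and the bit numbers (16, 17, 21) are taken from linux/capability.h.

```shell
# risky_caps HEXMASK: print which of the capabilities called out above
# are present in the mask, separated by spaces (empty line if none).
# Bit numbers per linux/capability.h:
#   CAP_SYS_MODULE=16, CAP_SYS_RAWIO=17, CAP_SYS_ADMIN=21
risky_caps() {
  mask=$(( 0x$1 ))
  found=""
  for entry in 16:cap_sys_module 17:cap_sys_rawio 21:cap_sys_admin; do
    bit=${entry%%:*}    # part before the colon: the bit number
    name=${entry#*:}    # part after the colon: the capability name
    if [ $(( ($mask >> $bit) & 1 )) -eq 1 ]; then
      found="${found:+$found }$name"
    fi
  done
  printf '%s\n' "$found"
}
```

With the default mask 00000000a80425fb this prints an empty line, and with the --privileged mask 0000003fffffffff it prints all three names.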
So in this case I would see if you can make things run in this context.
$ docker run -i --rm -t --cap-drop=all -u nobody:nog
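
To keep the drop-everything-then-add-back pattern consistent across services, the command line could be assembled by a small helper. The sketch below only prints the command and does not run docker; build_run_cmd and its argument order are assumptions made for illustration, while --cap-drop, --cap-add and -u are the flags already used above.

```shell
# build_run_cmd USER IMAGE [CAP...]
# Print a docker run command that drops all capabilities, runs as USER,
# and adds back only the capabilities listed. Hypothetical helper.
build_run_cmd() {
  user=$1
  image=$2
  shift 2
  cmd="docker run -i --rm -t --cap-drop=all -u $user"
  for cap in "$@"; do
    # Docker expects capability names without the cap_ prefix.
    cmd="$cmd --cap-add=${cap#cap_}"
  done
  printf '%s\n' "$cmd $image"
}
```

For example, build_run_cmd nobody debian prints a command with no capabilities at all, and each extra argument such as cap_net_raw adds one --cap-add=net_raw flag. Whether the application still works under that configuration has to be tested case by case.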
Code Snippets
    # egrep '^Cap(Prm|Inh|Eff)' /proc/16026/status
    CapInh: 00000000a80425fb
    CapPrm: 00000000a80425fb
    CapEff: 00000000a80425fb

    $ capsh --decode=00000000a80425fb

    $ docker run -i --rm -t --cap-drop=all debian bash

    CapInh: 0000000000000000
    CapPrm: 0000000000000000
    CapEff: 0000000000000000

    # ping -c 1 www.google.com
    ping: Lacking privilege for raw socket.

Context
StackExchange DevOps Q#1957, answer score: 7