Running the docker daemon without root

Submitted by: @import:stackexchange-devops·Mar 10, 2026·

Problem

Inspired by "What are best and comprehensive practices to consider when running docker in production?", I stumbled over "Why we don't let non-root users run Docker..."

They came up with docker run -ti --privileged -v /:/host fedora chroot /host, which puts your unprivileged (but docker-grouped) user in a root shell on the host (not in the container).

Needless to say, that is rather bad news for production systems.

What can be done to avoid this or similar things? I assume the docker daemon cannot be run as a non-root user (or else that would likely be the default way to start it)? One solution that comes to mind is not putting unprivileged users in the docker group and only allowing specific docker command lines via sudoers. But is that enough?

Do we know how hard it is to break out of the container and gain a host root shell like in the example given, if the image is run as usual (i.e. docker run --rm -it myimage)? I am not even talking about some attack via the service running in the container, but by, say, a specially prepared image.

EDIT: to clarify, I am specifically interested in this attack scenario:

Untrusted software developer delivers a ready-to-run image.

Trusted user runs that image on a production system in a normal fashion (i.e., docker run --rm -it myimage), without --privileged.

Possibly with sub-scenarios of -u unprivileged_u or not.

Does the untrusted person that created the image have a reasonable chance to break out of the docker container runtime? I.e., more than with, say, classical chroot or classical permission elevation?

EDIT2: My research shows only a single real breakout incident, which was fixed in Docker Engine 0.12 and then again in 1.0.1. Also, everything critical I found about this mentions the erstwhile TCP/IP interface of the docker daemon, which was replaced by the Unix socket a long time ago. Is there anything else, or would you say that there is no problem as long as the person typ

Solution

The second link's solution is no more secure than simply adding the user to the docker group.

Note that the approach on that page can still launch containers with the --privileged flag, and that the container runs unconfined:

unconfined_u:unconfined_r:unconfined_t

This means that the container can access all resources, including the host's disks and hardware. It is security through obscurity, which is not security. Anyone who knows how to use mknod or walk the /sys and /proc trees could easily compromise the host and all containers with zero logging.

The reality is that Docker shifts both complexity and the security boundaries. Anyone who can launch containers or reach the API must be a trusted user in that security context.

--privileged disables all AppArmor and SELinux policies, which is actually far less secure than running a native package.

Namespaces are not a security function; they depend on AppArmor and SELinux to enforce reasonable constraints.

Docker notes this on this page.

https://docs.docker.com/engine/security/security/#control-groups

'First of all, only trusted users should be allowed to control your Docker daemon.'

The security functions of docker/kube are not administrative boundaries like classical unix permissions, they are tools to prevent non-privileged containers from breaking out.

In a docker world the administrative boundary becomes the host, and the selection and segmentation required to isolate applications or users within that context needs to be applied at that boundary.

The benefits of this shift in complexity and responsibility generally outweigh the risks if implemented with those changes in mind.

TLDR

Docker API user == Sudo ALL user

Running a container with the --privileged flag == running a web service as suid or as the root user.

Edited per the OP's request for additional information

The breakout issue referenced in the OP's edit was a non-uid-0 privilege escalation.

Unfortunately, because containers need to perform root-only actions (so that apt/dnf can install packages, for example), Docker enables some capabilities by default.

This poses a risk if production workloads run in that default configuration, so apply the principle of least privilege to production workloads.

--privileged disables apparmor/selinux and opens up capabilities

I am using Ubuntu, but it may be useful to work through the following steps. First, start a default container with docker run -i --rm -t debian bash

From the parent host, find the PID for bash using ps and note that the process is owned by root. If you look in /proc/$PID/status you will see the capability sets it is running with.

# grep -E '^Cap(Prm|Inh|Eff)' /proc/16026/status
 CapInh:    00000000a80425fb
 CapPrm:    00000000a80425fb
 CapEff:    00000000a80425fb
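The ps step above can be shortened by asking Docker for the container's main PID directly. The following is a minimal sketch, assuming the docker CLI is on the host's PATH; cap_lines and container_cap_lines are hypothetical helper names, and my_container in the usage note is a placeholder.

```shell
# Print only the three capability-set lines from a /proc/<pid>/status style file.
cap_lines() {
    grep -E '^Cap(Inh|Prm|Eff):' "$1"
}

# Hypothetical wrapper: resolve the container's main PID with docker inspect,
# then read that process's capability sets from the host's /proc.
container_cap_lines() {
    pid=$(docker inspect --format '{{.State.Pid}}' "$1") || return 1
    cap_lines "/proc/$pid/status"
}
```

On the host, container_cap_lines my_container would then print the same three lines as the grep above, provided you are allowed to read that /proc entry.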

You will want to refer to man 7 capabilities and /usr/include/linux/capability.h for better info, but a summary is:

CapInh = the inheritable capabilities (those a process can pass on across execve)
CapPrm = the permitted capabilities (the upper limit on what the process may make effective)
CapEff = the effective capabilities (what the kernel actually checks)

You can decode these to human readable form by running:

$ capsh --decode=00000000a80425fb
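If capsh is not installed, the mask can also be decoded by hand, since each bit position corresponds to a capability number in /usr/include/linux/capability.h. The following is a minimal POSIX shell sketch; decode_caps and CAP_NAMES are hypothetical names, and the table only covers capability numbers 0 through 40, so newer kernels may define bits it does not know.

```shell
# Capability names in bit order (bit 0 first), per linux/capability.h.
CAP_NAMES="chown dac_override dac_read_search fowner fsetid kill setgid setuid
setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod lease
audit_write audit_control setfcap mac_override mac_admin syslog wake_alarm
block_suspend audit_read perfmon bpf checkpoint_restore"

# Print a comma-separated list of the capabilities set in a hex mask.
# The body runs in a subshell so its variables do not leak to the caller.
decode_caps() (
    mask=$((0x$1))
    i=0
    out=""
    for name in $CAP_NAMES; do
        if [ $(( (mask >> i) & 1 )) -eq 1 ]; then
            out="${out:+$out,}$name"
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$out"
)

decode_caps 00000000a80425fb
```

For the default mask this lists 14 capabilities, from chown through setfcap. Compare it with the capsh output before relying on it.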

Now do the same with $ docker run -i --rm --privileged -t debian bash and you will find that the effective capabilities are 0000003fffffffff
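To see exactly which bits --privileged adds on top of the default set, the two masks can be compared bitwise. This is a minimal sketch; caps_added is a hypothetical helper name, and it assumes both arguments are hex masks without a 0x prefix, as they appear in /proc/$PID/status.

```shell
# Print the bits that are set in the second mask but not in the first,
# as a zero-padded 16-digit hex mask in the same format as /proc/<pid>/status.
caps_added() (
    printf '%016x\n' $(( 0x$2 & ~0x$1 ))
)

caps_added 00000000a80425fb 0000003fffffffff   # prints 0000003f57fbda04
```

The resulting mask can be passed to capsh --decode to list the extra capabilities by name.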

Also run dir /dev in both containers and you will see just how much access a privileged container has.

Looking at /etc/apparmor.d/docker on AppArmor systems, or at the labels in SELinux, shows the implications.

Going back to the principle of least privilege, I would ensure that my docker container runs its process as an unprivileged user and with as few enabled capabilities as possible.

As an example, start by running the container with all capabilities dropped.

$ docker run -i --rm -t --cap-drop=all debian bash

This can be validated through the /proc/$PID/status method above.

CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000

Note that every set is all zeros, starting from CapInh. If the application then needs a specific feature, I would enable it with --cap-add after the --cap-drop=all.

For example, --cap-drop=all will break ping:

# ping -c 1 www.google.com
ping: Lacking privilege for raw socket.

So we can add the cap_net_raw cap. Docker expects the arguments with the cap_ prefix removed, so the command is docker run -i --rm -t --cap-drop=all --cap-add=net_raw debian bash

# ping -c 1 www.google.com
PING www.google.com (216.58.216.164): 56 data bytes
64 bytes from 216.58.216.164: icmp_seq=0 ttl=54 time=4.779 ms
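The drop-everything-then-add-back pattern, including the cap_ prefix rule mentioned above, can be sketched as a small helper that builds the flags from an explicit allow-list. docker_cap_args is a hypothetical name; the function only prints arguments and does not call docker itself.

```shell
# Build the capability flags for docker run: drop everything, then add back
# only the capabilities named on the command line. A leading cap_ or CAP_
# is stripped from each name.
docker_cap_args() (
    args="--cap-drop=all"
    for cap in "$@"; do
        cap=${cap#cap_}
        cap=${cap#CAP_}
        args="$args --cap-add=$cap"
    done
    printf '%s\n' "$args"
)

docker_cap_args cap_net_raw   # prints --cap-drop=all --cap-add=net_raw
```

One way to use it is docker run -i --rm -t $(docker_cap_args cap_net_raw) debian bash, which keeps the allow-list in one visible place.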

But be very careful when adding capabilities. For example:

cap_sys_module will allow a container to add or remove kernel modules from the parent host.

cap_sys_rawio will open memory and all block devices to attack

cap_sys_admin is super dangerous: it covers a broad range of administrative operations, including mount.
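A quick guard against the three capabilities called out above is to test their bits in a process's effective mask. This is a minimal sketch; risky_caps is a hypothetical name, and the bit numbers (16, 17, 21) are taken from linux/capability.h. The list is illustrative, not a complete inventory of dangerous capabilities.

```shell
# Print which of sys_module, sys_rawio and sys_admin are set in a hex mask.
# Prints an empty line when none of them is present.
risky_caps() (
    mask=$((0x$1))
    found=""
    for entry in 16:sys_module 17:sys_rawio 21:sys_admin; do
        bit=${entry%%:*}
        name=${entry#*:}
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            found="${found:+$found }$name"
        fi
    done
    printf '%s\n' "$found"
)

risky_caps 0000003fffffffff   # prints sys_module sys_rawio sys_admin
```

The default mask 00000000a80425fb yields an empty line, which matches the point that these capabilities only appear when they are added explicitly or via --privileged.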

So I would first check whether the application can run in this context:

$ docker run -i --rm -t --cap-drop=all -u nobody:nog

Context

StackExchange DevOps Q#1957, answer score: 7