=================
== CJ Virtucio ==
=================

Container Gotchas

docker openshift

Containers are great. For the most part, you get to own your app's config (in terms of dependencies and platform), while ops only needs to worry about making sure you have a VM (for docker) or a cluster (for k8s) ready. However, there are a few gotchas you'll probably run into when getting containers to run. Note that these can vary depending on your infrastructure and platform.

Don’t run as root

Configs

Initial configuration as root is fine:

FROM centos:7

ENV APP_DIR=<app dir>

RUN set -e; \
    yum-config-manager --add-repo <some repo url>; \
    yum install -y <deps>; \
    mkdir --parents "${APP_DIR}"; \
    <more preparation>; \
    : ;

but, in general, don't let that carry over into your ENTRYPOINT. You don't want your app process running as root. For one, it goes against the principle of least privilege. For another, if you're running on OpenShift, running as root isn't supported at all. Per the guidelines:

By default, OpenShift Container Platform runs containers using an arbitrarily assigned user ID. This provides additional security against processes escaping the container due to a container engine vulnerability and thereby achieving escalated permissions on the host node.

So just run as a non-root user:

USER 1001

ENTRYPOINT ["/docker-entrypoint.sh"]
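
If you want a quick sanity check that the image really runs as a non-root user, you can override the entrypoint and print the UID (the registry and image name here are made up, as in the rest of this post):

# override the entrypoint and print the UID the container runs as;
# with USER 1001 in the Dockerfile this should print 1001
docker run --rm --entrypoint id docker-registry.foo.bar/some-namespace/my-app:latest -u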

Sometimes you may need further configuration at runtime. Maybe you need to template some configs into a folder that your user doesn’t own. Or, in the case of OpenShift, you have zero control over who the container runs as. The workaround in this case (also recommended by the guidelines) is as follows:

  1. Ensure that your runtime user is part of GID 0, the root group (which, unlike the root user, has no special privileges). This is usually handled by whoever owns your VM or cluster.

  2. Set the group permissions on the folders you need to configure at runtime:

    RUN set -e; \
        # change the group of APP_DIR (and everything under it) to GID 0
        chgrp -R 0 "${APP_DIR}"; \
        # give GID 0 the same read/write permissions as the owning user on any file
        # that isn't already readable and writable by GID 0
        find "${APP_DIR}" -type f -perm /u=r,u=w -not -perm /g=r,g=w -exec chmod g=u {} \; ; \
        : ;
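
Note that the find above only touches files. If your entrypoint also needs to create new files under APP_DIR (rather than just overwrite existing ones), the directories themselves need group write and execute permissions too. A sketch along the same lines:

RUN set -e; \
    # give GID 0 the same permissions the owning user has on each directory,
    # so an arbitrary user in the root group can create files inside them
    find "${APP_DIR}" -type d -exec chmod g=u {} \; ; \
    : ;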
    

This way, as long as the runtime user is part of GID 0, it’ll be able to template configs into APP_DIR:

#!/usr/bin/env bash

set -e
template_volume_secrets_into_config_dir "${SECRETS_DIR}" "${APP_DIR}"
start_app
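
The template_volume_secrets_into_config_dir function above is just a stand-in. A hypothetical implementation, assuming envsubst (from gettext) is available in the image, might look something like:

# render every file mounted under SECRETS_DIR into APP_DIR, substituting
# ${VAR} references with values from the environment
template_volume_secrets_into_config_dir() {
  local secrets_dir="$1"
  local app_dir="$2"
  local file

  for file in "${secrets_dir}"/*; do
    envsubst < "${file}" > "${app_dir}/$(basename "${file}")"
  done
}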

Ports

TCP/IP ports below 1024 are privileged. This means a non-root user can't bind to port 80 (HTTP) or 443 (TLS). Your app should instead listen on a non-privileged port (usually 8080 for HTTP, and something like 8443 for TLS):

# /path/to/Dockerfile
EXPOSE 8080
EXPOSE 8443

ENTRYPOINT ["/docker-entrypoint.sh"]

#!/usr/bin/env bash

start_server 8080 8443

Likewise, your container should run on a non-privileged port:

docker run --detach --publish 0.0.0.0:8080:8080 --publish 0.0.0.0:8443:8443 docker-registry.foo.bar/some-namespace/my-app:latest

Normally you'd have a reverse proxy that owns the privileged ports and forwards traffic to your containers. In k8s, you'd have a Service resource that maps to the pods running your app's containers (allowing other pods to talk to it without knowing the IP addresses of the nodes they're running on), and an Ingress resource for intelligently forwarding external traffic to the right Services. The Ingress is basically the equivalent of the reverse proxy you'd otherwise run in front of your docker containers: it typically exposes the privileged ports 80 and 443, directing traffic through the many networking layers in the cluster until it reaches your containers at their non-privileged ports.
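
As a rough sketch of what that Service/Ingress pair might look like (the names, labels, and host below are made up for illustration, and the details of your ingress controller will vary):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
  ports:
    - name: http
      port: 80          # port other pods (and the ingress) talk to
      targetPort: 8080  # non-privileged port the container listens on
    - name: https
      port: 443
      targetPort: 8443
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
spec:
  rules:
    - host: my-app.corporate.org
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
EOF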

Publish ports with the correct IP address

I recently spun up a pair of containers on a VM running docker: one httpd reverse proxy, and one prometheus instance:

$ docker ps | grep -E 'prometheus|httpd'
<hash>    docker-registry.foo.bar/my-namespace/my-prometheus:latest    /docker-entrypoint.sh    5 minutes ago    Up 5 minutes    127.0.0.1:8443:8443    my-prometheus-<hash>
<hash>    docker-registry.foo.bar/my-namespace/httpd-rp:latest    /docker-entrypoint.sh    5 minutes ago    Up 5 minutes    127.0.0.1:8443:8443    httpd-rp-<hash>

I made a few health checks with curl https://127.0.0.1:8443/path/to/healthcheck from within the VM and got the expected outputs. I thought I was done.

Then I tried doing the same externally:

$ curl https://my-vm.corporate.org:8443/path/to/healthcheck
Connection refused

I spent a few minutes scratching my head over this; my ports were published, so why couldn't I reach them?

The problem was actually trivial. As the docker run docs briefly state:

--publish, -p Publish a container’s port(s) to the host

The thing is, a host can have several IP addresses. One is its IP on the subnet it's attached to. Another is the loopback address, 127.0.0.1.

On our corporate network, the VM had an IP address of something like 10.8.x.x, which DNS maps the subdomain my-vm.corporate.org to. But we neither care nor want to know what that IP is. Moreover, we still want to be able to access the containers locally from within the VM.

This is where the all-interfaces address (0.0.0.0) comes in:

0.0.0.0, in this context, means “all IP addresses on the local machine” (in fact probably, “all IPv4 addresses on the local machine”). So, if your webserver machine has two IP addresses, 192.168.1.1 and 10.1.2.1, and you allow a webserver daemon like apache to listen on 0.0.0.0, it will be reachable at both of those IP addresses. But only to what can contact those IP addresses and the web port(s).
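
On the docker side, a quick way to check which host address a container's port is actually bound to is docker port (container name taken from the docker ps output above):

# prints the host address and port that the container's 8443 maps to,
# e.g. 127.0.0.1:8443 for the deployment above
docker port httpd-rp-<hash> 8443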

So we just needed to re-deploy the containers with --publish 0.0.0.0:8443:8443 instead of --publish 127.0.0.1:8443:8443:

$ docker ps | grep -E 'prometheus|httpd'
<hash>    docker-registry.foo.bar/my-namespace/my-prometheus:latest    /docker-entrypoint.sh    5 minutes ago    Up 5 minutes    0.0.0.0:8443:8443    my-prometheus-<hash>
<hash>    docker-registry.foo.bar/my-namespace/httpd-rp:latest    /docker-entrypoint.sh    5 minutes ago    Up 5 minutes    0.0.0.0:8443:8443    httpd-rp-<hash>

Once we did that, we were able to reach the healthcheck endpoints from outside the VM.