Docker Data Container Snapshots

It’s straightforward to create a data container with pre-baked archives included: COPY them in during docker build and off you go. But what if you can’t or don’t want to create the data ahead of time?

At RealScout, we recently started experimenting with the approach detailed below to create docker images from postgres snapshots for distribution to development environments. Briefly, it looks like:

1. Create and initialize a data container
2. Start postgres using the volume from that data container
3. Restore data
4. Stop postgres
5. Create an image from the populated data container

Setup

First, build a data container image, for example with this Dockerfile:

FROM debian:jessie

ENV PGDATA /var/lib/postgresql/data

# ids must match postgres container
RUN groupadd -r -g 999 postgres && useradd -u 999 -r -g postgres postgres
RUN mkdir -p -m 0700 ${PGDATA} && chown -R postgres:postgres ${PGDATA}

VOLUME ${PGDATA}

CMD /bin/true

$ docker build .
Successfully built 221c07e4f85c

Now create a postgres data container using that image:

$ docker run --name postgres-data 221c07e4f85c

$ docker ps -af ancestor=221c07e4f85c
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                              PORTS               NAMES
d13320582ff2        221c07e4f85c        "/bin/sh -c /bin/true"   1 seconds ago       Exited (0) Less than a second ago                       hopeful_boyd

Now start a postgres server (an image from Docker Hub here) with the volume from that data container:

$ docker run -d --volumes-from postgres-data --name postgres postgres
146ad1bcbb45c74100f5345facff8be2ba9a532cbe0c5642a0c0bf840b4234e4

Now we can load up some data. At this point, you should probably scrub out anything from the database backup that you don’t want to include in the image.

pg_restore ...
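
The restore can only begin once the server accepts connections, and a freshly started container needs a moment to initialize its data directory. Here is a minimal sketch of a wait loop, assuming the container is named postgres as above and that the image ships pg_isready (part of PostgreSQL 9.3 and later). The function name and the retry limit are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: poll the server inside the container until it accepts
# connections, or give up after a number of one-second retries.
wait_for_postgres() {
    container=${1:-postgres}   # container name from the docker run above
    retries=${2:-30}           # assumed limit, tune to taste
    while [ "$retries" -gt 0 ]; do
        # pg_isready exits 0 once the server accepts connections
        if docker exec "$container" pg_isready -q -U postgres; then
            return 0
        fi
        retries=$((retries - 1))
        sleep 1
    done
    echo "postgres in $container did not become ready" >&2
    return 1
}
```

Calling wait_for_postgres before pg_restore should avoid most connection-refused failures. The image may restart the server once while initializing a new data directory, so a retry around the restore itself may still be worthwhile.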

The clever part

We’re going to docker build inside of a third container that has the data volume mounted inside of it. It’s not quite docker in docker, but it’s related.

First, take the Dockerfile above and add COPY data/ ${PGDATA} to it so it looks like:

FROM debian:jessie

ENV PGDATA /var/lib/postgresql/data

# ids must match postgres container
RUN groupadd -r -g 999 postgres && useradd -u 999 -r -g postgres postgres
RUN mkdir -p -m 0700 ${PGDATA} && chown -R postgres:postgres ${PGDATA}
COPY data/ ${PGDATA}

VOLUME ${PGDATA}

CMD /bin/true

(We just have a single Dockerfile used for initial and snapshot builds and an empty data/ sitting around for use by the initial build.)

Now stick this in snapshot.sh:

#!/bin/bash -e
BUILD_WD=/tmp/build
DATA_CONTAINER=${1:-postgres-data}
DOCKERFILE=${2:-Dockerfile}

# like --volumes-from, but allow for mounting in a different location
DATA_VOLUME=$(docker inspect --format '{{range $mount := .Mounts}}{{if eq $mount.Destination "/var/lib/postgresql/data"}}{{$mount.Source}}{{end}}{{end}}' $DATA_CONTAINER)

# stop sends TERM, like postgres wants
docker stop -t 60 postgres

# create a docker container with data mounted at /data and run docker build inside it
docker run --rm \
       -v "${PWD}/${DOCKERFILE}:${BUILD_WD}/Dockerfile" \
       -v "${DATA_VOLUME}:${BUILD_WD}/data" \
       -v /var/run/docker.sock:/var/run/docker.sock \
       -v "$(which docker):/bin/docker" \
       -w "${BUILD_WD}" \
       debian:jessie \
       docker build .
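
One failure mode worth guarding against: if the data container has no mount at the expected path, the inspect template in snapshot.sh prints an empty string and the later -v argument ends up malformed. A sketch of the same lookup with a check added follows. The function name find_data_volume is hypothetical, and the destination defaults to the PGDATA path used in the Dockerfile.

```shell
#!/bin/sh
# Hypothetical helper: print the host path backing a container's mount, and
# fail loudly if the container has no mount at that destination.
find_data_volume() {
    container=$1
    dest=${2:-/var/lib/postgresql/data}
    # same idea as the template in snapshot.sh, with the destination passed in
    format="{{range .Mounts}}{{if eq .Destination \"$dest\"}}{{.Source}}{{end}}{{end}}"
    source_path=$(docker inspect --format "$format" "$container") || return 1
    if [ -z "$source_path" ]; then
        echo "no volume mounted at $dest in $container" >&2
        return 1
    fi
    printf '%s\n' "$source_path"
}
```

In snapshot.sh this could stand in for the DATA_VOLUME assignment, for example DATA_VOLUME=$(find_data_volume "$DATA_CONTAINER"), so that the -e flag aborts the script before postgres is stopped.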

If you used the --name flags on docker run earlier (postgres-data and postgres), the script's defaults apply and you can snapshot with:

./snapshot.sh
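
On the receiving side, a development environment can use the snapshot image the same way the empty data container was used during setup. This is a sketch under assumptions: the script above builds the image untagged, so the image ID (or whatever tag you give it) is passed in, and the function name start_from_snapshot is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: create a data container from a snapshot image and
# start a postgres server on its volume, mirroring the setup steps.
start_from_snapshot() {
    image=$1                          # snapshot image ID or tag
    data_name=${2:-postgres-data}     # name for the data container
    server_name=${3:-postgres}        # name for the server container
    # the data container runs /bin/true and exits; its volume stays around
    docker run --name "$data_name" "$image" || return 1
    docker run -d --volumes-from "$data_name" --name "$server_name" postgres
}
```

Docker populates a new volume from the image contents at the mount point, so the restored data baked in by the COPY step is what the server sees on first start.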

I hope that helps. Please let me know if I screwed anything up.