title: "Supplement: Creating Docker Images"
- "How do I create Docker images from scratch?"
- "What are some best practices for Docker images?"

Common Workflow Language supports running tasks inside software
containers. Software container systems (such as Docker) create an
execution environment that is isolated from the host system, so that
software installed on the host system does not conflict with the
software installed inside the container.

Programs running inside a software container get a different (and
generally restricted) view of the system than processes running
outside the container. One of the most important and useful features
is that the containerized program has a different view of the file
system. A program running inside a container, searching for
libraries, modules, configuration files, data files, etc., only sees
the files defined inside the container.

This means that, usually, a given file _path_ refers to _different
actual files_ depending on whether it is viewed from inside or outside
the container. It is also possible to have a file from the host
system appear at some location inside the container, meaning that the
_same file_ appears at _different paths_ depending on whether it is
viewed from inside or outside the container.

The complexity of translating between the container and its host
environment is handled by the Common Workflow Language runner. As a
workflow author, you only need to worry about the environment _inside_
the container.

# What are Docker images?

The Docker image describes the starting conditions for the container.
Most importantly, this includes the starting layout and contents of the
container's file system. This file system is typically a lightweight
POSIX environment, providing a standard set of POSIX utilities like
`sh`, `ls`, `cat`, etc., organized into standard POSIX directories
like `/bin` and `/lib`.

The image is made up of multiple "layers". Each layer modifies the
layer below it by adding, removing or modifying files to produce a new
layer. This allows lower layers to be re-used.

# Writing a Dockerfile

In this example, we will build a Docker image containing the
Burrows-Wheeler Aligner (BWA) by Heng Li. This is just for
demonstration; in practice you should prefer to use existing
containers from [BioContainers](https://biocontainers.pro/), which
already provides one for BWA.

Each line of the Dockerfile consists of a COMMAND in all caps,
followed by the parameters of that command.

The first line of the file will specify the base image that we are
going to build from. As mentioned, images are divided up into
"layers", so this tells Docker what to use for the first layer.

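
A first line matching this description might look like the following. It is reconstructed from the image name and tag discussed in this section, not quoted from a complete Dockerfile:

```dockerfile
# Base image: the official "debian" image at the "10-slim" tag
FROM debian:10-slim
```
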

This starts from the lightweight ("slim") Debian 10 Docker image.

Docker images have a special naming scheme.

A bare name like "debian" or "ubuntu" means it is an official Docker
image. It has an implied prefix of "library", so you may see the
image referred to as "library/debian". Official images are published
on [Docker Hub](https://hub.docker.com/search?type=image&image_filter=official).

A name with two parts separated by a slash is published on Docker Hub
by someone else. For example, `amazon/aws-cli` is published by
Amazon. These can also be found on [Docker Hub](https://hub.docker.com/search?type=image).

A name with three parts separated by slashes means it is published on
a different container registry. For example,
`quay.io/biocontainers/subread` is published on the `quay.io` registry.

Following the image name, separated by a colon, is the "tag". This is
typically the version of the image. If not provided, the default tag
is "latest". In this example, the tag is "10-slim", indicating the
slim variant of Debian 10.

> You should always include the tag to refer to a specific image
> version, or you might run into problems when "latest" changes.

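
As an illustration of the naming scheme, the `FROM` lines below show each form of image name. Only `debian:10-slim` comes from this lesson; `<tag>` is a placeholder rather than a real tag, so those lines are left commented out.

```dockerfile
# Official image with an explicit tag (short for library/debian:10-slim)
FROM debian:10-slim

# No tag given: Docker falls back to "latest", which changes over time
# FROM debian

# Two-part name: published on Docker Hub by another organization
# FROM amazon/aws-cli:<tag>

# Three-part name: hosted on a different registry (here quay.io)
# FROM quay.io/biocontainers/subread:<tag>
```
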

The Dockerfile should also include a MAINTAINER (this is purely
metadata; it is stored in the image but not used for execution).
Docker has since deprecated `MAINTAINER` in favor of a `LABEL`, such
as `LABEL maintainer="name <email>"`, though Dockerfiles that use
`MAINTAINER` still build.

```
MAINTAINER Peter Amstutz <peter.amstutz@curii.com>
```

Next is the default user inside the image. By choosing root,
we can change anything inside the image (but not outside).

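
Selecting root as described above would be written as follows. This is a sketch based on that description, using the standard `USER` instruction:

```dockerfile
# Run the remaining build steps (and, by default, the container) as root
USER root
```
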

The body of the Dockerfile is a series of `RUN` commands.

Each command is run with `/bin/sh` inside the Docker container.

Each `RUN` command creates a new layer.

The `RUN` command can span multiple lines by using a trailing
backslash.

For the first command, we use `apt-get` to install some packages that
will be needed to compile `bwa`. The `build-essential` package
installs `gcc`, `make`, etc., and `zlib1g-dev` provides the zlib
headers that `bwa` is compiled against.

```
RUN apt-get update -qy && \
    apt-get install -qy build-essential wget unzip zlib1g-dev
```
Now we do everything else: download the source code of bwa, unzip it,
make it, copy the resulting binary to `/usr/bin`, and clean up.

```
# Install BWA 0.7.17
RUN wget https://github.com/lh3/bwa/archive/v0.7.17.zip && \
    unzip v0.7.17.zip && cd bwa-0.7.17 && make && \
    cp bwa /usr/bin && cd .. && rm -rf bwa-0.7.17 v0.7.17.zip
```
Because each `RUN` command creates a new layer, having the build and
clean up in separate `RUN` commands would mean creating a layer that
includes the intermediate object files from the build. These would
then be carried around as part of the container image forever, despite
being useless. By doing the entire build and clean up in one `RUN`
command, only the final state of the file system, with the binary
copied to `/usr/bin`, is committed to a layer.

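
For contrast, here is a sketch of the split version that this advice warns against. The directory name `bwa-0.7.17` is an assumption about how the downloaded archive unpacks.

```dockerfile
# Anti-pattern: three RUN commands produce three layers.
# The object files created by `make` are committed in the second layer
# and stay in the image even though the third layer deletes them.
RUN wget https://github.com/lh3/bwa/archive/v0.7.17.zip && unzip v0.7.17.zip
RUN cd bwa-0.7.17 && make && cp bwa /usr/bin
RUN rm -rf bwa-0.7.17 v0.7.17.zip
```

Running `docker history` on a built image lists each layer with its size, which makes this kind of leftover easy to spot.
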

To build a Docker image from a Dockerfile, use `docker build`.

This command takes the name to use for the image with `-t`, and the
directory in which it should find the `Dockerfile`:

```
docker build -t training/bwa .
```

> Create a `Dockerfile` based on this lesson and build it for yourself.

# Adding files to the image during the build

Using the `COPY` command, you can copy files from the source directory
(the directory where your Dockerfile is located) into the image
during the build. For example, you have a `requirements.txt` next to
your Dockerfile:

```
COPY requirements.txt /tmp/
RUN pip install --requirement /tmp/requirements.txt
```
# Best practices for Docker images

Docker has published guidelines on building efficient images:

https://docs.docker.com/develop/develop-images/dockerfile_best-practices/

Some additional considerations when building images for use with workflows:

## Store Dockerfiles in git, alongside workflow definitions

Dockerfiles are scripts and should be managed with version control
just like other kinds of code.

## Be specific about software versions

Instead of blindly installing the latest version of a package, or
checking out the `master` branch of a git repository and building from
that, be specific in your Dockerfile about what version of the
software you are installing. This will greatly aid the
reproducibility of your Docker image builds.

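
A minimal sketch of the difference, reusing the BWA download from earlier in this lesson. The `master.zip` URL is included only to illustrate an unpinned download and is left commented out.

```dockerfile
# Pinned: the v0.7.17 tag always refers to the same source tree
RUN wget https://github.com/lh3/bwa/archive/v0.7.17.zip

# Unpinned (avoid): the contents of master change over time,
# so two builds of the same Dockerfile can produce different images
# RUN wget https://github.com/lh3/bwa/archive/master.zip
```
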
## Use meaningful image tags

Use meaningful tags on the Docker image so you can tell versions of
your Docker image apart as it is updated over time. These can reflect
the version of the underlying software, or the version of the
Dockerfile itself. These can be manually assigned version numbers
(e.g. 1.0, 1.1, 1.2, 2.0), timestamps (e.g. YYYYMMDD like 20220126) or
the hash of a git commit.

## Avoid putting reference data in Docker images

Bioinformatics tools often require large reference data sets to run.
These should be supplied externally (as workflow inputs) rather than
added to the container image. This makes it easy to update reference
data instead of having to rebuild and re-upload a new Docker image
every time, which is much more time consuming.

## Small scripts can be inputs, too

If you have a small script, e.g. a self-contained Python script which
relies on modules installed inside the container, but is itself
contained in a single file, you can supply the script as a workflow
input. This makes it easy to update the script instead of having to
rebuild and re-upload a new Docker image every time, which is much
more time consuming.

## Don't use ENTRYPOINT

The `ENTRYPOINT` Dockerfile command modifies the command line that is executed
inside the container. This can result in confusion when the command
line that was supplied to the container and the command that actually