I recently started using Docker images to hold specific toolchains in a controlled environment. With the end goal to set up a portable and reproducible build system. In a recent post, I described my base image with an ARM Cortex-M toolchain. After creating my ARM toolchain image, I started to expand to other toolchains and compilers. In this post, I will apply some best practices I came across to improve my images and reduce their size in the process.
After creating my ARM toolchain image, I started to add images for other compilers to experiment with. At some point, I noticed the size of each container and became interested in their composition. My examples for this post:
PS> docker image list
REPOSITORY TAG IMAGE ID CREATED SIZE
renemoll/builder_arm_gcc latest 033f62f87699 35 hours ago 971MB
renemoll/builder_clang latest c0597e09b2bf 29 minutes ago 950MB
Decomposing the image
Let’s look at what is inside the ARM toolchain image.
PS> docker history renemoll/builder_arm_gcc
117a48b37556 48 minutes ago /bin/sh -c #(nop) ADD dir:2c03d348c707eb8cc6… 26.4kB
5b3d59a4d317 48 minutes ago /bin/sh -c #(nop) WORKDIR /work 0B
2b66eb9161ab 48 minutes ago /bin/sh -c #(nop) ENV PATH=/opt/gcc-arm-non… 0B
2f1c8883c156 49 minutes ago /bin/sh -c wget -qO - https://developer.arm.… 505MB
52b90c935987 51 minutes ago /bin/sh -c #(nop) WORKDIR /opt 0B
e9b05e16c66c 51 minutes ago /bin/sh -c apt-get update && apt-get upgrad… 392MB
d5d96de639a4 2 hours ago /bin/sh -c #(nop) LABEL version=1.0 0B
8d60aeb11f6e 2 hours ago /bin/sh -c #(nop) LABEL description=Builder… 0B
e9ccb229a23d 3 weeks ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0B
<missing> 3 weeks ago /bin/sh -c mkdir -p /run/systemd && echo 'do… 7B
<missing> 3 weeks ago /bin/sh -c set -xe && echo '#!/bin/sh' > /… 811B
<missing> 3 weeks ago /bin/sh -c [ -z "$(apt-get indextargets)" ] 985kB
<missing> 3 weeks ago /bin/sh -c #(nop) ADD file:537f9883fb90d1938… 72MB
What we see here is a list of layers that together compose the image. Each layer represents a command in the Dockerfile. Next to the executed command, the size it adds to the image is listed. For example, the base image contributes 72MB, and adding a readme file adds 26kB.
Two layers stand out, the wget
and the apt-get update
operation. The first downloads and installs the ARM toolchain. As expected, this is a significant contribution to the total image size. The second operation, the update command, updates the local apt repository and installs the various tools I consider necessary.
Looking at the other image (builder_clang), the package installation step is the main contributor to the final image size, adding 877MB.
How can I reduce these sizes?
Applying best practices
There are several blog posts with tips and best practices on how to write and improve Dockerfiles. I found FROM:latest to be of great value as it lints your Dockerfile on the fly. Interactively parsing and analyzing your Dockerfile against common best practices.
Based on the linter’s feedback, I removed the upgrade step, added the “no recommended packages” option and properly removed the apt cache. The result:
PS> docker image list
REPOSITORY TAG IMAGE ID CREATED SIZE
renemoll/builder_arm_gcc latest a2ff316c15f9 9 minutes ago 905MB
renemoll/builder_clang latest 7da7f6a424f3 44 seconds ago 647MB
For the ARM image, this change does not do much. The Clang based machine sheds a little over a third of its size, definitely an improvement.
I believe this is due to the omitting of the recommended packages. For the ARM image, I manually pull and install the toolchain, so there is little to gain with this improvement. Whereas the Clang toolchain is pulled in via apt. This time without the recommended packages, reducing the total installation size. This is supported by going through the package’s contents. It shows that various development libraries will not be selected for installation, adding up to hundreds of megabytes.
Replacing packages
Next, I had a look at the packages I install for each image. For example, I install the build-essential package, but do I truly need this? Taking a look at the package details, it contains some development tools, gcc, g++ and make. For my builders, I only need the latter. So let us see what happens when I replace build-essential with make.
PS> docker image list
REPOSITORY TAG IMAGE ID CREATED SIZE
renemoll/builder_arm_gcc latest a73640419c6f 19 minutes ago 754MB
renemoll/builder_clang latest 1f2ae254e668 About a minute ago 524MB
After that I had a look at the installed packages, from inside the container. dpkg-query
provides a way to list installed packages, and with the following command it shows their installed size (in kilobytes) as well:
dpkg-query -Wf '${Installed-Size}\t${Package}\n' | sort -n | tail -n 10
This did not provide me with further improvement options. Any significant size contributors are required packages. None for which there are smaller alternatives available.
Results
I noted the total size of the images after each step in the following table:
Scenario | ARM GCC | Clang |
---|---|---|
Orignal | 971MB | 950MB |
Without apt upgrade and recommendated packages | 905MB | 647MB |
Replaced packages | 754MB | 524MB |
I am quite happy with these results. Sure, the ARM image did not reduce much sizewise. But knowing I stripped the images from unnecessary actions feels good. Furthermore, I have seen that toolchains pulled in from the OS’ repository can be reduced significantly. I shrank the size of my Clang image by 44%!
A popular choice to reduce image sizes further is to base the image on Alpine Linux as it is only 5MB. This is partly achieved by using a different C runtime, Alpine Linux ships with musl. Most software will function but may need to be recompiled. Not only is this time consuming, but results may also disappoint.
Personally I do not see a big advantage in changing OS. As it stands, it is only a small portion of the total size and the main contributors remain the required packages.