The Docker build cache can speed up your image builds and save a lot of bandwidth, or it can eat away at your time for no apparent reason. This article is about how the Docker build cache works, what it is good for and how to make the most of it.
If you are not quite sure about the difference between an image and a container, check out this overview of Docker basics. It will help you understand caching more quickly. The same goes for this in-depth look at what Docker image layers are.
Ready? Let’s get started.
What The Cache Is
The Docker build cache is a mechanism by which Docker stores image layers locally. This is good news: if you have pulled an image (and that image’s tag won’t be repurposed), you can reuse the data from the cache instead of downloading it anew.
There are two ways that image layers are put into the cache:
- When you pull an image.
- When you build an image.
Advantages Of The Cache
Before executing a step of an image build (which corresponds to a single instruction in a Dockerfile), Docker checks whether the layer created for that step by a previous build can be reused from the cache.
This is a convenient way to speed up the build process and to transfer less data from a remote registry, because existing layers are reused whenever possible. It also lets you skip expensive build steps that would otherwise take a long time to compute.
A caveat: if you’re squashing your images into a single layer after the build, you won’t be able to make use of the cache for distributing your images, because every build produces an entirely new image. Multi-stage builds have their own caching quirks as well, but that’s another topic.
When Is A Layer Cached?
If you are not using the --no-cache flag, Docker takes the build cache into account for each layer of a build.
The criteria for a layer being retrieved from the cache instead of being fetched or rebuilt are the following:
- The parent layer is the same as in the last build and exists in the cache
- The Dockerfile instruction corresponding to the current layer is unchanged (and, in the case of ADD/COPY, the files involved are exactly the same)
If all previous layers are unchanged, and the instruction that creates this layer is unchanged (along with any files it adds or copies), the result of the previous run is reused from the cache.
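To make the two criteria concrete, here is a small annotated Dockerfile. It is an illustrative sketch, not taken from a real project: the Python base image tag, requirements.txt and app.py are assumptions. The comments describe what you would expect on a rebuild after only requirements.txt was edited.

```dockerfile
# Assumptions: a Python project with requirements.txt and a hypothetical app.py
# in the build context. Comments describe a rebuild where only
# requirements.txt changed since the last build.
FROM python:3.12-slim

# Same parent (the base image) and same text: taken from the cache.
WORKDIR /app

# Same text, but the copied file's contents changed: cache miss.
COPY requirements.txt .

# The text is identical to last time, but the parent layer is new,
# so this step cannot come from the cache and runs again.
RUN pip install -r requirements.txt

# Every later step is rebuilt too, because its parent layer changed.
COPY . .
CMD ["python", "app.py"]
```

Note that the RUN step misses the cache even though its text is unchanged, because the parent-layer criterion already fails.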
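This is a frequent gotcha, by the way: Docker never checks what a command would actually produce. Apart from the file check for ADD and COPY, the cache only cares about the literal text of the instruction in the Dockerfile. You have probably seen it in the wild already: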
Cache Gotcha #1
When starting out with building your own images, you’d expect a command to run again and produce a different result, but it doesn’t. Docker decides that there is no need for the effort: the cached layer will do.
A frequent offender is the following command:
RUN apt-get update
Remember the criteria? If the preceding layers are the same as in the last build, and the command text is unchanged, the layer is retrieved from the cache.
The text does not change between runs. The execution result might be different, but Docker does not consider that. As humans, we know that each run of apt-get update could produce a different result from the last one. The Docker build cache does not care.
This is a frequent stumbling block you can now avoid.
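A common way to keep this from biting you, and the pattern Docker’s Dockerfile best-practices guide recommends, is to combine apt-get update with apt-get install in a single RUN instruction. The sketch below assumes an Ubuntu base image and uses curl purely as an example package. Keep in mind that the cache still works on text: the combined layer is reused until you change the line (for example by adding a package), and at that point the update runs again together with the install.

```dockerfile
# Assumption: an Ubuntu base image; curl stands in for whatever you need.
FROM ubuntu:22.04

# Problematic: "RUN apt-get update" on its own line is cached by its text,
# so an install step added later can end up using a stale package index.
#   RUN apt-get update
#   RUN apt-get install -y curl

# Combined: update and install share one layer. Changing the package list
# changes this instruction's text, so both commands run again together.
# Removing the apt lists afterwards keeps them out of the layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
```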
Using the Cache Well
To use the cache well, you have to keep it in mind when writing your Dockerfiles.
Squashing layers will make it impossible for Docker to reuse cached layers when pushing an image. After all, there is just one brand-new squashed layer each time! You won’t get the benefits of the Docker cache this way.
Another way to stumble over the cache is to put fast-changing instructions at the beginning of your Dockerfile. For example, you might COPY your complete code repository into the image instead of just the requirements.txt or package.json and package-lock.json files. Those individual files change only rarely, while the complete repository will be different every single time. The cache won’t be used for that step, and since every later layer then has a new parent, it won’t be used for any step after it either.
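Applied to a Node.js project, that ordering might look like the following sketch. The base image tag and the server.js entry point are assumptions for illustration. npm ci installs exactly what package-lock.json specifies, which is why that file is copied alongside package.json.

```dockerfile
# Assumptions: package.json and package-lock.json sit in the build context
# root; the base image tag and server.js are hypothetical.
FROM node:20-slim
WORKDIR /app

# Copy only the dependency manifests first. They change rarely, so the
# npm ci layer below can usually be reused from the cache.
COPY package.json package-lock.json ./
RUN npm ci

# Copy the rest of the repository last. Code changes invalidate the cache
# from this instruction onwards, but the dependency layer above stays cached.
COPY . .

CMD ["node", "server.js"]
```

With this order, editing application code only invalidates the final COPY and whatever follows it; npm ci comes from the cache as long as the two manifest files are unchanged.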
If you want to learn more about speeding up Docker image builds and the order of instructions, have a look at tip 3 in this article.
Skipping the Cache by Choice
Sometimes, you want Docker to ignore the cache so that it performs every step of a Dockerfile. This can be a useful feature if you want to make sure that pesky apt-get update line is executed!
The flag to use is --no-cache. It makes sure that all the lines in your Dockerfile are executed again, even if a valid cached version exists.
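--no-cache is all or nothing. If you only want to rerun the steps from a certain point onwards, a commonly used alternative technique is a build argument whose value you change on each build. It relies on documented ARG behavior: a changed ARG value causes a cache miss at its first use rather than at its declaration, and RUN instructions implicitly see in-scope build arguments as environment variables. The argument name CACHEBUST is hypothetical; any otherwise unused name works.

```dockerfile
# Assumption: an Ubuntu base image; CACHEBUST is a hypothetical argument name.
FROM ubuntu:22.04

# Steps before the argument is first used keep using the cache as usual.
WORKDIR /app

# Declaring the ARG alone invalidates nothing. The RUN below sees CACHEBUST
# in its environment, so a new value makes it, and every step after it,
# miss the cache.
ARG CACHEBUST=1
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
```

You would then pass a fresh value on each build, for example with docker build --build-arg CACHEBUST=$(date +%s) . from a shell.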
Going Beyond the Docker Build Cache
Knowing about the Docker build cache is super useful for getting a better grip on your builds and for understanding why a build or pull takes longer or shorter than you expected.
However, it is just one of many tools when it comes to building good Docker images. If you feel familiar with the cache and want to dive deeper, topics like multi-stage builds and advanced features like BuildKit cache mounts can be good next steps.
If you are interested in reducing your waiting times, check out this article on speeding up your Docker image builds instead!