How Does the Docker Cache Work?

How does the Docker build cache work, what is it good for, and how can you make the most of it?

Let’s find out! Before you continue reading, make sure you understand what Docker image layers are, as the Docker cache is built around that concept.

Advantages Of The Cache

The local Docker cache can also be filled from a remote source, for example when pulling image layers from a public or private registry.

With the help of the build cache, we can skip steps of the Docker build, reusing previous results.

It’s a convenient way to speed up the build process and transfer less data by reusing existing layers when possible.

If you’re squashing your images into a single layer after the build, you won’t be able to make use of the cache for distributing your images. It’s a new image every single time. Well, at least the base image would be reused if it hasn’t changed.

When Is A Layer Cached?

If you're building an image without the --no-cache flag, Docker checks the build cache for each layer.

Here’s what needs to be the case for a layer to be retrieved from the cache, instead of being built:

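  • Its parent layer exists in the cache, meaning every previous step could be reused as well
  • The Dockerfile instruction corresponding to the layer is unchanged (or, in the case of ADD/COPY, the contents of the involved files are exactly the same; Docker compares file checksums, not modification times)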

The main idea is this: if all previous layers are unchanged, and the instruction to create this layer is unchanged, Docker can simply reuse the result of the previous run.
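To make these rules concrete, here is a minimal sketch of a Dockerfile annotated with what the cache would do after a single source file is edited. The base image tag and the file name app.py are illustrative assumptions, not taken from any particular project.

```dockerfile
# Hypothetical example: the image tag and app.py are placeholders.
FROM python:3.12-slim

# Step A: the instruction text never changes, so after the first build
# this step is reused from the cache.
RUN pip install --no-cache-dir requests

# Step B: for COPY, the cache compares checksums of the copied file.
# If app.py is edited, this step and every step after it is rebuilt.
COPY app.py /app/app.py

# Step C: the text is unchanged, but its parent layer (step B) changed,
# so this step cannot be taken from the cache either.
RUN python -m py_compile /app/app.py

CMD ["python", "/app/app.py"]
```

Step A stays cached because everything before it is unchanged; step C is rebuilt only because it sits after the layer that changed.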

This is a frequent gotcha, by the way: apart from the file checks for ADD and COPY, the Docker cache only compares the literal text of the instruction in the Dockerfile. Which brings us to…

Cache Gotcha #1

Sometimes, you'd expect a command to run again, but it doesn't. That happens because the Docker cache does not see a reason to run it.

A frequent offender is the following command:

RUN apt-get update

The text does not change between runs. As humans, we know that each run of apt-get update could produce a different result than the last one. The Docker build cache algorithm, however, only sees a command that has not changed since the last run.

This is a frequent stumbling block! If you don’t know how the cache works, you might expect it to be “smarter” and run such commands again.
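A common way to soften this gotcha, recommended in Docker's own Dockerfile best practices, is to run apt-get update and apt-get install in the same RUN instruction. The sketch below assumes a Debian-based image; the package names are placeholders.

```dockerfile
# Hypothetical example: the base image and package names are placeholders.
FROM debian:bookworm-slim

# Problematic: "RUN apt-get update" on its own line is cached after the
# first build. If a later "RUN apt-get install" line is edited, the install
# re-runs, but against the old, cached package index.
#   RUN apt-get update
#   RUN apt-get install -y curl

# Common fix: update and install in one instruction. Adding or removing a
# package changes the instruction text, which invalidates the cache and
# forces apt-get update to run again together with the install.
RUN apt-get update && apt-get install -y --no-install-recommends \
        curl \
        ca-certificates \
    && rm -rf /var/lib/apt/lists/*
```

This does not make the cache "smarter": if the package list stays the same, the combined line is still reused from the cache. Skipping the cache is the way to force a fresh package index.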

Using the Cache Well

Another, less obvious gotcha is relying on the cache too little.

If you squash layers, or make it hard for Docker to reuse a cached version of a layer, you won't get the benefits of the Docker cache. A frequent example is copying the complete code repository into the image before installing third-party dependencies, instead of copying just requirements.txt, or package.json and package-lock.json.

See Tip 3 here.

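If you copy only the requirements.txt file first and run the installation step right after it, that step only has to run again when requirements.txt itself changes. Edits to the rest of your code leave the cached dependency layer untouched.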

If you plainly add a complete code repository, any change to any file invalidates that layer and every step after it, so Docker will not be able to save you from downloading and installing packages every single time.
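Here is a minimal sketch of the cache-friendly ordering for a Python project. It assumes a requirements.txt at the project root and a hypothetical entry point called main.py.

```dockerfile
# Hypothetical Python project: requirements.txt plus application code.
FROM python:3.12-slim
WORKDIR /app

# Copy only the dependency manifest first. This layer, and the install
# below, stay cached as long as requirements.txt is unchanged.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the code last. Editing source files only invalidates
# this layer and the ones after it, not the dependency install.
COPY . .

CMD ["python", "main.py"]
```

For a Node.js project, the same pattern would copy package.json and package-lock.json first and run npm ci before copying the rest of the code.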

Choosing to Skip the Cache

You can tell Docker to ignore the cache and perform every step. This is useful if you want to make sure everything was executed recently. The flag to pass to docker build is --no-cache.

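This makes sure that every instruction in your Dockerfile is executed again, even if a valid cached version exists. No surprises if everything is guaranteed to run. Note that --no-cache does not re-pull the base image; add --pull if you want the latest version of that as well.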

Beyond the Docker Build Cache

Sometimes, the build cache is not powerful enough. This is when topics like multi-stage builds and BuildKit cache mounts can come in handy.

They are both great advanced tools to speed up your Docker image builds! But a solid understanding of the Docker build cache is an important prerequisite before diving deeper into the topic of building good Docker images.
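As a rough preview, the sketch below combines both tools. It assumes BuildKit is enabled (the default builder in current Docker releases), a requirements.txt, and a hypothetical main.py; the paths are illustrative.

```dockerfile
# syntax=docker/dockerfile:1
# Hypothetical example: image tags, paths and main.py are placeholders.

# Build stage: install dependencies into a virtual environment.
FROM python:3.12-slim AS build
WORKDIR /app
RUN python -m venv /opt/venv
COPY requirements.txt .
# BuildKit cache mount: pip's download cache in /root/.cache/pip persists
# between builds, even when this layer itself has to be rebuilt.
RUN --mount=type=cache,target=/root/.cache/pip \
    /opt/venv/bin/pip install -r requirements.txt

# Final stage: copy only the finished virtual environment and the code.
FROM python:3.12-slim
WORKDIR /app
COPY --from=build /opt/venv /opt/venv
COPY . .
ENV PATH="/opt/venv/bin:$PATH"
CMD ["python", "main.py"]
```

The layer cache still applies to every step here; the cache mount only helps when the install step is rebuilt anyway, because previously downloaded packages don't have to be fetched again.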