What Are Docker Image Layers?
You often read that Docker images consist of layers. But what are those layers anyway?
- Each layer is an image in itself, just one without a human-assigned tag. Layers do have auto-generated IDs, though.
- Each layer stores the changes compared to the image it’s based on.
- An image can consist of a single layer (that’s often the case when it was built with the experimental --squash option).
- Each instruction in a Dockerfile results in a layer. The exceptions are multi-stage builds, where usually only the layers of the final stage end up in the pushed image, and squashed images, which are collapsed into a single layer.
- Layers are used to avoid transferring redundant information and to skip build steps that have not changed (according to the Docker cache).
“Image” is used both for the entries listed by the docker images command (an image with an image ID and maybe a tag attached) and for tagged images, the kind you can pull from Docker Hub.
I’ll use “tagged image” for the second case for the rest of this article to reduce the potential for confusion a bit.
Words matter. That’s why I try to make an effort to distinguish containers from images or env_files. Being careful with words can do wonders for your understanding of Docker.
An “image” will just be the Docker building block, which can be tagged or not. Its essence is easy to explain:
An Image Is Basically A Diff
An image contains information about what changed compared to the image it’s based on. Each image refers to a parent (well, apart from those based on scratch).
If you’re interested, here’s a closer look at what those images look like in an exportable format.
If you’re in a hurry: they contain changes to the metadata, as well as added, changed, or deleted files in the future container’s filesystem.
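To make “an image is a diff” concrete, here is a toy model in Python. It is not Docker’s actual storage format: the Layer class, the resolve function, and the metadata key are made up for illustration. Each layer records metadata changes, added or changed files, and deleted paths; replaying the chain from the oldest ancestor gives what a container started from the newest layer would see.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Layer:
    """Toy layer: a diff against its parent (None means based on scratch)."""
    parent: Optional["Layer"] = None
    metadata: dict = field(default_factory=dict)  # config changes, e.g. working dir
    files: dict = field(default_factory=dict)     # path -> content added or changed here
    deleted: set = field(default_factory=set)     # paths removed in this layer


def resolve(layer):
    """Replay the diffs from the oldest ancestor up to `layer`."""
    chain = []
    while layer is not None:
        chain.append(layer)
        layer = layer.parent
    metadata, fs = {}, {}
    for step in reversed(chain):  # oldest layer first
        metadata.update(step.metadata)
        for path in step.deleted:
            fs.pop(path, None)
        fs.update(step.files)
    return metadata, fs


base = Layer(files={"/etc/os-release": "debian"})
app = Layer(parent=base, metadata={"WorkingDir": "/app"},
            files={"/app/main.py": "print('hi')"})
cleaned = Layer(parent=app, deleted={"/etc/os-release"})

meta, fs = resolve(cleaned)
print(meta)        # {'WorkingDir': '/app'}
print(sorted(fs))  # ['/app/main.py']
```

Note that the deleted file is gone from the resolved view, but the base layer object still holds it; that detail comes back when we look at squashing.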
Tagged Images Consist Of One Or More Images
A tag points to a single Docker image. That image can have an arbitrary number of predecessors! And an image can have an arbitrary number of tags pointing to it!
Also, just because an image is tagged does not mean other images can’t be based on it.
In fact, almost all base images are tagged images when you think about it.
If you look at the output of docker images -a (the -a flag also lists intermediate images, which are hidden by default; pipe it into less -S if your display is on the small side, so you can scroll around), you’ll probably see lots of named/tagged images. There are likely also ones without a human-readable name attached to them. All of the entries are images! However, some are not tagged. They lack a name/tag pair and usually exist as less-visible building blocks of a tagged image.
Image == Layer
Well, at least in this article. With our definition and distinction from above, we can use the words image and layer almost interchangeably. They refer to the same building blocks; “layer” just implies that the image is part of a tagged image.
If you have a tagged image pointing to a chain of 5 images built on each other, that tagged image has 5 layers, or 5 (or 4? depending on how you see it) intermediate images if you prefer that notion.
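The relationship between tags and chains of images can be sketched with two plain dictionaries. The image IDs and tag names below are invented (real Docker image IDs are sha256 digests, not short strings); the point is only that a tag is a pointer to one image, and the layers of a tagged image are found by following parent references.

```python
# Hypothetical image IDs: image -> parent image (None = based on scratch).
parents = {
    "img1": None,
    "img2": "img1",
    "img3": "img2",
    "img4": "img3",
    "img5": "img4",
}

# A tag is just a name pointing at one image ID.
tags = {
    "fancyname:fancytag": "img5",
    "fancyname:latest": "img5",  # two tags, same image
    "base:1.0": "img2",          # a tagged image that others build on
}


def layers_of(tag):
    """Return the image IDs making up a tagged image, oldest first."""
    chain = []
    image = tags[tag]
    while image is not None:
        chain.append(image)
        image = parents[image]
    return chain[::-1]


print(layers_of("fancyname:fancytag"))  # ['img1', 'img2', 'img3', 'img4', 'img5']
print(len(layers_of("base:1.0")))       # 2
```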
Each instruction in a Dockerfile results in a single new image layer/intermediate image/image being created.
If you pass a name:tag to the build command with the -t flag, the image resulting from the final instruction of your Dockerfile will be tagged with it.
docker build -t fancyname:fancytag .
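One payoff of giving every instruction its own image is the build cache. The following is a deliberately simplified sketch: it assumes a step can be reused whenever the same instruction text was already run on top of the same parent image. Docker’s real cache has more rules (for example, COPY and ADD also compare file contents), and the build function and 12-character IDs here are hypothetical.

```python
import hashlib

# Simplified cache: (parent image ID, instruction text) -> resulting image ID.
cache = {}


def build(instructions):
    """Run each instruction on top of the previous image, reusing cached steps."""
    parent, log = "", []  # "" stands in for "no parent yet"
    for instruction in instructions:
        key = (parent, instruction)
        if key in cache:
            log.append(("CACHED", instruction))
        else:
            digest = hashlib.sha256(f"{parent}|{instruction}".encode()).hexdigest()
            cache[key] = digest[:12]  # every instruction yields a new image ID
            log.append(("BUILT", instruction))
        parent = cache[key]
    return parent, log  # the final image is the one -t would tag


v1 = ["FROM python:3.12", "RUN pip install flask", "COPY . ."]
v2 = ["FROM python:3.12", "RUN pip install flask gunicorn", "COPY . ."]

first_id, first_log = build(v1)
again_id, again_log = build(v1)
changed_id, changed_log = build(v2)
for status, step in changed_log:
    print(status, step)
```

Changing the RUN line invalidates that step and every step after it, even though COPY . . is textually unchanged, because its parent image is now different.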
Each layer is a complete image in itself. It has an image it’s based on (the previous layer), and it introduces changes to the image’s filesystem and metadata.
You could run a container from any single image layer; you’d only need to look up its ID instead of using a human-friendly name:tag pair.
Sometimes, image layers are also called intermediate images, mostly because they exist anonymously as part of another, named and tagged image, which builds on top of them.
When you want to reduce a chain of images to a single image, you can use squashing. It takes all changes and combines them into a single image.
This was a handy approach to remove unnecessary temporary files, or to make sure no secrets made it into Docker images.
That matters because if you add something in one layer and delete it in the next, the add operation is still there in the first layer, and anyone you share the image with can look it up.
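The leak, and why squashing fixes it, can be shown with a toy model where each layer is a pair of added files and deleted paths. The secret file and its contents are hypothetical, and flatten, squash, and leaked are illustrations of the idea rather than Docker commands.

```python
# Each layer: (files added or changed, paths deleted). Toy model only.
layers = [
    ({"/app/main.py": "code"}, set()),
    ({"/root/.netrc": "secret-token"}, set()),  # hypothetical secret added...
    ({}, {"/root/.netrc"}),                     # ...and deleted in the next layer
]


def flatten(chain):
    """What a container sees: apply every layer's changes in order."""
    fs = {}
    for added, deleted in chain:
        for path in deleted:
            fs.pop(path, None)
        fs.update(added)
    return fs


def squash(chain):
    """Collapse the chain into a single layer holding only the end result."""
    return [(flatten(chain), set())]


def leaked(chain, path):
    """True if any layer you would ship still contains `path`."""
    return any(path in added for added, _ in chain)


print(flatten(layers))                         # {'/app/main.py': 'code'}
print(leaked(layers, "/root/.netrc"))          # True: the second layer still has it
print(leaked(squash(layers), "/root/.netrc"))  # False
```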
Multi-stage builds are another way to use layers during a build without sharing them in the final image. They are great for many use cases and are a step up from simply squashing your images, as you can still make use of layers for speed and convenience!
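A multi-stage build fits the same toy picture: stages are separate chains of layers, and the final stage copies in only the artifact it needs, similar in spirit to COPY --from=builder in a Dockerfile. The stage contents and paths below are made up; the sketch only shows which layers end up in the image you ship.

```python
# Toy model: a stage is a list of layers; each layer is a dict of added files.
def files_of(stage):
    """Merge a stage's layers, oldest first."""
    fs = {}
    for layer in stage:
        fs.update(layer)
    return fs


builder = [
    {"/usr/bin/gcc": "compiler"},  # hypothetical toolchain layer
    {"/src/main.c": "source"},
    {"/src/app": "binary"},        # build output
]

# The final stage starts from a small base and copies only the artifact.
final = [
    {"/etc/os-release": "slim base"},
    {"/app": files_of(builder)["/src/app"]},
]

shipped = final  # builder layers stay local (and cacheable) but are not pushed
print(sorted(files_of(shipped)))  # ['/app', '/etc/os-release']
```

Unlike squashing, every stage here keeps its own layers, so unchanged builder steps can still be reused on the next build.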
Layers exist to save computational effort when building images, and bandwidth when distributing (aka pushing and pulling) them.
Docker stores them using a copy-on-write filesystem, which saves disk space for images and future containers.
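Python’s collections.ChainMap happens to behave like a tiny copy-on-write filesystem, which makes it a handy analogy (it is not how Docker’s storage drivers are implemented): each container gets its own empty writable mapping on top of shared, read-only image layers. The file paths are illustrative.

```python
from collections import ChainMap

# Shared, read-only image layers (stored once, used by every container).
base_layer = {"/etc/os-release": "debian", "/bin/sh": "shell"}
app_layer = {"/app/main.py": "print('hi')"}


def start_container():
    """Each container gets a fresh writable dict on top of the image layers."""
    return ChainMap({}, app_layer, base_layer)


c1 = start_container()
c2 = start_container()

# Writes only touch c1's top mapping; lookups fall through to lower layers.
# (Deleting a lower-layer file would need a whiteout marker, which this
# analogy does not model: ChainMap only deletes from its first mapping.)
c1["/etc/os-release"] = "patched"

print(c1["/etc/os-release"])          # patched
print(c2["/etc/os-release"])          # debian
print(base_layer["/etc/os-release"])  # debian
```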
With layers, you can work with Docker images faster - both because your builds can skip unnecessary steps, and because your pushing and pulling can skip transferring large unchanged chunks of information which are already present at the destination.
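The transfer savings rely on layers being identified by their content. This sketch assumes, loosely following the sha256 digests that registries use, that a layer’s ID is a hash of its content; the layer contents and the layers_to_transfer helper are hypothetical.

```python
import hashlib


def layer_id(content: bytes) -> str:
    """Content-addressed ID: identical layer content gives an identical ID."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def layers_to_transfer(image_layers, present_at_destination):
    """Only layers the destination does not already have need to be sent."""
    return [lid for lid in image_layers if lid not in present_at_destination]


base = layer_id(b"debian base files")
deps = layer_id(b"pip install flask")
code_v1 = layer_id(b"app code v1")
code_v2 = layer_id(b"app code v2")

registry = {base, deps, code_v1}  # what an earlier push of v1 left behind
to_send = layers_to_transfer([base, deps, code_v2], registry)
print(len(to_send))  # 1: only the changed top layer goes over the wire
```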
I hope this article has helped you to get a better mental model of what Docker layers are.
Don’t worry, Docker becomes a lot less intimidating once you have navigated the confusing parts. This is one of them. It gets easier once you have figured them all out a bit.
If you want to learn more about Docker images, check out my free email course over here. Otherwise, stay tuned for a closer look at how the Docker cache and layers work together in the future!