What's A Docker Image Anyway?
What’s a Docker image anyway? Let’s create a simple image, export it, and take a look at what it’s made of.
Creating An Image
Let’s keep it really simple. I want to investigate the image built from the following Dockerfile:
FROM ubuntu
RUN echo hihi
RUN touch /hi
RUN rm /hi
CMD ["echo", "hello"]
Completely without any practical purpose! We don’t even make the effort to use a properly tagged base image, just ubuntu:latest.
Here’s why it will be useful: the first RUN line does not make any changes to the file system. The second RUN line creates a new file; the third RUN removes it. Finally, we set a CMD for future containers.
All simple things which we’ll be able to investigate!
The only thing left is to build and tag a fresh image:
$ docker build -t investitest .
Dragging The Image Into The Light Of Day
We could run a container based on the image, but let’s skip that.
Instead, I want to save the image we have just created. Here’s the command:
$ docker save --output investitest.tar investitest
This creates an investitest.tar file in the directory where we ran the command. It holds all the information contained in the image, in one place. Let’s see what’s inside!
Can’t get around looking up the tar command syntax. Here, I’ll save you the effort:
$ mkdir image
$ tar xvf investitest.tar -C image
The extra step of creating a new directory keeps everything reasonably tidy.
If we look inside, here’s what we’ll see:
Note: at least, that’s what I see in this case. If you build along, your layers will have different hashes than the ones I have here. That’s not an issue! It’s just how Docker images work.
0bd29a970da656512c70ecc4b6ab126177ac8bac18e0cae2ac49699956a0f2f4
2d89fdb622b01c291c5e492bc88217a250fcd653192eada26303f38189cacf08
4985c676b04225e91123eacff84a06ebdf4ed549020d7a92fe7206d890724717
88dc9157e2e1c4d9085bc8ed39b94c8f1d2072b74d22914d5097c24b6040df29
aa0f56dfa64cbbe3a136d7409c95c6cb53b226ea551b0cd1b88033ef5e728cb8
adf6d60526263f9f1c7805f902a184a176451cb069adf43781e01c10666aa46c.json
b814aba80c68b70833d1881adc00d04764284169d1d2d9e6d5aebd2ab518aef3
manifest.json
repositories
There are three files (manifest.json, repositories and the other .json file); the remaining six entries are directories.
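If you’d rather not extract the archive at all, here’s a small Python sketch that lists the top level of the saved tar and splits it into plain files and directories. It’s my own addition, not something Docker ships, and `top_level_entries` is a hypothetical helper name; it only assumes the archive layout shown above.

```python
import tarfile


def top_level_entries(tar_path):
    """Split the top level of a `docker save` archive into files and directories."""
    files, dirs = set(), set()
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            # The first path component is what `ls image/` would show.
            top = member.name.split("/", 1)[0]
            if "/" in member.name or member.isdir():
                # Anything with a nested path (e.g. <id>/layer.tar) lives in a directory.
                dirs.add(top)
            else:
                files.add(top)
    return sorted(files), sorted(dirs)
```

Calling `top_level_entries("investitest.tar")` should report the three files and the six layer directories from the listing above.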
Here’s one missing piece of information. If we run docker images and look for the image we just built, here’s the info:
REPOSITORY    TAG       IMAGE ID       CREATED         SIZE
investitest   latest    adf6d6052626   8 minutes ago   64.2MB
It has the image ID adf6...; let’s remember this one.
Making Sense Of Those Files
The adf6(...).json file makes sense. It’s the metadata of the image! If you look inside, you’ll see all kinds of useful entries: data which is used to launch containers and to tell people more about the final image. Among others, it has a Cmd entry with the command we specified in the Dockerfile. Neat-o!
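The file name and the image ID are linked in a checkable way: the image ID is the SHA-256 digest of this config file’s raw bytes. The sketch below assumes the save format shown here, where the config is stored as `<image id>.json`, and that the CMD sits under the `config` key; `read_image_config` is a made-up name.

```python
import hashlib
import json
from pathlib import Path


def read_image_config(config_path):
    """Load the image config written by `docker save`; return (image_id, cmd)."""
    path = Path(config_path)
    raw = path.read_bytes()
    # The image ID is the SHA-256 digest of the raw config bytes,
    # and in this save format the file is named after that digest.
    image_id = hashlib.sha256(raw).hexdigest()
    if image_id != path.stem:
        raise ValueError(f"digest {image_id} does not match file name {path.stem}")
    config = json.loads(raw)
    # The CMD from the Dockerfile ends up under config -> Cmd.
    return image_id, config.get("config", {}).get("Cmd")
```

For investitest, the returned ID should start with adf6d6052626 and the command should be `["echo", "hello"]`.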
The manifest.json file links to this metadata, but also to every single layer. It’s just a JSON file with a few entries, nothing magical, but it binds everything together.
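Reading that manifest programmatically takes only a few lines. A minimal sketch, assuming the `Config`, `RepoTags` and `Layers` keys that this save format uses and a single saved image (`read_manifest` is a hypothetical name):

```python
import json
import tarfile


def read_manifest(tar_path):
    """Return (config_file, repo_tags, layer_paths) for the first image in a saved tar."""
    with tarfile.open(tar_path) as tar:
        member = tar.extractfile("manifest.json")
        if member is None:
            raise ValueError("manifest.json is not a regular file")
        manifest = json.load(member)
    # manifest.json is a JSON list with one object per saved image; we saved one.
    entry = manifest[0]
    return entry["Config"], entry.get("RepoTags") or [], entry["Layers"]
```

For investitest, the config should point at the adf6(...).json file and each layer entry at a `<directory>/layer.tar` path.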
My best guess about repositories is that it tracks where this image came from. In this case, its information looks somewhat redundant with the entry in manifest.json. Well, it’s a simple example, and we won’t figure out everything in detail.
Let’s look into one of those layers: the mysterious directories we haven’t checked out yet.
What’s In A Layer?
First of all, which layer do we pick? The manifest.json file has a few entries in a list, but it’s not obvious to me which order they are in.
We can always look at the history of the image:
$ docker history investitest
Here’s the output:
IMAGE          CREATED          CREATED BY                                      SIZE      COMMENT
adf6d6052626   18 minutes ago   /bin/sh -c #(nop) CMD ["echo" "hello"]          0B
6b4be7b5ebcb   18 minutes ago   /bin/sh -c rm /hi                               0B
142805e1077e   18 minutes ago   /bin/sh -c touch /hi                            0B
7f6947553212   18 minutes ago   /bin/sh -c echo hihi                            0B
cf0f3ca922e0   15 months ago    /bin/sh -c #(nop) CMD ["/bin/bash"]             0B
<missing>      15 months ago    /bin/sh -c mkdir -p /run/systemd && echo 'do…   7B
<missing>      15 months ago    /bin/sh -c set -xe && echo '#!/bin/sh' > /…     745B
<missing>      15 months ago    /bin/sh -c [ -z "$(apt-get indextargets)" ]     987kB
<missing>      15 months ago    /bin/sh -c #(nop) ADD file:d13b09e8b3cc98bf0…   63.2MB
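One way to tackle the ordering question, sketched under the assumption that the image config follows the Docker/OCI image spec: its `history` list runs oldest first (the reverse of `docker history`), and entries flagged `empty_layer`, such as the CMD steps, have no layer of their own. Walking the history and the manifest’s `Layers` list in step pairs each build step with a layer path. The function name is hypothetical.

```python
def pair_history_with_layers(config, layers):
    """Pair each build step in an image config with the layer it produced.

    config: the parsed <image id>.json; layers: the "Layers" list from manifest.json.
    Both are assumed to be ordered oldest (base) first.
    """
    pairs = []
    layer_iter = iter(layers)
    for step in config.get("history", []):
        created_by = step.get("created_by", "")
        if step.get("empty_layer", False):
            # Metadata-only steps such as CMD do not add a layer.
            pairs.append((created_by, None))
        else:
            # Every other step consumes the next layer in order.
            pairs.append((created_by, next(layer_iter, None)))
    return pairs
```

Reversing the result gives the same order as `docker history`, with a layer path next to each step that produced one.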
So, it seems that the adf6(...).json file is the last layer. And because it doesn’t introduce file system changes, it’s stored as a JSON file instead of a directory? I’m not sure about it, but it would make sense. The other 4 layers with IDs don’t overlap with the directories in the folder at all. Weird!
The 498... entry in the manifest.json file comes last, so let’s check it out.
Inside the directory we can see three files:
VERSION
json
layer.tar
The json file contains useful information. Among other entries, there is this one:
Otherwise, the file is very similar to the adf6(...).json file we looked at for the last layer! Aha! The image sha can also be found in the output of the history command above. So that’s how they are connected. Neat!
What About The Tar?
That particular tar file only contains one single file: .wh.hi. It’s empty.
If we look at the Dockerfile line RUN rm /hi, which corresponds to this layer, it becomes obvious that this is the way a file deletion is marked. Each layer contains a diff to the previous one.
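The naming rule is simple enough to write down. In this small sketch (the function name is mine), a `.wh.` prefix marks the file name without the prefix as deleted. The special name `.wh..wh..opq`, from the same whiteout convention in the OCI image spec, marks a whole directory as opaque, hiding whatever lower layers put there.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def classify_entry(member_name):
    """Classify one layer.tar entry name.

    Returns ("opaque", directory), ("delete", path) or ("file", path).
    """
    directory, base = posixpath.split(member_name)
    if base == OPAQUE_MARKER:
        # Everything lower layers put in this directory is hidden.
        return ("opaque", directory)
    if base.startswith(WHITEOUT_PREFIX):
        # ".wh.<name>" deletes <name> from the layers below.
        return ("delete", posixpath.join(directory, base[len(WHITEOUT_PREFIX):]))
    return ("file", member_name)
```

For the rm /hi layer, `classify_entry(".wh.hi")` gives `("delete", "hi")`.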
Let’s check out the layer where a file is created, just to have a comparison. (It’s the 2d... directory, as we can see in the manifest.json file.) The tar file there contains a single entry: “hi”, an empty file. Once again, a diff; a modified file would probably look similar. After all, it’s a copy-on-write filesystem that makes Docker layers work.
So Much For Now!
That was a fun investigation! I had never looked into what a Docker image save file looks like. Of course, images are probably stored differently internally (I assume there are more efficient data structures, at least). But looking at this save format makes it way easier to understand what a Docker image is: a series of layers, containing diffs and changed metadata, building on top of each other.
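To make that idea concrete, here’s a toy sketch of stacking layers. It’s a simplification I’m adding for illustration: each layer is modelled as a Python dict from path to content instead of a real layer.tar, directories, permissions and links are ignored, and opaque-directory markers are skipped. Layers are applied oldest first; whiteouts remove what lower layers contributed, and everything else is added or overwritten.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def flatten_layers(layers):
    """Stack toy layers (oldest first) into the final file view.

    Each layer is a dict mapping a path to its content: a stand-in for a real
    layer.tar that ignores directories, permissions, links and opaque markers.
    """
    fs = {}
    for layer in layers:
        # Whiteouts hide files from the layers *below*, so apply them first.
        for path in layer:
            directory, base = posixpath.split(path)
            if base.startswith(WHITEOUT_PREFIX) and base != OPAQUE_MARKER:
                target = posixpath.join(directory, base[len(WHITEOUT_PREFIX):])
                doomed = [p for p in fs if p == target or p.startswith(target + "/")]
                for p in doomed:
                    del fs[p]
        # Then add or overwrite this layer's regular files (a changed file is stored whole).
        for path, content in layer.items():
            if not posixpath.basename(path).startswith(WHITEOUT_PREFIX):
                fs[path] = content
    return fs
```

Feeding it a stand-in for the investitest stack (some base files, an empty layer, a layer adding `hi`, a layer with `.wh.hi`) gives back just the base files, matching what the tar files showed.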
As usual, I now have more questions than I started with, but they are new ones, more elaborate and detailed. Anyway, it was fun to investigate. I hope this article was interesting to you and helped you get a better feeling for what a Docker image is behind the scenes. It sure was interesting for me :)