vsupalov

What's A Docker Image Anyway?

What’s a Docker image anyway? Let’s create a simple image, export it, and take a look at what it’s made of.

Creating An Image

Let’s keep it really simple. I want to investigate the image of the following Dockerfile:

FROM ubuntu

RUN echo hihi
RUN touch /hi
RUN rm /hi
CMD ["echo", "hello"]

Completely without any practical purpose! We don’t even make the effort to use a properly tagged base image. Just :latest ubuntu.

Here’s why it will be useful: the first RUN line does not make any changes to the file system. The second RUN line creates a new file, and the third RUN line removes it. Finally, we set a CMD for future containers.

All simple things which we’ll be able to investigate!

The only thing left is to build and tag a fresh image:

$ docker build -t investitest .

Dragging The Image Into The Light Of Day

We could run a container based on the image, but let’s skip that.

Instead, I want to save the image we have just created. Here’s the command:

$ docker save --output investitest.tar investitest

This creates an investitest.tar file in the directory where we ran the command. This is all the information contained in the image - in one place. Let’s see what’s inside!

Unpacking

Can’t get around looking up the tar command syntax. Here, I’ll save you the effort:

$ mkdir image
$ tar xvf investitest.tar -C image

The extra step of creating a new directory is there so everything stays reasonably tidy.

If we look inside, here’s what we’ll see:

Note: at least, that’s what I see in this case. If you build along, all your layers will have different hashes than the ones I have here. That’s not an issue! It’s just how Docker images work.

0bd29a970da656512c70ecc4b6ab126177ac8bac18e0cae2ac49699956a0f2f4
2d89fdb622b01c291c5e492bc88217a250fcd653192eada26303f38189cacf08
4985c676b04225e91123eacff84a06ebdf4ed549020d7a92fe7206d890724717
88dc9157e2e1c4d9085bc8ed39b94c8f1d2072b74d22914d5097c24b6040df29
aa0f56dfa64cbbe3a136d7409c95c6cb53b226ea551b0cd1b88033ef5e728cb8
adf6d60526263f9f1c7805f902a184a176451cb069adf43781e01c10666aa46c.json
b814aba80c68b70833d1881adc00d04764284169d1d2d9e6d5aebd2ab518aef3
manifest.json
repositories

There are three files (manifest.json, repositories and the other .json file). Everything else is a directory.
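To double-check which entries are plain files and which are directories, a few lines of Python are enough. This is just a sketch: it assumes the image was unpacked into the `image` directory from the step above, and `split_entries` is a name made up for illustration.

```python
from pathlib import Path


def split_entries(image_dir):
    """Return (files, directories) found at the top level of an unpacked image."""
    entries = sorted(Path(image_dir).iterdir())
    files = [p.name for p in entries if p.is_file()]
    directories = [p.name for p in entries if p.is_dir()]
    return files, directories
```

Calling `split_entries("image")` on the unpacked directory should return the three files and the six directories from the listing above.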

Here’s one missing piece of information. If we run docker images and look for the image we just built, here’s the info:

REPOSITORY                                  TAG                 IMAGE ID            CREATED             SIZE
investitest                                 latest              adf6d6052626        8 minutes ago       64.2MB

It has the image id adf6..., let’s remember this one.

Making Sense Of Those Files

Now the adf6(...).json file makes sense. It’s the metadata of the image! If you look inside, you’ll see all kinds of useful entries. Data which is used to launch containers and tell people more about the final image. Among others, it has a Cmd entry, with the command we specified in the Dockerfile. Neat-o!
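Here is a small sketch for poking at that metadata file from Python. It rests on two assumptions that are not shown in this article: that the image id is the SHA-256 digest of the metadata file's bytes (which would explain the file name), and that the command sits under a `config` key as `Cmd`. The function names are made up for illustration.

```python
import hashlib
import json
from pathlib import Path


def config_matches_name(config_path):
    """True if the file name (without .json) equals the SHA-256 hex digest of the file's bytes."""
    path = Path(config_path)
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return path.stem == digest


def read_cmd(config_path):
    """Return the Cmd entry of the image metadata, or None if it is absent."""
    metadata = json.loads(Path(config_path).read_text())
    # Assumption: the command lives under the "config" key.
    return (metadata.get("config") or {}).get("Cmd")
```

For the image built here, `read_cmd` on the adf6(...).json file should give back the echo command from the Dockerfile.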

The manifest.json file links to this metadata, but also to every single layer. It’s just a json file with a few entries, nothing magical. But it binds everything together.
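To see how the manifest binds things together, here is a minimal sketch that loads it. It assumes manifest.json is a JSON list with one entry per saved image, each carrying `Config`, `RepoTags` and `Layers` keys. That matches what I know of the docker save format, but treat the key names as an assumption. `load_manifest` is a made-up helper name.

```python
import json


def load_manifest(manifest_path):
    """Return one small summary dict per image listed in manifest.json."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    return [
        {
            "config": entry["Config"],              # name of the metadata .json file
            "tags": entry.get("RepoTags") or [],    # can be missing or null for untagged images
            "layers": entry["Layers"],              # paths to the layer.tar files
        }
        for entry in manifest
    ]
```

Running `load_manifest("image/manifest.json")` should show the metadata file name, the investitest tag, and one layer.tar path per layer directory.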

My best guess about repositories is that it tracks where this image came from. The information looks somewhat redundant to the entry in manifest.json in this case. Well, it’s a simple example, and we won’t figure out everything in detail.

Let’s look into one of those layers - the mysterious directories we haven’t checked out yet.

What’s In A Layer?

First of all, which layer do we pick? The manifest.json file has a few entries in a list, but it’s not obvious to me which order they are in.

We can always look at the history of the image:

$ docker history investitest

Here’s the output:

IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
adf6d6052626        18 minutes ago      /bin/sh -c #(nop)  CMD ["echo" "hello"]         0B
6b4be7b5ebcb        18 minutes ago      /bin/sh -c rm /hi                               0B
142805e1077e        18 minutes ago      /bin/sh -c touch /hi                            0B
7f6947553212        18 minutes ago      /bin/sh -c echo hihi                            0B
cf0f3ca922e0        15 months ago       /bin/sh -c #(nop)  CMD ["/bin/bash"]            0B
<missing>           15 months ago       /bin/sh -c mkdir -p /run/systemd && echo 'do…   7B
<missing>           15 months ago       /bin/sh -c set -xe   && echo '#!/bin/sh' > /…   745B
<missing>           15 months ago       /bin/sh -c [ -z "$(apt-get indextargets)" ]     987kB
<missing>           15 months ago       /bin/sh -c #(nop) ADD file:d13b09e8b3cc98bf0…   63.2MB

So, it seems that the adf6(...).json file is the last layer. And because it’s not introducing file system changes, it’s stored as a json file instead of a directory? I’m not sure about it, but it would make sense. The other 4 layers with ids don’t overlap with the directories in the folder at all. Weird!

Well, the 498... entry in the manifest.json file comes last, so let’s check it out.
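One possible way to settle the ordering question is to line up the layer list with the build steps. The sketch below relies on assumptions that this article does not verify: that the metadata .json file has a `history` list whose entries carry `created_by` and an optional `empty_layer` flag for steps without a stored layer, and that the `Layers` list in manifest.json runs from the base layer upward. `pair_layers_with_history` is a hypothetical helper.

```python
def pair_layers_with_history(manifest_entry, metadata):
    """Pair each layer path with the build step that (presumably) produced it."""
    steps = [
        step.get("created_by", "")
        for step in metadata.get("history", [])
        # Assumption: steps flagged empty_layer have no layer.tar of their own.
        if not step.get("empty_layer", False)
    ]
    # zip stops at the shorter list, so a mismatch in length is silently truncated.
    return list(zip(manifest_entry["Layers"], steps))
```

If the assumptions hold, the last pair for this image would link the 498... directory to the rm /hi step.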

The Layer

Inside the directory we can see three files:

VERSION
json
layer.tar

The json file contains useful information. Among other entries, there is this one:

"Image":"sha256:6b4be7b5ebcb59706f2ba9596af4f79e33ef97f5534d5be987031c494a564f68"

Otherwise, the file is very similar to the adf6(...).json file we looked at for the last layer! Aha!

The image sha can also be found in the history command above. So that’s how they are connected. Neat!

What About The Tar?

That particular tar file contains only a single file: .wh.hi. It’s empty.

If we look at the Dockerfile line RUN rm /hi, which corresponds to this layer, it becomes obvious that this is the way a file deletion is marked. Each layer contains a diff to the previous one, after all.
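The .wh. marker can be spotted programmatically as well. Here is a rough sketch that reads a layer.tar and sorts its entries into added paths and deletion markers. It only handles the simple `.wh.<name>` case seen here and ignores other marker variants (such as markers for whole directories). `classify_layer` is a made-up name.

```python
import posixpath
import tarfile

WHITEOUT_PREFIX = ".wh."


def classify_layer(fileobj):
    """Return (added, deleted) path lists for an open layer.tar file object."""
    added, deleted = [], []
    with tarfile.open(fileobj=fileobj) as tar:
        for member in tar.getmembers():
            directory, name = posixpath.split(member.name)
            if name.startswith(WHITEOUT_PREFIX):
                # ".wh.hi" marks the deletion of "hi" in the same directory.
                deleted.append(posixpath.join(directory, name[len(WHITEOUT_PREFIX):]))
            else:
                added.append(member.name)
    return added, deleted
```

Used as `classify_layer(open("layer.tar", "rb"))` inside the 498... directory, it should report nothing added and hi as deleted.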

Let’s check out the layer where a file is created, just to have a comparison. (It’s the 2d... directory, as we can see in the manifest.json file.) The tar file there contains a single entry: “hi”, an empty file. Once again, a diff. A change to an existing file would probably look similar. It’s a copy-on-write filesystem which makes Docker layers work, after all.
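The "diff on top of diff" idea can be replayed in a few lines. This is a toy model, not how Docker actually does it: every step is a dict of added and deleted paths, and the steps are applied oldest first. The data below is a hand-written stand-in for the Dockerfile in this article, with `bin/sh` standing in for the whole ubuntu base.

```python
def apply_layers(layers):
    """Replay layer diffs, oldest first, and return the resulting set of paths."""
    filesystem = set()
    for layer in layers:
        # Deletions only remove the exact path here; real layers also cover directories.
        for path in layer.get("deleted", []):
            filesystem.discard(path)
        filesystem.update(layer.get("added", []))
    return filesystem


steps = [
    {"added": ["bin/sh"]},   # stand-in for the ubuntu base layers
    {},                      # RUN echo hihi: no file system changes
    {"added": ["hi"]},       # RUN touch /hi
    {"deleted": ["hi"]},     # RUN rm /hi (stored as .wh.hi)
]
print(sorted(apply_layers(steps)))  # prints ['bin/sh']
```

Stopping one step earlier, with `apply_layers(steps[:3])`, leaves hi in place. That is why the file still takes up space in the image even though the final file system no longer shows it.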

So Much For Now!

That was a fun investigation! I never looked into what a Docker image save file looks like. Of course, images are (probably? I assume there are more efficient data structures at least) stored in a different fashion internally. But looking at this save format makes it way easier to understand what a Docker image is. A series of layers, containing diffs and changed metadata, building on top of each other.

As usual, I now have more questions than I started with, but they are new ones, more elaborate and detailed. Anyway, it was fun to investigate. I hope this article was interesting to you, and helped you get a better feeling for what a Docker image is behind the scenes. It sure was interesting for me :)