vsupalov

Can You Mount a Volume While Building Your Docker Image to Cache Dependencies?

Nope, this is not possible. But there are three techniques that can get you most of the same benefits.


Right now, you can’t mount volumes during a build the way you can when running containers: there is no -v option for docker build. It would be really handy. A volume doesn’t add to the size of the image, lets you mount data from outside of the container, and lets you reuse previously executed work as a kind-of-starting-point.

So, do you have to accept that your image is 1.5GB larger than it should be? Do you really have to watch your build process start from scratch every. single. time you build your image from your Dockerfile? Even if it’s on the same CI server, the data is right there and oh no, it’s going to take another 10 minutes after the fix, isn’t it?

The good news is, you can do better. Although you can’t have volumes, you can get the benefits with a bit of one-time effort, and at the cost of slightly higher complexity. With the techniques described below, you can:

  • Get messy operations out of your main Docker image, and only use the results,
  • Make the cache-dependent operation less frequent (given that your dependencies are changing less frequently than your complete codebase),
  • COPY data into your image, without relying on host-specific operations, thus providing a better place to start than from-scratch.

Let’s dive into details.

Reduce Final Image Size

If you’re just looking to reduce the final size of your image - take a look at multi-stage builds. Here is an example multi-stage Dockerfile:

FROM ubuntu AS intermediate
RUN mkdir -p /data /result
# COPY keeps the archive as-is (ADD would auto-extract a local tar file)
COPY data.tar /data/data.tar
RUN #some expensive operation
# finally, /result ends up with the final data

FROM ubuntu
COPY --from=intermediate /result /result
# simply use the result

Say you’re working with a tarball of data that you need to unpack and process, but you’re only interested in the result. With this technique, you do the heavy lifting in an intermediate image and accept that it’s going to be huge due to all the diff layers. The final image only picks up the results, through the COPY --from command in the last part of your Dockerfile.

Only Install Dependencies If They Actually Change

You could make the dependency-installing step less frequent by adding the dependency definition file before you add all your code. You can read more about this approach, applied to building an image for a Python application, here.

You can apply this, if you have something like a requirements.txt file, which describes third-party packages which need to be installed. If you install your dependencies after adding your codebase, you don’t make use of the caching which Docker provides. By adding your dependency definition file first, and installing them before you add your complete code, you’ll be able to skip this expensive step if your dependencies have not changed.

# From this:
ADD code /src/code
# this re-runs every. single. time anything in code/ changes
RUN pip install -r /src/code/requirements.txt

# To this:
ADD code/requirements.txt /src/requirements.txt
# only re-executed if requirements.txt changes
RUN pip install -r /src/requirements.txt
ADD code /src/code
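
For reference, here is how that ordering might look in a complete Dockerfile. This is a minimal sketch, not a definitive setup: the python:3.12-slim base image, the WORKDIR and the main.py entry point are assumptions made for illustration, and only the code/requirements.txt layout comes from the snippet above.

```dockerfile
# Sketch: dependency-first layering for a Python app.
# Assumed for illustration: base image, /src layout, main.py entry point.
FROM python:3.12-slim

WORKDIR /src

# 1. Copy only the dependency definition; this layer changes rarely.
COPY code/requirements.txt /src/requirements.txt

# 2. Install dependencies; served from cache until requirements.txt changes.
RUN pip install --no-cache-dir -r /src/requirements.txt

# 3. Copy the rest of the code; edits invalidate this layer and later ones only.
COPY code /src/code

CMD ["python", "/src/code/main.py"]
```

Docker reuses a cached layer for COPY only if the copied files are unchanged, so a code-only edit leaves steps 1 and 2 cached.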

While this makes use of caching, the install step still has to be re-executed from scratch every time you add or change a dependency. The following technique can reduce that time as well.

COPY A Close-Enough State From Another Docker Image

When building an image, you can’t mount a volume. However, you can copy data from another image! By combining this with a multi-stage build, you can pre-compute an expensive operation once and re-use the resulting state as a starting point for future iterations. As you’re using Docker-native functionality, there’s no need for messy data copying from and to the host system.

Here is how it would work with a multi-stage build:

FROM ubuntu AS intermediate
# python3 packages; recent Ubuntu releases no longer ship python-dev / python-virtualenv
RUN apt-get update && apt-get install -yqq python3-dev python3-venv
RUN python3 -m venv /venv
RUN mkdir -p /src
# those don't change often
ADD code/basic-requirements.txt /src/basic-requirements.txt
RUN /venv/bin/pip install -r /src/basic-requirements.txt

FROM ubuntu
# refresh the package index first, otherwise the install fails on a fresh base image
RUN apt-get update && apt-get install -yqq python3-dev python3-venv
# the data comes from the above stage
COPY --from=intermediate /venv /venv
ADD code/requirements.txt /src/requirements.txt
# this command starts from an almost-finished state every time
RUN /venv/bin/pip install -r /src/requirements.txt

Although we are using a multi-stage Dockerfile here, you could specify any image in COPY --from. A more advanced way to use this technique would be to build a dedicated dependencies image, tag it with a version, and refer to it during the build of other images to guarantee stability and reproducible behaviour.
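
As a rough sketch of that idea, the application Dockerfile could copy from a tagged image instead of an earlier stage. The image name example/python-deps:1.0, the file name Dockerfile.deps and the python:3.12-slim base are hypothetical, chosen only for illustration. The sketch assumes the dependencies image was built from the same base and has its virtualenv at /venv.

```dockerfile
# Sketch of the app image. Assumes a dependencies image was built and tagged
# beforehand, for example with:
#   docker build -t example/python-deps:1.0 -f Dockerfile.deps .
# (example/python-deps:1.0 and Dockerfile.deps are hypothetical names)
FROM python:3.12-slim

# Pull the pre-built virtualenv from the tagged image, not from a build stage.
# The deps image is assumed to use the same base, so the venv's interpreter
# path still resolves in this image.
COPY --from=example/python-deps:1.0 /venv /venv

COPY code/requirements.txt /src/requirements.txt

# Starts from the almost-finished state provided by the deps image.
RUN /venv/bin/pip install -r /src/requirements.txt

COPY code /src/code
```

Pinning the tag means a rebuild of the dependencies image only affects application builds once you bump the version in COPY --from.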

Conclusion

Although there’s no functionality in Docker to have volumes at build-time, you can use multi-stage builds, benefit from Docker caching and save time by copying data from other images, be they earlier build stages or tagged images. I hope that these tricks will help you improve your Docker build experience and workflows!
