6 Docker Basics You Should Completely Grasp When Getting Started

Written: Jul 2017

You’re here because you want to use Docker for a small project. The idea is to test it out and see how it fits into your current tools. After doing a bit of googling to get an overview of the things people do with it, you’re probably a bit overwhelmed.

There are lots of projects using it and all kinds of people talking about it. But why? What’s the advantage of using Docker? Will it make my life easier even if the project is small and not that hard to deploy at the moment?

Before trying to judge the usefulness, you should get to know the most important elements and tools around the Docker ecosystem when getting started. A firm understanding of those will make it easier for you to ask the right questions, and will help you navigate the sea of Docker-stuff without feeling lost.

Here are 6 Docker basics, explained in a brief and practical manner:

Containers

Imagine you’d like to run a command isolated from everything else on the system. It should only access exactly the resources it is allowed to (storage, CPU, memory), and should not know there is anything else on the machine. The process running inside a container thinks it’s the only one, and only sees a barebones Linux distro plus the stuff which is described in the image.

That sounds an awful lot like VMs, right? Yup. Only containers start faster and have less resource overhead. The best things about having a web app in containers, in my opinion, are:

You can get it to run on any Linux distro, given that Docker is installed (Ubuntu, Amazon Linux, …).
Deploying on a new server is really easy.
Multiple containerized apps on a single server don’t mess up each other.
If you update an app, you just build a new image, run fresh containers and don’t have to worry about other ones on the machine breaking.
Your apps will not break due to OS updates of Docker-unrelated packages. (If Docker itself is updated, this will affect the containers. Also, updates to underlying components, such as the system kernel and glibc, have been known to cause trouble.)

But why would you do that? A great way to think about the benefit of containers is the way Docker was originally pitched and introduced: by comparing it to global shipping. Before standardized containers (the ones you see on huge ships and trucks) were agreed upon, it was quite challenging to pack and repack stuff along the journey, depending on which vehicle it needed to be transported with (packing boxes into a truck, just to have dozens of people unpack them from the truck and carry them onto a ship). With standardized containers, you just rent or buy a container, pack it with your things (in a way which makes sure the container is not negatively affected, like things sliding around) and from then on people know how to handle the container and don’t need to adjust too much to what’s inside.

(Image: huge cranes loading and unloading containers on ships.)

It’s the same with Docker containers containing apps. A machine running the container should not have to care about what’s inside too much, and the dockerized app does not care if it’s on a Kubernetes cluster or a single server - it will be able to run anyway (given that it’s well designed). (In the case of the Docker icon, the container is carried by a whale, and the container does not give a single care.)

An important point to note: a container can run more than a single process at a time. Some people choose to limit it to one though. You could package many services into a single container (Nginx, Gunicorn, supervisord, …) and have them all run side by side. Opinions vary on whether that’s a Docker thing to do though. Each container should serve a distinct purpose (possibly running multiple processes when it makes sense).
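
The isolation and the resource limits described at the start of this section can be tried out with two short commands. This is a minimal sketch, assuming Docker is installed and the public alpine image can be pulled; the limit values are arbitrary picks for illustration.

```shell
#!/bin/sh
# Sketch only: the alpine image and the limit values are assumptions for illustration.
if docker info >/dev/null 2>&1; then
  # The only process the container can see is the command itself, running as PID 1.
  ps_output=$(docker run --rm alpine ps)
  echo "$ps_output"

  # Cap the container at 64 MB of memory and half a CPU core.
  docker run --rm --memory 64m --cpus 0.5 alpine echo "running with limits"
else
  echo "Docker is not reachable on this machine, nothing was run."
fi
```

The --rm flag removes each container as soon as its command exits, so nothing is left behind on the host.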

Images

An image is a blueprint from which an arbitrary number of brand-new containers can be started. Images can’t change (well, you could point the same tag to different images, but let’s not go there), but you can start a container from an image, perform operations in it and save another image based on the latest state of the container. No “currently running commands” are saved in an image. When you start a container it’s a bit like booting up a machine after it was powered down.

It’s like a powered down computer (with software installed), which is ready to be executed with a single command. Only instead of starting the computer, you create a new one from scratch (container) which looks exactly like the one you chose (image).

Think of it as a very precise set of instructions: what operating system to install, what files to put where, what packages to install, and what the resulting computer is supposed to execute (a single command) if not told otherwise. Because we are working with data, which is easy to copy, we usually don’t execute all instructions from scratch but just copy the end-state of those instructions (the image).

When starting a container from an image, you usually don’t rely on the defaults being right - you provide arguments to the command being executed, mount volumes (directories with data) with your own data and configurations and wire up the container to the network of the host in a way which suits you.

Dockerfiles

A Dockerfile is a set of precise instructions, stating how to create a new Docker image, setting defaults for containers being run based on it and a bit more. In the best case it’s going to create the same image for anybody running it at any point in time.

Consider the documentation a project should have: the sections of the README file which tell other people (or you-in-the-future) how to set up the environment, what to install regarding services or libraries, and how to run the project so it does something useful.

For example, one of your projects might depend on a particular language runtime like Python 3.5, a few system packages for handling images, pip dependencies and environment variables with credentials to access a cloud service.

Dockerfiles can be seen as the instructions to set up a project - but in executable code. A script which installs the operating system, all necessary parts and makes sure that everything else is in place too.

In a Dockerfile, you usually choose what image to take as the “starting point” for further operations (FROM). You can execute commands (RUN): each one starts a container from the image of the previous step, runs the command in it, and saves the result as the most recent image. You can also copy local files into the new image (COPY). Usually, you also specify a default command to run (ENTRYPOINT) and its default arguments (CMD) when starting a container from this image.
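
To make the instructions named above a bit more concrete, here is a small sketch which writes a Dockerfile and, if Docker is available, builds and runs it. The demo directory, the app.py file and the demo-app tag are made-up names, and the empty requirements.txt only stands in for real pip dependencies. The python:3.5 base image follows the Python 3.5 example from above; any other tag would work the same way.

```shell
#!/bin/sh
# Sketch only: the demo/ directory, app.py and the demo-app tag are made-up names.
mkdir -p demo

# A stand-in application and an (empty) list of pip dependencies.
printf 'print("hello from the container")\n' > demo/app.py
: > demo/requirements.txt

# The quoted heredoc delimiter keeps the shell from touching the Dockerfile text.
cat > demo/Dockerfile <<'EOF'
# Starting point: an image with Python 3.5 already installed.
FROM python:3.5

# Work inside /app in the image.
WORKDIR /app

# Copy the dependency list first and install it in its own step.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the rest of the project into the image.
COPY . .

# Default command and default arguments for containers started from this image.
ENTRYPOINT ["python"]
CMD ["app.py"]
EOF

if docker info >/dev/null 2>&1; then
  # Build an image from demo/Dockerfile and tag it demo-app.
  docker build -t demo-app demo
  # No arguments: runs "python app.py".
  docker run --rm demo-app
  # Arguments replace CMD, so this runs "python --version".
  docker run --rm demo-app --version
else
  echo "Docker is not reachable on this machine, only the files were written."
fi
```

Splitting the command into ENTRYPOINT and CMD is one possible choice here, not the only one. It makes the second docker run call possible, where only the arguments are swapped out.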

Volumes

Images don’t change. You can create new ones, but that’s it. Containers on the other hand leave nothing behind by default. Any changes made to a container, given that you don’t save it as an image, are lost as soon as it is removed.

But having data persist is really useful. That’s where volumes come in. When starting a Docker container, you can specify that certain directories are mount points for either local directories (of the host machine), or for volumes. Data written to host-mounted directories is straightforward to understand (as you know where it is). Volumes are also for persistent or shared data, but you don’t have to know anything about the host when using them: you create a volume, and Docker makes sure that it’s there and saved somewhere on the host system.

When a container exits, the volumes it was using stick around. So if you start a second container, telling it to use the same volumes, it will have all the data of the previous one. You can manage volumes using Docker commands (to remove them for example). Docker Compose makes dealing with volumes even easier.
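
The “data sticks around” behaviour can be sketched with two throwaway containers sharing one volume. The volume name demo_data and the use of the public alpine image are assumptions made for this example.

```shell
#!/bin/sh
# Sketch only: the volume name demo_data and the alpine image are assumptions.
if docker info >/dev/null 2>&1; then
  # Create a named volume; Docker decides where it lives on the host.
  docker volume create demo_data

  # First container writes a file into the volume and is removed on exit (--rm).
  docker run --rm -v demo_data:/data alpine sh -c 'echo "written by container 1" > /data/note.txt'

  # A second, brand-new container mounts the same volume and finds the file.
  second_read=$(docker run --rm -v demo_data:/data alpine cat /data/note.txt)
  echo "$second_read"

  # Volumes stay around until they are removed explicitly.
  docker volume ls
  docker volume rm demo_data
else
  echo "Docker is not reachable on this machine, nothing was run."
fi
```

Mounting a host directory uses the same -v flag, only with a path such as /srv/app in place of the volume name.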

Port Forwarding

By default, none of a container’s ports are reachable from the outside world, and other containers can only talk to it over a Docker network they share. However, you can tell Docker to publish a container port on a port of the host machine (either on the 127.0.0.1 interface or on an external one).

A common scenario for private servers is running a web app under a subdomain. An example would be a Flask application which listens on port 5000 of the container, and Docker is told to expose it on 127.0.0.1:9010 on the server. This means only processes running on the server itself can access it.

At the same time, Nginx is installed on the host. It listens on the public port 80 for any incoming requests, and takes care of serving files or acting as a reverse proxy, based on the domain. So if a request for the domain of the app hits the server, Nginx makes sure that it is passed to 127.0.0.1:9010, and the container receives it on port 5000 and takes over from there. The containerized application does not really care about anything apart from its port 5000. It only gets the requests which are meant for it, which is a nice separation of concerns.
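
The setup described above could look roughly like the following sketch. The domain app.example.com, the config file name and the container name web are made up, and Python’s built-in web server stands in for the Flask app so that the example does not depend on an image which doesn’t exist. Where Nginx picks up the config file depends on the install; on many systems it is a directory like /etc/nginx/conf.d/.

```shell
#!/bin/sh
# Sketch only: app.example.com, the file name and the container name web are made up.

# Nginx server block for the host: requests for the app's domain on port 80
# are passed on to the port Docker publishes on 127.0.0.1.
cat > app.example.com.conf <<'EOF'
server {
    listen 80;
    server_name app.example.com;

    location / {
        proxy_pass http://127.0.0.1:9010;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
EOF

if docker info >/dev/null 2>&1; then
  # Publish container port 5000 on the host's loopback interface, port 9010.
  # Python's built-in web server stands in for the Flask app here.
  docker run -d --name web -p 127.0.0.1:9010:5000 python:3.5 python -m http.server 5000
  # Show the resulting port mapping.
  docker port web
  # Clean up the demo container.
  docker rm -f web
else
  echo "Docker is not reachable on this machine, only the Nginx config was written."
fi
```

Binding to 127.0.0.1 instead of all interfaces is what keeps port 9010 private to the server, so Nginx stays the only public entry point.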

Docker Compose

I love this tool. In the early days of Docker, you had to write lots of long-ish terminal commands, especially if you wanted to have multiple containers talk to each other and do something useful.

So, starting an application container, mounted to a local directory and exposing a port would look something like:

$ docker run -d -p 127.0.0.1:9000:80 --name web -v /srv/app:/app vsupalov/app python app.py

And that’s a simple, nice toy example, without environment variables. Also the container is already built. Just imagine huge multiline commands, with lots of parameters. Urgh.

Then, fig came along and made things nicer. It has since been superseded by docker-compose, which does the same thing. Running a set of containers in the background is as simple as calling:

$ docker-compose up -d

And the configuration happens in a docker-compose.yml file, which looks like this:

version: '2'

volumes:
  # for persistence between restarts
  postgres_data: {}

services:
  db:
    #https://hub.docker.com/_/postgres/
    image: postgres:9.6.3
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: dbpw
    ports: #make db accessible locally
      - "127.0.0.1:5432:5432"

Way neater and nicer to work with.

Here is the complete example of using Docker with docker-compose for setting up a development environment (Redis and PostgreSQL) for working on a Flask app without cluttering your local system. It’s a great use case for getting started with both.

Hope That Helped!

Docker can be a bit overwhelming at first, especially as it’s seeing so much usage and even more hype. People are a bit salesy at times, and navigating the whole environment can be challenging when starting out.

With the knowledge above, you should have a firm grasp of what Docker is about at its core, and be able to ask the right questions about what the individual concepts mean and why they are useful.

This is a good starting point for following practical tutorials or diving deeper into more advanced topics. With these fundamentals you will have an easier time understanding all the new tools around it, and deciding for yourself if they could make your coding-life better.

Thanks for reading, I hope you got value from this writeup!

A word from the author

Hi, I'm Vladislav. I work with small teams and bootstrapped founders who need to get their infrastructure right: reliable deployments, less operational risk, and systems that don't fall apart the moment the founder looks away. I've been writing about Docker, deployment, and infrastructure since 2017.