Debugging a Broken Docker Compose Service That Crashes Right Away
If you’re working with a more complex docker-compose file, building your own images and specifying your own entrypoint scripts, you know that a lot can go wrong. Here’s my favourite: one of your docker-compose services keeps crashing with an obscure error message that doesn’t tell you enough to fix the issue. Unfortunately, to fix it you need the other containers in the stack to be up.
When you’re working with a single Docker image, you can build it and run a container from it in interactive mode, overriding the entrypoint and command, and voilà: you can tinker around and find the issue in no time. But what about docker-compose? Do you need to start every single service by hand? Is there some way to start the stack, and then run this one container interactively so you can debug it?
Starting everything by hand is obviously too much effort. So here’s what usually happens instead. You try to fix it by luck, and go through the same dance every time:
$ docker-compose down -v
# Waiting a few seconds to type the next command...
[...]
$ docker-compose up
# Here we go again...
[...]
app exited with code 1
[...] # YARGH!
The Solution
Here’s how you can run your ‘broken’ container and investigate, without doing acrobatics or feeling like you should just stop using hard-to-debug Docker setups altogether.
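Replace the current entrypoint with one that just keeps the container running and does nothing else. Also make sure the command is neutralised; with an `sh -c` entrypoint like the one here, any command is handed to the shell as unused extra arguments, so it is effectively ignored. So, the service in question in the docker-compose.yml file gets these lines: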
services:
  SERVICE_NAME:
    # for debugging
    #entrypoint: ["sh", "-c", "sleep infinity"]
    entrypoint: ["sh", "-c", "sleep 2073600"]
Once this is in place, take your docker-compose stack down and bring it up again. The container in question should now be up, producing no output and, most importantly, not crashing; it simply sleeps for 2073600 seconds (24 days). You can jump into it and take a thorough look around with:
$ docker-compose exec SERVICE_NAME bash
Replace SERVICE_NAME with the actual name of the service (and bash with sh if the image doesn’t ship bash). Once in the container, you can try running your entrypoint command, apply it in chunks, and iterate much faster to find out what’s broken and how to fix it. When you’ve found a way around the breaking issue, fix it in your scripts permanently and set your entrypoint and command back to normal. Problem fixed, without guessing and waiting around.
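As a rough sketch of what “applying it in chunks” can look like: assuming your original entrypoint is a shell script, running it with `sh -x` prints every command before it is executed, which usually points straight at the step that fails. The helper name `trace_entrypoint` and the script path in the usage comment are made up for illustration; use whatever your image actually ships.

```shell
# Run this inside the debug container.
# trace_entrypoint is a hypothetical helper, not part of Docker or Compose.
trace_entrypoint() {
  # $1: path to the entrypoint script; any further arguments are
  # passed through, the way the service's command normally would be.
  script="$1"
  shift
  status=0
  # -x makes the shell print each command (on stderr) before running it.
  sh -x "$script" "$@" || status=$?
  echo "entrypoint exited with code $status"
  return "$status"
}

# Example (the path is an assumption, check your Dockerfile for the real one):
#   trace_entrypoint /usr/local/bin/docker-entrypoint.sh
```

The last traced line before the exit message is typically the command to look at first. From there you can run that single command by hand, tweak it, and retry without restarting the whole stack.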