You’ve got healthchecks set up (because you read those previous posts, right?), your containers are dutifully reporting their health status, and all is good. Until 2 AM, when that flaky service decides to hang, your healthcheck correctly marks it as unhealthy, and… nothing happens. The container just sits there, unhealthy and useless, waiting for you to manually restart it.
That’s where autoheal comes in: the unsung hero that actually does something when your containers become unhealthy.
what is autoheal
When I mention autoheal, I mean willfarrell/autoheal, a tiny container that runs alongside your services and restarts unhealthy containers. Plain and simple.
the setup
Here’s something you could start with:
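services:
  autoheal:
    image: willfarrell/autoheal:latest
    container_name: autoheal
    restart: always
    environment:
      # watch every container; without this, only containers labeled autoheal=true are watched
      - AUTOHEAL_CONTAINER_LABEL=all
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
  web:
    image: my-app:latest
    ports:
      - "8080:3000"
    healthcheck:
      # assumes curl is installed in the image
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 20s
      timeout: 5s
      retries: 3
      start_period: 10s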
Now, if you want to be selective about which services get “healed” and which don’t, simply add the autoheal=true label to the services you want autohealed. For instance, we don’t want to autoheal a stateful service such as a database, so take a look at the following:
services:
  autoheal:
    image: willfarrell/autoheal:latest
    container_name: autoheal
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      # only restart containers that carry the label autoheal=true
      - AUTOHEAL_CONTAINER_LABEL=autoheal
  postgres:
    image: postgres:15
    # no autoheal label here, so autoheal leaves the database alone
    healthcheck:
      # ${...} is interpolated by Compose from the shell or .env file;
      # use $$VAR instead to read the container's own environment
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 30s
      timeout: 5s
      retries: 5
      start_period: 30s
  web:
    image: my-app:latest
    depends_on:
      postgres:
        condition: service_healthy
    labels:
      - "autoheal=true"
      # the map form should work too, replacing the list item above:
      #   autoheal: "true"
      # an unquoted true won't work, since YAML parses it as a boolean rather than a string
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 20s
      timeout: 10s
      retries: 3
      start_period: 10s
a few tips and gotchas
- Don’t make your healthchecks too aggressive. Make sure the start_period in each healthcheck gives the app enough time to actually start.
- Don’t set AUTOHEAL_INTERVAL too low. Checking every second creates unnecessary load. Every 10-30 seconds is usually fine.
- As stated before, I wouldn’t recommend it for stateful applications such as databases, message queues, etc., or for short-lived containers and apps.
- You need to mount the Docker socket, /var/run/docker.sock, which gives autoheal full access to the Docker daemon. That’s powerful, and dangerous too. I guess you could run it in prod, but I would strongly suggest a rootless Docker approach.
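To make the interval and socket tips a bit more concrete, here’s a minimal sketch of an autoheal service with a relaxed check interval. The values 15 and 60 are my own picks, not recommendations from the project, and the variable names are as I understand them from the autoheal README, so double-check them against the version you run. The rootless socket path assumes your Docker user has UID 1000.

```yaml
services:
  autoheal:
    image: willfarrell/autoheal:latest
    container_name: autoheal
    restart: always
    environment:
      - AUTOHEAL_CONTAINER_LABEL=autoheal
      # seconds between health sweeps (assumed value; pick something in the 10-30 range)
      - AUTOHEAL_INTERVAL=15
      # seconds autoheal waits after it starts before its first sweep (assumed value)
      - AUTOHEAL_START_PERIOD=60
    volumes:
      # rootful Docker
      - /var/run/docker.sock:/var/run/docker.sock
      # rootless Docker, assuming UID 1000: use this line instead of the one above
      # - /run/user/1000/docker.sock:/var/run/docker.sock
```

Note that AUTOHEAL_START_PERIOD delays autoheal itself, while start_period in a healthcheck covers a single container’s startup, so you’ll usually want to set both.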
bottom line
Autoheal isn’t magic, but it’s the next logical step after implementing proper healthchecks. It turns your health monitoring from “notification system” to “self-healing system.”
Just remember: autoheal fixes symptoms, not root causes. If your containers are constantly failing and restarting, you still need to figure out why. But while you’re debugging, at least your users aren’t getting 500 errors.
And speaking of monitoring root causes, justanotheruptime.com can help you track patterns in your service availability — because sometimes the real problem isn’t the container that’s failing, it’s the one that’s about to fail next.