You’ve got healthchecks set up (because you read those previous posts, right?), your containers are dutifully reporting their health status, and all is good. Until 2 AM, when that flaky service decides to hang, your healthcheck correctly marks it as unhealthy, and… nothing happens. The container just sits there, unhealthy and useless, waiting for you to manually restart it.

That’s where autoheal comes in — the unsung hero that actually does something when your containers get unhealthy.

what is autoheal

When I mention autoheal I mean willfarrell/autoheal: a tiny container that runs alongside your services and restarts unhealthy containers, plain and simple.

the setup

Here’s something you could start with:

services:
  autoheal:
    image: willfarrell/autoheal:latest
    container_name: autoheal
    restart: always
    volumes:
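      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      # watch every container; by default autoheal only watches containers labeled autoheal=true
      - AUTOHEAL_CONTAINER_LABEL=all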

  web:
    image: my-app:latest
    ports:
      - "8080:3000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 20s
      timeout: 5s
      retries: 3
      start_period: 10s

Now, if you want to be selective about which services get “healed”, keep AUTOHEAL_CONTAINER_LABEL set to autoheal (its default) and add the autoheal=true label only to the services you want restarted. For instance, we don’t want to autoheal a stateful service such as a database, so take a look at the following:

services:
  autoheal:
    image: willfarrell/autoheal:latest
    container_name: autoheal
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - AUTOHEAL_CONTAINER_LABEL=autoheal

  postgres:
    image: postgres:15
    healthcheck:
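      # $$ stops Compose from interpolating on the host; assumes both variables are set in the container's environment
      test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]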
      interval: 30s
      timeout: 5s
      retries: 5
      start_period: 30s

  web:
    image: my-app:latest
    depends_on:
      postgres:
        condition: service_healthy
    labels:
      - "autoheal=true"
      # map syntax should work too, without the list dash: autoheal: "true"
      # an unquoted autoheal: true won't work, since docker compose parses it as a boolean value
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 20s
      timeout: 10s
      retries: 3
      start_period: 10s

a few tips and gotchas

  • Do not make your healthchecks too aggressive. Make sure start_period gives your apps enough time to actually start.
  • Do not set AUTOHEAL_INTERVAL too low. Checking every second creates unnecessary load. Every 10-30 seconds is usually fine.
  • As stated before, I wouldn’t recommend it for stateful applications such as databases and message queues, or for short-lived containers and apps.
  • You need to mount the Docker socket /var/run/docker.sock, which gives autoheal full Docker daemon access. That’s powerful and dangerous. You could run it in prod, but I would strongly suggest a rootless Docker approach.
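
To make the interval tip concrete, here’s a minimal sketch of a less eager autoheal service. The variable names are the ones I understand the autoheal README to document (AUTOHEAL_INTERVAL and AUTOHEAL_START_PERIOD), and the values are illustrative assumptions rather than recommendations, so check them against the version you actually run:

```yaml
services:
  autoheal:
    image: willfarrell/autoheal:latest
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - AUTOHEAL_CONTAINER_LABEL=autoheal  # only heal containers labeled autoheal=true
      - AUTOHEAL_INTERVAL=15               # seconds between checks (illustrative value)
      - AUTOHEAL_START_PERIOD=60           # seconds to wait before the first check (illustrative value)
```

The start period here belongs to autoheal itself and is separate from the start_period of each service’s healthcheck, so slow-starting apps still need their own grace period.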

bottom line

Autoheal isn’t magic, but it’s the next logical step after implementing proper healthchecks. It turns your health monitoring from a “notification system” into a “self-healing system.”

Just remember: autoheal fixes symptoms, not root causes. If your containers are constantly failing and restarting, you still need to figure out why. But while you’re debugging, at least your users aren’t getting 500 errors.

And speaking of monitoring root causes, justanotheruptime.com can help you track patterns in your service availability — because sometimes the real problem isn’t the container that’s failing, it’s the one that’s about to fail next.