Dockerfile HEALTHCHECK

Writing a Dockerfile is straightforward — slap a FROM, add some RUN commands, maybe throw in a CMD, and boom, you’re a Docker expert. But writing good Dockerfiles that follow best practices? That’s another level.

To keep things short and precise: one thing that’s kinda underrated and rarely mentioned is the HEALTHCHECK instruction. I was vaguely aware of it, but it didn’t catch my attention until Trivy started complaining about AVD-DS-0026: No HEALTHCHECK defined as a LOW severity finding in our GitHub Actions pipeline.

So here’s why you should stop ignoring HEALTHCHECK and how to actually implement it without breaking everything.

why we need them

Sure, avoiding Trivy’s LOW severity findings is nice, but HEALTHCHECK does way more than keep security scanners happy. Without it, Docker can only tell you whether your container’s main process (PID 1) is running. But a running process doesn’t mean your app is actually working: it could be sitting there completely unresponsive, and Docker would still give it a thumbs up. Think of it this way: technically it’s up and running, but it’s definitely not serving any requests.
Catch issues at an early stage - before they become incidents.
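It becomes more valuable once something acts on the health status. Docker Swarm replaces unhealthy tasks automatically, and Docker Compose can hold dependent services back until a container reports healthy (depends_on with condition: service_healthy). Two caveats: plain docker run only reports the status and won’t restart an unhealthy container on its own, and Kubernetes ignores the Dockerfile HEALTHCHECK entirely in favor of its own liveness and readiness probes. Where it is wired up, unhealthy containers get replaced, traffic gets rerouted, and your 3 AM pager alerts become slightly less frequent.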
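
To see what Docker itself thinks, you can read the health status straight from docker inspect. Below is a minimal sketch of a polling helper, e.g. for a CI step that should wait until a container is ready. The function name wait_for_healthy, its arguments and its defaults are made up for illustration; only docker inspect and the .State.Health.Status field come from Docker.

```shell
# Hypothetical helper: poll Docker until a container reports "healthy".
# Usage: wait_for_healthy <container> [max_attempts] [delay_seconds]
wait_for_healthy() {
  wfh_container=$1
  wfh_attempts=${2:-30}
  wfh_delay=${3:-2}

  while [ "$wfh_attempts" -gt 0 ]; do
    # .State.Health.Status is "starting", "healthy" or "unhealthy".
    # The lookup fails (status stays empty) if the image defines no HEALTHCHECK.
    wfh_status=$(docker inspect --format '{{.State.Health.Status}}' "$wfh_container" 2>/dev/null) || wfh_status=""

    case "$wfh_status" in
      healthy)   return 0 ;;
      unhealthy) return 1 ;;
    esac

    wfh_attempts=$((wfh_attempts - 1))
    sleep "$wfh_delay"
  done

  # Never became healthy within the allowed attempts.
  return 1
}
```

Calling wait_for_healthy my-app 20 3 would give the container (my-app is a placeholder name) roughly a minute to become healthy before giving up.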

gotchas

Beware of any false positives. Don’t set timeouts so aggressive that your health checks fail during normal startup. Nobody wants containers restarting every 30 seconds because your Node.js app takes 45 seconds to initialize. On the flip side, waiting 10 minutes to detect a dead container is also not OK.
That lightweight Alpine image you love? It might not have curl, wget, pg_isready or whatever tool your health check needs. Always verify your health check dependencies are actually available in the final prod image.
If you’re checking /health or /ping, make sure those endpoints actually exist and are accessible. Sounds obvious, but we’ve all been there.
Clean exit codes:
- exit 0 - healthy
- exit 1 - not healthy
- <anything_else> - you are doing it wrong
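
Two of these gotchas are easy to guard against in a small script. The sketch below is one way to do it, not the only one: missing_tools and run_check are hypothetical helper names. The first reports tools that are not on PATH, the second collapses whatever a probe returns into 0 or 1. That matters because curl -f exits with codes like 7 (connection refused) or 22 (HTTP error), which is exactly why the usual `|| exit 1` is there.

```shell
#!/bin/sh
# Hypothetical helpers for a healthcheck script; the names are illustrative.

# Print any tools that are not on PATH; succeed only if none are missing.
missing_tools() {
  mt_missing=""
  for mt_tool in "$@"; do
    if ! command -v "$mt_tool" >/dev/null 2>&1; then
      mt_missing="${mt_missing}${mt_missing:+ }${mt_tool}"
    fi
  done
  [ -z "$mt_missing" ] && return 0
  printf '%s\n' "$mt_missing"
  return 1
}

# Run a probe command and collapse its result to 0 (healthy) or 1 (unhealthy).
run_check() {
  if "$@" >/dev/null 2>&1; then
    return 0
  fi
  return 1
}
```

Running missing_tools curl inside the final image (for example in a throwaway docker run) tells you right away whether the health check can work at all, and run_check curl -f http://localhost:3001/health keeps the exit code clean no matter how curl fails.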

show me the code

FROM node:24-alpine

# Add curl for health checks (because Alpine doesn't include it)
RUN apk add --no-cache curl

EXPOSE 3001

HEALTHCHECK --interval=30s --timeout=10s --retries=3 --start-period=30s \
  CMD ["/bin/sh", "-c", "curl -f http://localhost:3001/health || exit 1"]

CMD ["npm", "run", "start:prod"]

--interval=30s: Check every 30 seconds
--timeout=10s: Wait up to 10 seconds for response
--start-period=30s: Grace period for container startup; failed checks during this window don’t count toward the retry limit
--retries=3: Mark unhealthy after 3 consecutive failures
CMD ["/bin/sh", "-c", "curl -f http://localhost:3001/health || exit 1"]: The actual health check command
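
To sanity-check these numbers, it helps to estimate how long a dead app can go unnoticed. The snippet below is a rough back-of-the-envelope calculation, not something Docker provides: it assumes every failing check hangs for the full timeout and that the next check starts one interval after the previous one finishes. The function name is made up for illustration.

```shell
# Rough upper bound (in seconds) between the app dying and Docker
# marking the container unhealthy, under the assumptions stated above.
# Usage: worst_case_detection_seconds <interval> <timeout> <retries>
worst_case_detection_seconds() {
  wc_interval=$1
  wc_timeout=$2
  wc_retries=$3
  echo $(( wc_retries * (wc_interval + wc_timeout) ))
}

worst_case_detection_seconds 30 10 3   # prints 120
```

So with the values above, you’re looking at up to two minutes before the container flips to unhealthy. If that’s too slow for you, lower the interval or the retries rather than the start period.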

bottom line

If you don’t want your end-users to be your monitoring system, stop being lazy and start adding HEALTHCHECK instructions at least to your production images. Your future self (and your on-call rotation) will thank you.

Of course, there’s another way to know if your applications are actually working — and here’s the shameless plug: visit justanotheruptime.com and set up proper monitoring and alerts in 60 seconds. Because sometimes you need monitoring that monitors your monitoring.
