Writing a Dockerfile is straightforward: slap a FROM, add some RUN commands, maybe throw in a CMD, and boom, you’re a Docker expert. But writing good Dockerfiles that follow best practices? That’s another level.
To keep things short and precise, let’s focus on one thing that’s kinda underrated and rarely mentioned: the HEALTHCHECK instruction. I was vaguely aware of it, but it didn’t catch my attention until Trivy started flagging AVD-DS-0026: No HEALTHCHECK defined as a LOW severity finding in our GitHub Actions pipeline.
So here’s why you should stop ignoring HEALTHCHECK and how to actually implement it without breaking everything.
why we need them
- Sure, avoiding Trivy’s LOW severity findings is nice, but HEALTHCHECK does way more than keep security scanners happy. Without it, Docker can only tell you whether your container’s main process (PID 1) is running. But a running process doesn’t mean your app is actually working: it could be sitting there completely unresponsive, and Docker would still give it a thumbs up. Think of it this way: technically it could be up and running, but definitely not serving any requests.
- Catch issues at an early stage, before they become incidents.
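- It becomes even more valuable once an orchestrator is involved. Docker Swarm automatically replaces containers that turn unhealthy, and Docker Compose can hold dependent services back until a dependency reports healthy (depends_on with condition: service_healthy). Kubernetes is the exception: it ignores the Dockerfile HEALTHCHECK and relies on its own liveness and readiness probes, so the same check has to be declared there. Either way, your 3 AM pager alerts become slightly less frequent.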
gotchas
- Beware of false positives. Don’t set timeouts so aggressive that your health checks fail during normal startup; nobody wants containers restarting every 30 seconds because your Node.js app takes 45 seconds to initialize (that’s exactly what --start-period is for). On the flip side, waiting 10 minutes to detect a dead container is also not OK.
- That lightweight Alpine image you love? It might not have curl, a full-featured wget, pg_isready or whatever tool your health check needs. Always verify that your health check’s dependencies are actually available in the final production image.
- If you’re checking /health or /ping, make sure those endpoints actually exist and are accessible. Sounds obvious, but we’ve all been there.
- Clean exit codes: exit 0 means healthy, exit 1 means unhealthy, and anything else means you’re doing it wrong (Docker reserves exit code 2, so don’t use it).
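One way around the missing-curl problem on Alpine is to let the Node.js runtime that’s already in the image make the request. A minimal sketch, assuming (as in the example in the next section) that the app serves GET /health on port 3001 and counts as healthy only when it answers 200; the 60-second start period is an illustrative value sized for the 45-second startup mentioned above, not a recommendation:

```dockerfile
# Assumption: the app listens on port 3001 and serves GET /health.
# Adjust both to whatever your service actually exposes.
# --start-period=60s leaves headroom for an app that needs ~45s to initialize.
# 127.0.0.1 is used instead of localhost so the probe doesn't depend on
# whether localhost resolves to IPv4 or IPv6 inside the container.
HEALTHCHECK --interval=30s --timeout=10s --retries=3 --start-period=60s \
  CMD ["node", "-e", "require('http').get('http://127.0.0.1:3001/health', (res) => process.exit(res.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"]
```

The one-liner only ever exits with 0 or 1, which keeps it on the right side of the exit-code rule, and no extra package has to be installed. If the request hangs, Docker’s --timeout kills the probe and counts it as a failure.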
show me the code
FROM node:24-alpine
# Add curl for health checks (because Alpine doesn't include it)
RUN apk add --no-cache curl
EXPOSE 3001
HEALTHCHECK --interval=30s --timeout=10s --retries=3 --start-period=30s \
CMD ["/bin/sh", "-c", "curl -f http://localhost:3001/health || exit 1"]
CMD ["npm", "run", "start:prod"]
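- --interval=30s: check every 30 seconds
- --timeout=10s: wait up to 10 seconds for a response
- --start-period=30s: grace period for container startup; failed checks during this window don’t count toward --retries
- --retries=3: mark unhealthy after 3 consecutive failures
- CMD ["/bin/sh", "-c", "curl -f http://localhost:3001/health || exit 1"]: the actual health check command. curl -f makes HTTP error responses (4xx/5xx) fail, and || exit 1 maps any non-zero curl exit code to a clean 1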
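These numbers also tell you how long a broken app can keep looking healthy. A rough worst-case estimate, assuming the app hangs (so every probe runs the full timeout) and relying on Docker starting the next check interval seconds after the previous one completes; start-period is ignored since it only applies right after startup:

$$
T_{\text{unhealthy}} \lesssim \text{retries} \times (\text{interval} + \text{timeout}) = 3 \times (30\,\text{s} + 10\,\text{s}) = 120\,\text{s}
$$

If the endpoint fails fast instead (say, connection refused), the timeout term drops out and the bound shrinks to about 3 × 30 s = 90 s. Raising retries or interval grows this number linearly, which is how you end up in 10-minute territory.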
bottom line
If you don’t want your end users to be your monitoring system, stop being lazy and start adding HEALTHCHECK instructions, at least to your production images. Your future self (and your on-call rotation) will thank you.
Of course, there’s another way to know if your applications are actually working — and here’s the shameless plug: visit justanotheruptime.com and set up proper monitoring and alerts in 60 seconds. Because sometimes you need monitoring that monitors your monitoring.