chore: add docker healthcheck for worker and server #332
Conversation
Walkthrough: Adds two HTTP health endpoints (`/healthcheck` and `/liveliness`).
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Docker as Docker/Compose
    participant Container as App Container
    participant Rails as Rails App
    participant DB as Database
    participant Redis as Redis
    Note over Docker,Container: Container health probe (Dockerfile / compose)
    Docker->>Container: HTTP GET /healthcheck
    alt 200 OK
        Container->>Docker: 200 OK
    else non-200 / no response
        Container->>Docker: failure
    end
    Note over Container,Rails: Rails healthcheck handling
    Container->>Rails: GET /healthcheck
    Rails->>DB: SELECT 1
    DB-->>Rails: OK / error
    Rails->>Redis: PING
    Redis-->>Rails: PONG / error
    Rails-->>Container: JSON {db, redis, ok}, status 200 or 503
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- Dockerfile (2 hunks)
- compose.example.yml (2 hunks)
🔇 Additional comments (2)
Dockerfile (1)
61-62: HEALTHCHECK is well configured.

The healthcheck correctly probes the application on localhost:3000 using curl with the --fail flag, and the timing parameters (30s interval, 5s timeout, 10s start-period, 3 retries) are sensible defaults. The curl binary is available in the final stage since it's installed in the base image (line 12).
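As an aside, the `--retries` behavior the timing parameters rely on can be sketched in plain Ruby. This is an illustrative model of Docker's documented semantics, not actual Docker code: a container only transitions to unhealthy after that many consecutive probe failures, and any success resets the count.

```ruby
# Illustrative model (assumption: mirrors Docker's documented HEALTHCHECK
# behavior) of --retries=3: unhealthy only after 3 consecutive failures.
def unhealthy?(probe_results, retries: 3)
  streak = 0
  probe_results.each do |ok|
    streak = ok ? 0 : streak + 1
    return true if streak >= retries
  end
  false
end
```

With `retries: 3`, a history like `[false, true, false, false]` stays healthy, because the one success resets the failure streak.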
compose.example.yml (1)
83-88: Worker service healthcheck is well configured.

The process-based health check using `pgrep -f sidekiq` is a reasonable approach to monitor the Sidekiq worker, and the timing parameters (30s interval, 5s timeout, 3 retries, 10s start-period) are appropriate for this type of check.
Thanks for sending this in, @xBlaz3kx! Love all the small "quality of life" improvements like this ... My main concern with this is simple, but one I think you'll appreciate given your line of work: logs become a mess when you do
So you can see that I at least moved … Want to rewrite it like that? Not sure what a good "compose friendly" way of doing it would be. 🤷‍♂️
jjmata
left a comment
Let's decide whether we go all the way to an HTTP GET health check or do something more nimble.
```dockerfile
# Entrypoint prepares the database.
ENTRYPOINT ["/rails/bin/docker-entrypoint"]

HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
```
Rather than curl a full HTTP request, can we do a TCP check to make sure the container is up? Always hard to decide how far to go in declaring an app "healthy" ... roundtrip to the DB included?
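A TCP-only probe like the one suggested here could be sketched in Ruby (the helper name and timeout are illustrative; a real Docker HEALTHCHECK would more likely use a shell one-liner):

```ruby
require "socket"

# Minimal TCP liveness sketch: healthy if the port accepts a connection.
# No HTTP request is made, so no app code or DB roundtrip is exercised.
def tcp_up?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError
  false
end
```

The tradeoff raised above is exactly this: the check passes as soon as something listens on the port, even if the app can't reach its database.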
Usually, you have a liveliness check and a healthcheck endpoint. The liveliness check determines if the application is up and running. The health check endpoint would check if it's running as expected (connected to the DB, message queues, etc).
I'm not familiar with Ruby, so I couldn't dig up the appropriate endpoint to use. Perhaps an OpenAPI spec would be a nice addition to the docs.
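The split described here can be sketched framework-free in Ruby (`db_ok` and `redis_ok` are stand-in callables, not real checks; the status/body pairs are illustrative):

```ruby
# Liveness: only "is the process up?" -- always 200 if we can respond.
def liveness
  [200, nil]
end

# Health: additionally verify dependencies; 503 if any check fails.
def health(db_ok:, redis_ok:)
  checks = { db: db_ok.call, redis: redis_ok.call }
  [checks.values.all? ? 200 : 503, checks]
end
```

An orchestrator would restart a container on liveness failure but merely stop routing traffic to it on health (readiness) failure.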
For compose, we can just use the liveliness endpoint, since the app will most likely not boot if migrations aren't applied correctly. For Kubernetes, migrations should probably be run before the main application starts (in a separate sidecar container).
I agree, polluting logs with healthchecks is a nightmare, especially when hitting the … endpoint.

There are other ways around this; for example, if you are using Promtail, you can filter out logs that hit the healthcheck endpoint.

I'll try to find a better solution for the healthcheck; maybe I can add dedicated health endpoints, and we can skip logs on those.
Force-pushed from 49fbcaf to 226dda2
Actionable comments posted: 1
♻️ Duplicate comments (1)
Dockerfile (1)
61-62: Consider using the `/liveliness` endpoint to reduce log pollution.

Based on the PR discussion, HTTP healthcheck log pollution is a concern. The `/liveliness` endpoint would be more appropriate for the Docker HEALTHCHECK because:

- It silences logs (via `silence_logger` in the controller), addressing the log pollution concern raised by the maintainer
- It's a lightweight check that only verifies the app is running (appropriate for container-level liveness)
- The `/healthcheck` endpoint performs deeper checks (database, Redis) that may be unnecessarily heavy for frequent container health probes

Apply this diff to use the liveliness endpoint:

```diff
-HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
-  CMD curl --fail http://localhost:3000/healthcheck || exit 1
+HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
+  CMD curl --fail http://localhost:3000/liveliness || exit 1
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- Dockerfile (2 hunks)
- app/controllers/health_controller.rb (1 hunks)
- compose.example.yml (2 hunks)
- config/routes.rb (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- compose.example.yml
🔇 Additional comments (6)
app/controllers/health_controller.rb (5)
1-9: LGTM! Well-designed health endpoint setup.

The controller appropriately:
- Skips authentication and CSRF for health endpoints
- Applies `silence_logger` only to `/liveliness` to reduce log noise from frequent container probes
- Keeps logging enabled for `/healthcheck` so that actual health failures are visible

This design aligns with the PR discussion about minimizing log pollution while maintaining observability for real issues.
11-21: LGTM! Proper healthcheck implementation.

The healthcheck action correctly:
- Aggregates multiple dependency checks (database and Redis)
- Returns appropriate HTTP status codes (200 OK vs 503 Service Unavailable)
- Provides detailed JSON response for debugging
23-25: LGTM! Lightweight liveliness probe.

The liveliness endpoint appropriately returns a simple 200 OK with no body, suitable for frequent container health checks.
29-35: LGTM! Database check is correct.

The database check performs a lightweight `SELECT 1` query and properly handles errors.
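The probe-and-log pattern the reviews describe for both checks can be factored into one helper. A hedged sketch, with `warn` standing in for `Rails.logger.error` and a hypothetical helper name:

```ruby
# Hypothetical generalization of check_database / check_redis:
# run a probe, return true/false, and log a message on failure.
def check_dependency(name)
  yield
  true
rescue StandardError => e
  warn "#{name} health check failed: #{e.message}"
  false
end
```

In a Rails context this might be invoked as, for illustration only, `check_dependency("db") { ActiveRecord::Base.connection.execute("SELECT 1") }`.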
48-54: LGTM! Proper logger silencing implementation.

The `silence_logger` helper correctly preserves the original log level and restores it in an ensure block.

config/routes.rb (1)

336-338: LGTM! Health routes properly configured.

The new routes are correctly placed alongside the existing Rails health check route and properly map to the HealthController actions.
Force-pushed from 226dda2 to ae41499
Actionable comments posted: 1
🧹 Nitpick comments (3)
app/controllers/health_controller.rb (3)
23-25: Simple and appropriate liveness probe.

The minimal implementation is ideal for a liveness endpoint that will be polled frequently by monitoring tools.

Optional: Consider the naming convention. Kubernetes uses "liveness" rather than "liveliness" for probe endpoints. While "liveliness" is valid, "liveness" is more conventional in the container orchestration ecosystem.
37-43: Past review concern addressed correctly.

The implementation now uses `Sidekiq.redis(&:ping)`, which correctly leverages the application's configured Redis connection pool, addressing the previous review feedback.

Consider broadening exception handling for consistency. The `check_database` method catches `StandardError`, but this method only catches `Redis::CannotConnectError`. Other Redis errors (timeouts, connection errors, etc.) might not be caught. For consistency and defensive coding, consider catching `StandardError` here as well.

Apply this diff for consistency with `check_database`:

```diff
 def check_redis
   Sidekiq.redis(&:ping)
   true
-rescue Redis::CannotConnectError => e
+rescue StandardError => e
   Rails.logger.error("Redis health check failed: #{e.message}")
   false
 end
```
45-51: Correct implementation with proper cleanup.

The manual logger level manipulation with an ensure block correctly silences logs and guarantees restoration of the original level.

Optional: Rails has built-in log silencing. While your implementation works well, Rails provides the `LoggerSilence` concern with a `silence` method. However, your explicit implementation is clear and correct.
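The manual silencing pattern discussed above can be reproduced standalone with Ruby's stdlib `Logger` (a sketch of the same idea outside Rails; the helper name is hypothetical):

```ruby
require "logger"

# Raise the level to Logger::UNKNOWN for the block's duration; the
# ensure clause restores the previous level even if the block raises.
def with_silenced_logger(logger)
  previous = logger.level
  logger.level = Logger::UNKNOWN
  yield
ensure
  logger.level = previous
end
```

`Logger::UNKNOWN` is the highest severity, so every ordinary `debug`/`info`/`warn`/`error` call is suppressed while the block runs.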
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- Dockerfile (2 hunks)
- app/controllers/health_controller.rb (1 hunks)
- compose.example.yml (2 hunks)
- config/routes.rb (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- compose.example.yml
- Dockerfile
🧰 Additional context used
🪛 GitHub Check: ci / lint
app/controllers/health_controller.rb
[failure] 53-53:
Layout/TrailingEmptyLines: 1 trailing blank lines detected.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci / test
🔇 Additional comments (5)
app/controllers/health_controller.rb (4)
1-6: LGTM! Appropriate callback skips for health endpoints.

Skipping authentication and setup callbacks is correct for health check endpoints, ensuring they remain fast and accessible for monitoring tools.

8-9: Good solution to the log pollution concern.

Silencing logs only for the liveliness endpoint addresses the issue raised in the PR discussion about healthcheck-induced log clutter while keeping detailed logging for the healthcheck endpoint.

11-21: LGTM! Standard healthcheck pattern with proper status codes.

The implementation correctly aggregates individual health checks and returns appropriate HTTP status codes (200 for healthy, 503 for unhealthy) along with detailed JSON for debugging.
29-35: LGTM! Efficient database health check.

Using `SELECT 1` is an efficient and standard approach for database health checks. The broad exception handling with error logging is appropriate.

config/routes.rb (1)

336-338: LGTM! Well-placed and correctly configured routes.

The new health check routes are properly defined and logically placed near the existing Rails health check endpoint, making them easy to find and maintain.
Actionable comments posted: 0
🧹 Nitpick comments (1)
app/controllers/health_controller.rb (1)
37-43: Consider broader error handling for consistency.

The `check_redis` method only rescues `Redis::CannotConnectError`, while `check_database` rescues the broader `StandardError`. Redis operations can raise other exceptions such as `Redis::TimeoutError`, `Redis::ConnectionError`, or `Redis::CommandError`. If these occur, they'll propagate up and cause a 500 error instead of being handled gracefully.

For consistency and resilience, consider rescuing `StandardError` like `check_database` does:

```diff
 def check_redis
   Sidekiq.redis(&:ping)
   true
-rescue Redis::CannotConnectError => e
+rescue StandardError => e
   Rails.logger.error("Redis health check failed: #{e.message}")
   false
 end
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
app/controllers/health_controller.rb(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci / test
🔇 Additional comments (3)
app/controllers/health_controller.rb (3)
2-6: LGTM! Appropriate exemptions for health endpoints.

Skipping authentication, CSRF protection, and request tracking callbacks is correct for health check endpoints that need to be accessible by container orchestration tools without credentials.
8-21: Verify the logging strategy for the healthcheck endpoint.

The `silence_logger` wrapper is only applied to the `liveliness` endpoint, but the `healthcheck` endpoint (which returns detailed JSON status) will still generate log entries on every Docker healthcheck poll. Based on the PR discussion where log pollution was a primary concern, should `healthcheck` also be included in the `silence_logger` around_action?

If frequent detailed healthcheck logging is intentional (e.g., to track failures in application logs), this is fine. However, if Docker polls `/healthcheck` every few seconds, it will create the same log clutter issue that was raised in the PR comments.

Consider applying silence_logger to both endpoints if log pollution is a concern:

```diff
-around_action :silence_logger, only: :liveliness
+around_action :silence_logger, only: [:liveliness, :healthcheck]
```
45-51: LGTM! Solid logger silencing implementation.

The use of `ensure` to restore the logger level is correct, and `Logger::UNKNOWN` is the appropriate level to suppress all log output. This effectively addresses the log pollution concern raised in the PR discussion.
Force-pushed from f477ea1 to 0a45abf
Actionable comments posted: 0
🧹 Nitpick comments (1)
app/controllers/health_controller.rb (1)
37-43: Redis check only covers Sidekiq's Redis connection.

The implementation correctly uses `Sidekiq.redis(&:ping)` (as addressed in previous reviews). However, this only verifies the Redis connection configured for Sidekiq. If your application uses Redis for other purposes, such as caching, sessions, or ActionCable, those connections won't be verified by this health check.

For most applications where Redis is primarily used with Sidekiq, this implementation is sufficient. If you need to check other Redis connections, you may want to add additional checks for those specific Redis clients.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
app/controllers/health_controller.rb(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci / test
🔇 Additional comments (2)
app/controllers/health_controller.rb (2)
8-9: Consider silencing logs for the healthcheck endpoint too.

The `silence_logger` is currently applied only to the `liveliness` endpoint. However, based on the Dockerfile changes in this PR, Docker's HEALTHCHECK will hit the `/healthcheck` endpoint periodically (typically every 30 seconds by default). This may still generate log pollution, the issue that motivated this PR.

If the `/healthcheck` endpoint is intended for detailed monitoring while `/liveliness` is for frequent probes, the current setup is appropriate. Otherwise, consider extending log silencing to both endpoints or using the `/liveliness` endpoint for Docker HEALTHCHECK instead.

Would you like to verify which endpoint Docker HEALTHCHECK should use, or would you prefer to extend silence_logger to both endpoints?
1-52: LGTM! Clean implementation addressing PR objectives.

The implementation provides two well-designed health endpoints:
- `/healthcheck`: detailed status with database and Redis checks, suitable for comprehensive health monitoring
- `/liveliness`: lightweight probe with silenced logging, ideal for frequent Kubernetes liveness checks

The code correctly:
- Skips authentication and CSRF for unauthenticated health endpoints
- Uses proper error handling with logging
- Implements safe logger silencing with an `ensure` block
- Addresses previous review feedback (using `Sidekiq.redis(&:ping)` instead of `Redis.new`)

This successfully addresses the log pollution concerns raised in issue #269 and the PR discussion.
So I discovered DHH took care of this in Rails 8 ... why don't we wait until we upgrade to it? I was planning on doing it after

Addressing #269
Summary by CodeRabbit
New Features
Chores