[codex] Document node renderer health probes (#3241)
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Walkthrough: Refines Node Renderer health-check docs: documents an unauthenticated `/info` endpoint and probe configuration.

Changes: Node Renderer Health Checks & Kubernetes Probe Documentation
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 5 passed
Code Review: Documentation-only PR

Overall: Good, practical addition. The readiness/liveness probe guidance is accurate and the Control Plane deployment table is genuinely useful. A few minor issues below.

Issues
- Formatting inconsistency
- Failure threshold for liveness is vague

Minor Suggestions
| ----------------- | ----------------------------------------------- |
| Scheme            | `HTTP`                                          |
| Path              | `/health` or `/info`                            |
| Port              | The renderer port, usually `3800` or `$PORT`    |
Formatting inconsistency: `3800` is wrapped in backticks but $PORT is not.

| Port | The renderer port, usually `3800` or $PORT |

Should be:

| Port | The renderer port, usually `3800` or `$PORT` |

i.e. `$PORT` instead of plain $PORT.
### Configuring Readiness and Liveness Probes
Use a cheap endpoint such as the `/health` route above for readiness and liveness probes. The built-in `/info`
The built-in /info route is referenced here but never documented elsewhere in this file or the broader docs. Before recommending it as a probe endpoint operators should know:
- Is it always present, or only when a custom Fastify setup is used?
- What does its response body look like?
- Is it affected by auth (`password` config)?
If it's truly always built-in, please add a note (or a link to wherever it's defined) so operators can verify it before pointing a liveness probe at it.
| Initial delay     | `5`-`10` seconds, longer if startup is slow     |
| Period            | `10` seconds                                    |
| Timeout           | `1`-`5` seconds                                 |
| Failure threshold | `3` for readiness, higher for cautious liveness |
"higher for cautious liveness" is vague. Operators setting this for the first time benefit from a concrete starting point. Consider something like:
Before:

| Failure threshold | `3` for readiness, higher for cautious liveness |

After:

| Failure threshold | `3` for readiness; `5`–`10` for liveness (restart only as a last resort) |
| --------------------------------------- | ----------------------------------------------------------------- | --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Same Rails container/process supervisor | `http://localhost:3800` | Default `localhost` is fine | Probe the `rails` container's Rails health endpoint, such as `/up` on port `3000`. Add a Rails readiness check for the renderer if SSR is required. |
| Separate container in the same workload | `http://node-renderer:3800`, using the renderer container name | Usually `0.0.0.0` | Add HTTP readiness and liveness probes to the `node-renderer` container on port `3800`. The renderer port does not need to be the public workload port. |
| Separate node-renderer workload | `http://node-renderer.<GVC>.cpln.local:3800` or your internal URL | `0.0.0.0` | Add HTTP readiness and liveness probes to the node-renderer workload container on port `3800`. Expose the port internally, not publicly, unless required. |
`<GVC>` is unexplained here. Control Plane users will recognise it, but anyone reading these docs for the first time won't. Consider expanding it inline:
Before:

| Separate node-renderer workload | `http://node-renderer.<GVC>.cpln.local:3800` or your internal URL | `0.0.0.0` | Add HTTP readiness and liveness probes to the node-renderer workload container on port `3800`. Expose the port internally, not publicly, unless required. |

After:

| Separate node-renderer workload | `http://node-renderer.<GVC>.cpln.local:3800` or your internal URL (`<GVC>` is your Control Plane Global Virtual Cloud name) | `0.0.0.0` | Add HTTP readiness and liveness probes to the node-renderer workload container on port `3800`. Expose the port internally, not publicly, unless required. |
Or add a short footnote/sentence after the table.
Review: Node Renderer Health Probes Documentation

This is a clean, well-structured documentation addition. The readiness vs liveness distinction is correct and the operational advice (don't couple liveness to external dependencies) is sound. A few issues worth addressing before merging:

Issues
- Kubernetes users are left without guidance
- "Combined health endpoint" advice is actionable but under-specified
Minor
| Field | Value |
| ----------------- | -------------------------------------------- |
| Scheme | `HTTP` |
The env var for the renderer port is `RENDERER_PORT` (documented above as `process.env.RENDERER_PORT || 3800`), while `$PORT` is described in the config docs as the Heroku / Control Plane public port — these are not the same thing. In a sidecar or separate-workload setup, `$PORT` is the externally-exposed workload port and may differ from the renderer's internal listening port.
Suggest clarifying:
Before:

| Port | The renderer port, usually `3800` or `$PORT` |

After:

| Port | The renderer port, usually `3800` (`$RENDERER_PORT`). On Heroku or Control Plane, use `$PORT` only if the renderer is configured to listen on that variable. |
Use a cheap endpoint such as the `/health` route above for readiness and liveness probes. The health check route should return `200 OK` when the process can accept probe traffic. The built-in `/info` route can also serve as a shallow process check if you do not need a custom route; it is always registered by the renderer, does not require the renderer password, and returns `node_version` and `renderer_version`.
Worth clarifying whether `/info` remains unauthenticated in all environments, including production (where `password` is required to start the renderer). If `/info` is always open regardless of the password config, that's a security-relevant detail: it exposes `node_version` and `renderer_version` to anyone who can reach the renderer port. Either note that explicitly, or add a security callout similar to the `host: 0.0.0.0` warning already in this file.
Control Plane configures probes per container. When Rails and the renderer share one container, use one combined health endpoint if you need to check both processes. When the renderer has its own container or workload, put the renderer probes on that container.
The advice to "use one combined health endpoint" is correct but under-specified. What should the combined endpoint do? A common pattern is to have the Rails /up (or a custom controller) make a short HTTP call to the renderer's /health or /info and return 503 if it fails. Without at least a brief description of this pattern, users sharing a container won't know how to implement it. Consider adding a sentence like: "For example, add a Rails health check controller that calls http://localhost:3800/health and returns 503 if the renderer is unreachable."
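To make the suggested pattern concrete, here is a minimal sketch (the helper name is hypothetical and not from the docs under review; a TCP connect stands in for the HTTP call, since later comments in this thread note the renderer's h2c listener rejects plain HTTP/1.1 clients such as Net::HTTP):

```ruby
require "socket"
require "uri"

# Hypothetical helper for a combined Rails health endpoint (illustrative
# names only). Uses a TCP connect rather than Net::HTTP because the
# renderer's h2c listener is not compatible with HTTP/1.1 clients.
def renderer_reachable?(renderer_url, timeout: 1)
  uri = URI.parse(renderer_url)
  Socket.tcp(uri.host, uri.port, connect_timeout: timeout) {}
  true
rescue SystemCallError, SocketError
  false
end

# In a controller action, roughly:
#   head renderer_reachable?("http://localhost:3800") ? :ok : :service_unavailable
```

This keeps the combined endpoint cheap: it only proves the renderer port accepts connections, which matches the shallow-check semantics the surrounding docs recommend.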
| Separate container in the same workload | `http://node-renderer:3800`, using the renderer container name | Usually `0.0.0.0` | Add HTTP readiness and liveness probes to the `node-renderer` container on port `3800`. The renderer port does not need to be the public workload port. |
| Separate node-renderer workload | `http://node-renderer.<GVC>.cpln.local:3800` or your internal URL (`<GVC>` is your Control Plane Global Virtual Cloud name) | `0.0.0.0` | Add HTTP readiness and liveness probes to the node-renderer workload container on port `3800`. Expose the port internally, not publicly, unless required. |
[Control Plane Flow](https://github.com/shakacode/control-plane-flow)'s default `rails` template models Rails as a
The <GVC>.cpln.local internal DNS format is Control Plane-specific and non-obvious. Consider linking to the relevant Control Plane documentation (e.g., the internal DNS or service discovery reference) so users can verify the format and find their GVC name without leaving the renderer docs to hunt for it.
Code Review

This is a documentation-only PR adding a "Readiness and Liveness Probes" section to the node-renderer docs.

Critical: HTTP/2 incompatibility not mentioned

The renderer uses cleartext HTTP/2 (h2c), which means standard Kubernetes `httpGet` probes (HTTP/1.1) will fail against it.
- The new probe table lists `Scheme: HTTP`, which steers readers toward broken `httpGet` probes.
- Duplication with `container-deployment.md`, which already documents the correct probe setup.
| Field | Value |
| ----------------- | -------------------------------------------- |
| Scheme | `HTTP` |
Breaking guidance for Kubernetes users: The renderer's Fastify server listens on HTTP/2 (h2c — cleartext HTTP/2). Standard Kubernetes httpGet probes speak HTTP/1.1 and will receive a protocol error, causing the probe to fail permanently.
container-deployment.md (line 391) already documents this and recommends either an exec probe with curl --http2-prior-knowledge or a tcpSocket probe as the fallback. This row should be corrected — or at minimum carry a note — to avoid sending users down a broken path:
Before:

| Scheme | `HTTP` |

After:

| Probe type | `tcpSocket` (safe default) or `exec` with `curl --http2-prior-knowledge` (application-level) — **not** `httpGet`; the renderer speaks HTTP/2 (h2c), which is incompatible with HTTP/1.1 `httpGet` probes |
| Deployment shape | Rails `renderer_url` | Renderer `host` | Probe target |
| --------------------------------------- | ----------------------------------------------------------------- | --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Same Rails container/process supervisor | `http://localhost:3800` | Default `localhost` is fine | Probe the `rails` container's Rails health endpoint, such as `/up` on port `3000`. Add a Rails readiness check for the renderer if SSR is required. |
| Separate container in the same workload | `http://node-renderer:3800`, using the renderer container name | Usually `0.0.0.0` | Add HTTP readiness and liveness probes to the `node-renderer` container on port `3800`. The renderer port does not need to be the public workload port. |
The networking shown here conflicts with container-deployment.md.
Containers in the same Control Plane workload share a network namespace (same as Kubernetes pod sidecars), so Rails connects via http://localhost:3800 and the renderer can keep its default localhost binding — just like the single-container case. container-deployment.md line 87 confirms: "Rails still connects via localhost."
Using http://node-renderer:3800 (container-name DNS) and binding 0.0.0.0 is the pattern for separate workloads communicating across a network boundary, not for same-workload sidecars.
If Control Plane's same-workload networking genuinely differs from Kubernetes pods here, a brief clarification callout would prevent confusion — but the current values look like a copy-paste from the separate-workload row.
The renderer listens with cleartext HTTP/2 (h2c). Kubernetes `httpGet` probes and other HTTP/1.1-only probes are not compatible with that listener. Use one of these probe styles instead:

| Probe style | When to use it |
| ------------ | ---------------------------------------------------------------------------------------------------------------------------- |
| `tcpSocket` | Safe default for startup and liveness probes when you only need to know that the renderer port is accepting traffic. |
| `exec` probe | Application-level readiness check with an h2c-aware client, for example `curl --http2-prior-knowledge`. |
| HTTP/1.1 | Only if you probe Rails, a separate HTTP/1.1 health sidecar/port, or another endpoint that is not the renderer h2c listener. |
Recommended starting values:

| Probe | Starting point |
Critical: h2c (HTTP/2 cleartext) incompatibility not mentioned
The renderer serves all its routes — including /health and /info — over cleartext HTTP/2 (h2c). Standard httpGet probes used by Kubernetes (and Control Plane, which uses the same format) speak HTTP/1.1 and are incompatible with h2c listeners. This is explicitly documented in container-deployment.md:
> "Because the renderer uses cleartext HTTP/2, Kubernetes `httpGet` probes (HTTP/1.1) are incompatible with this listener. Use a TCP probe, an `exec` probe (for example with `curl --http2-prior-knowledge`) …"
The `Scheme: HTTP` row in this table will mislead users into configuring `httpGet` probes that silently fail at the protocol level. The table needs either:

- A note that `httpGet` probes don't work — use `exec` with `curl --http2-prior-knowledge` for application-level checks, or `tcpSocket` as a lightweight fallback.
- A direct cross-reference to `container-deployment.md#startup-errors-err_stream_premature_close`, where the correct YAML examples already live.
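For orientation, the two workable probe styles look roughly like this. This is a sketch with illustrative timing values, not a copy of the actual YAML in container-deployment.md:

```yaml
# Sketch only: values are illustrative starting points.
# httpGet is deliberately absent because the renderer listener is h2c.
readinessProbe:
  exec:
    command: ["curl", "-sf", "--http2-prior-knowledge", "http://localhost:3800/info"]
  periodSeconds: 5
  timeoutSeconds: 5
  failureThreshold: 3
livenessProbe:
  tcpSocket:
    port: 3800
  periodSeconds: 10
  failureThreshold: 3
```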
Suggested replacement:

Recommended probe approach:

> **Note:** The renderer serves all routes over cleartext HTTP/2 (h2c). Standard `httpGet` probes speak HTTP/1.1 and will be rejected by the renderer. Use an `exec` probe with `curl --http2-prior-knowledge` for application-level checks, or `tcpSocket` as a minimal fallback. See [Container Deployment: Probe Configuration](./container-deployment.md#startup-errors-err_stream_premature_close) for YAML examples.

| Field | Value |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Probe type | `exec` (`curl -sf --http2-prior-knowledge http://localhost:3800/info`) for readiness; `tcpSocket` port `3800` for liveness |
| Path | `/health` or `/info` (for `exec` probes only) |
| Port | The renderer port, usually `3800` (`$RENDERER_PORT`). On Heroku or Control Plane, use `$PORT` only if the renderer is configured to listen on that variable. |
| Initial delay | `5`-`10` seconds, longer if startup is slow |
| Period | `5` seconds for readiness; `10` seconds for liveness |
| Timeout | `1`-`5` seconds |
| Failure threshold | `3` for readiness; `5`-`10` for liveness |
| Success threshold | `1` |
- **Liveness** answers whether the renderer is stuck badly enough that restarting the container is safer.

Do not put Rails, database, Redis, or other external dependency checks in the node-renderer's liveness probe. A temporary dependency outage should not restart every renderer replica. If SSR must be available before Rails receives
Missing: startup probe recommendation

`container-deployment.md` explicitly recommends a startup probe alongside readiness and liveness probes, noting it's the primary way to handle `ERR_STREAM_PREMATURE_CLOSE` errors during slow container startup. This section only mentions two probe types — worth adding a brief note:

> Also configure a startup probe (separate from the liveness probe) with a generous `initialDelaySeconds` to handle slow renderer startup without triggering premature liveness restarts. See Container Deployment for a YAML example.
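A sketch of the startup probe being asked for here, with values taken from the Startup row suggested later in this thread; treat them as starting points rather than the canonical manifest:

```yaml
# Illustrative startup probe: 10s initial delay plus up to
# 6 failures x 5s period gives roughly 40s of grace before
# the liveness probe can restart the container.
startupProbe:
  tcpSocket:
    port: 3800
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 6
```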
| Liveness | `tcpSocket` on the renderer port, `periodSeconds: 10`, and `failureThreshold` set to `5`-`10`. |
See [Node Renderer: Container Deployment](./container-deployment.md#startup-errors-err_stream_premature_close) for full Kubernetes YAML examples, including startup, readiness, and liveness probes.
The "Probe target" column recommends HTTP probes on port 3800 for the separate-container and separate-workload rows, but doesn't mention the h2c constraint. A reader following this table will likely configure `httpGet` probes that fail silently.

Suggest adding a footnote or inline note like: "Use `exec` + `curl --http2-prior-knowledge` or `tcpSocket` — standard `httpGet` probes are HTTP/1.1 and are rejected by the h2c listener."
Code Review

Overview: Documentation-only PR adding a "Configuring Readiness and Liveness Probes" section to the node-renderer JS configuration docs.

🔴 Critical — h2c (HTTP/2 cleartext) incompatibility

The renderer serves all its routes — including the custom `/health` route — over cleartext HTTP/2 (h2c).
The new probe table's `Scheme: HTTP` row steers readers toward broken `httpGet` probes. The Control Plane deployment table has the same gap — the "Probe target" cells recommend HTTP probes without mentioning h2c.

🟡 Significant — startup probe not mentioned
🟡 Significant — duplication without cross-reference
🟢 Minor
What's good
The Control Plane table and the probe-semantics prose are worth keeping — they just need the h2c caveat added and a cross-reference to the YAML examples in `container-deployment.md`.
Code Review

This is a well-structured documentation addition. The guidance is technically accurate overall, the table layout is clear, and the placement within the JS configuration docs makes sense.

Issues

Liveness probe `failureThreshold` conflicts with `container-deployment.md`
Readiness probe is missing the curl HTTP/2 image prerequisite
Minor observations
Recommended starting values:

| Probe | Starting point |
| --------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
The failureThreshold range here (5–10) conflicts with container-deployment.md, which documents failureThreshold: 3 for the liveness probe in both the prose YAML snippet (line ~417) and the sidecar manifest (line ~527). Having two different values in the same docs set will confuse readers.
Suggestion: align to the single value used in container-deployment.md (currently 3), or update that file at the same time and explain in prose why the liveness threshold should be more generous than the readiness threshold.
Before:

| Liveness | `tcpSocket` on the renderer port, `periodSeconds: 10`, and `failureThreshold` set to `5`-`10`. |

After:

| Liveness | `tcpSocket` on the renderer port, `periodSeconds: 10`, and `failureThreshold: 3`. This matches the example in [Container Deployment](./container-deployment.md#startup-errors-err_stream_premature_close). Increase only if your environment has slow storage or frequent transient pauses. |
Recommended starting values:

| Probe | Starting point |
The curl --http2-prior-knowledge command requires a curl build compiled with HTTP/2 support, which is not present in all base images. container-deployment.md surfaces this with: "The exec probe requires curl with HTTP/2 support in your image. Verify with curl --version | grep HTTP2. If curl is unavailable, use tcpSocket as a fallback."
Without that note here, a reader who copies this table value verbatim may get a silently broken readiness probe.
Before:

| Readiness | `exec` with `curl -sf --http2-prior-knowledge http://localhost:3800/info`, `timeoutSeconds: 5`, `periodSeconds: 5`, and `failureThreshold: 3`. |

After:

| Readiness | `exec` with `curl -sf --http2-prior-knowledge http://localhost:3800/info`, `timeoutSeconds: 5`, `periodSeconds: 5`, and `failureThreshold: 3`. Requires curl with HTTP/2 support (`curl --version \| grep HTTP2`); use `tcpSocket` as a fallback if unavailable. |
Code Review: Documentation-only PR

Summary: Adds a "Configuring Readiness and Liveness Probes" section to the node-renderer JS configuration docs. The content is accurate, well-structured, and fills a real gap. A few issues need to be addressed before merging.

Bugs / Inconsistencies

- Liveness probe `failureThreshold` conflicts with `container-deployment.md`: the new table recommends `5`-`10`, while that file's YAML examples use `3`.
- Missing curl HTTP/2 support caveat: the readiness probe table recommends `curl --http2-prior-knowledge` without noting that curl must be built with HTTP/2 support.

Design / Structure

- Section lives under the JS configuration docs; the new H3 is nested inside an existing section.
- Risk of docs drift from duplicated values: the probe recommendations here (startup, readiness, liveness) duplicate values documented in `container-deployment.md`.

Minor
| --------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Startup | `tcpSocket` on the renderer port, usually `3800` or `$RENDERER_PORT`; use `$PORT` only if the renderer listens there. Use `initialDelaySeconds: 10`, `periodSeconds: 5`, and `failureThreshold: 6` as a starting point. |
| Readiness | `exec` with `curl -sf --http2-prior-knowledge http://localhost:3800/info`, `timeoutSeconds: 5`, `periodSeconds: 5`, and `failureThreshold: 3`. |
| Liveness | `tcpSocket` on the renderer port, `periodSeconds: 10`, and `failureThreshold` set to `5`-`10`. |
The failureThreshold: 5–10 range here conflicts with container-deployment.md, where the Kubernetes Sidecar Manifest YAML example uses failureThreshold: 3 for liveness. A reader who consults both docs will see two different values with no explanation. Please align the two files — either update the YAML in container-deployment.md or change this recommendation to 3, and add a note if the wider range is intentional (e.g., "increase from 3 if the renderer recovers from transient hangs").
> timeoutSeconds: 1
> periodSeconds: 10
> failureThreshold: 3
> ```
The liveness fallback block is placed after the manifest's closing fence, making it easy to scroll past after copying the YAML. Users who don't read the callout and fallback will get the exec probe (which fails silently if curl lacks HTTP/2 support). Consider whether the fallback block would be better positioned immediately above the manifest — alongside the [!WARNING] — so it's seen before copying rather than after.
| Probe style | When to use it |
| ------------ | ------------------------------------------------------------------------------------------------------------------------------ |
| `tcpSocket` | Startup checks and fallback readiness or liveness checks when curl with HTTP/2 support is unavailable. |
The prose throughout the section recommends exec first and tcpSocket as a fallback, but the table puts tcpSocket first. Reordering to match the recommendation order (exec, then tcpSocket, then HTTP/1.1) would be less confusing for readers who skim tables before prose.
Suggested order (exec first, matching the prose):

| Probe style | When to use it |
| ------------ | ------------------------------------------------------------------------------------------------------------------------------ |
| `exec` probe | Application-level readiness and stricter liveness checks with an h2c-aware client, for example `curl --http2-prior-knowledge`. |
| `tcpSocket` | Startup checks and fallback readiness or liveness checks when curl with HTTP/2 support is unavailable. |
| HTTP/1.1 | Only if you probe Rails, a separate HTTP/1.1 health sidecar/port, or another endpoint that is not the renderer h2c listener. |
Code Review

Overview

This PR adds comprehensive documentation for configuring Node Renderer health probes (startup, readiness, liveness) in Docker Compose, Kubernetes, and Control Plane environments. It also documents the built-in unauthenticated `/info` endpoint.

What Works Well
Issues to Address

1. Missing link between
also passes `request` and `reply` to handlers if you need to inspect headers, set status codes, or customize the response. Add warm-up or readiness-gate logic inside this handler if readiness should wait for renderer-specific initialization. To signal not-ready, add `reply` to the handler parameters and call `reply.code(503).send({ status: 'warming_up' })`. Kubernetes exec probes treat any non-zero curl exit code as a failure;
The statement that "exec probes treat any non-zero curl exit code as a failure" is correct, but the mechanism that makes a 503 response actually produce a non-zero exit code is the -f flag in curl -sf. Without -f, curl exits 0 regardless of the HTTP status code.
Consider adding one sentence to close the loop for readers who might not know this:
Before:

`reply.code(503).send({ status: 'warming_up' })`. Kubernetes exec probes treat any non-zero curl exit code as a failure;

After:

`reply.code(503).send({ status: 'warming_up' })`. The `-f` flag in `curl -sf` causes curl to exit non-zero for HTTP 4xx/5xx responses (exit 22), so a `503` from this handler correctly fails the probe. Kubernetes exec probes treat any non-zero curl exit code as a failure;
# cluster workers are ready. Pair with the startup probe to shield liveness.
# connect_timeout is supported by the Ruby versions in this guide's prerequisites.
renderer_port = URI.parse(ReactOnRailsPro.configuration.renderer_url).port
Socket.tcp("localhost", renderer_port, connect_timeout: 1) {}
The TCP target is hardcoded to "localhost" rather than extracted from renderer_url. For the "Same Rails Container" scenario this is always correct (both processes share a network namespace), but a reader who copy-pastes this controller into a different topology (e.g. after having set renderer_url to a Docker Compose service name like http://renderer:3800) would silently probe the wrong host — they'd get :ok when the local port happens to be listening for something else.
A short inline comment would prevent this confusion:
Before:

Socket.tcp("localhost", renderer_port, connect_timeout: 1) {}

After:

Socket.tcp("localhost", renderer_port, connect_timeout: 1) {} # Same container: localhost is always correct here.
> [!NOTE]
> **Probe command notes:** `exec` probes require curl with HTTP/2 support in your image. Verify with
> `curl --version | grep -i http2`; if unavailable, use `tcpSocket` as a fallback. Set curl `--max-time` shorter than the
> orchestrator timeout so curl returns a clean non-zero exit code before Kubernetes terminates the probe process. These
> examples use `--max-time 3` with `timeoutSeconds: 5`, leaving a 2-second buffer. Readiness and liveness omit
> `initialDelaySeconds` because Kubernetes 1.20+ (startup probe GA) defers them until the startup probe succeeds. If you
> skip the startup probe or run an older cluster without startup probe support, add an appropriate `initialDelaySeconds`.

> **Security:** `/info` is unauthenticated even when `password` is configured. Keep the renderer on `localhost` or
> private networking if exposing node and renderer version details is a concern; see
> [Built-in Endpoints](./js-configuration.md#built-in-endpoints).
Placing the > [!NOTE] callout and the security blockquote between **Mitigation:** and the numbered list visually disconnects the list from its heading in some Markdown renderers (and in raw-text reading). The numbered items look like they follow the Note's content rather than the "Mitigation" section.
Consider moving these two callouts after item 4 (or at the end of the section) so the numbered list flows directly under "Mitigation:". This is non-blocking but would improve scannability for operators copy-pasting the probe YAML.
# In this same-container topology, Rails and the renderer share a network namespace.
# Probe localhost even if other deployment shapes use a service host.
# connect_timeout is supported by the Ruby versions in this guide's prerequisites.
renderer_port = URI.parse(ReactOnRailsPro.configuration.renderer_url).port
Silent wrong-port bug when renderer_url lacks an explicit port.
URI.parse("http://localhost").port returns 80 (the URI default for HTTP), not nil. If a user configures renderer_url as http://localhost (no :3800), the TCP check silently probes port 80 instead of the renderer — returning 200 if anything listens on port 80, 503 if nothing does. Neither URI::Error nor TypeError is raised, so the rescue doesn't catch this case.
Consider either validating the port or falling back explicitly:
Before:

renderer_port = URI.parse(ReactOnRailsPro.configuration.renderer_url).port

After:

renderer_port = URI.parse(ReactOnRailsPro.configuration.renderer_url).port
raise URI::InvalidURIError, "renderer_url must include an explicit port (got: #{ReactOnRailsPro.configuration.renderer_url})" if renderer_port == URI::DEFAULT_PARSER.send(:default_port, "http")
Or more pragmatically:

```ruby
renderer_uri = URI.parse(ReactOnRailsPro.configuration.renderer_url)
renderer_port = renderer_uri.port if renderer_uri.to_s.match?(/:\d+/)
renderer_port ||= 3800 # fall back to renderer default
```

At minimum, a prose note alongside this snippet warning that `renderer_url` must include the port would prevent silent misconfiguration.
also passes `request` and `reply` to handlers if you need to inspect headers, set status codes, or customize the response. Add warm-up or readiness-gate logic inside this handler if readiness should wait for renderer-specific initialization. To signal not-ready, add `reply` to the handler parameters and call `reply.code(503).send({ status: 'warming_up' })`. The `-f` flag in `curl -sf` causes curl to exit non-zero for HTTP
Mixing bare-return and reply.send() patterns in the same handler needs a clarification.
The example handler uses return { status: 'ok' } (Fastify auto-serializes), but the prose immediately after says "add reply to the handler parameters and call reply.code(503).send(...) ". If a reader adds that call while keeping return { status: 'ok' }, Fastify will warn about a double-send (the returned object is still serialized after reply.send() returns).
Suggest clarifying that switching to reply requires dropping the bare return:
Before:

`reply.code(503).send({ status: 'warming_up' })`. The `-f` flag in `curl -sf` causes curl to exit non-zero for HTTP

After:

`reply.code(503).send({ status: 'warming_up' })` and removing the bare `return` (or switching to `return reply.code(503).send(...)`). The `-f` flag in `curl -sf` causes curl to exit non-zero for HTTP
Or add a complete conditional example:
```js
app.get('/health', async (_request, reply) => {
  if (!isWarmedUp()) {
    return reply.code(503).send({ status: 'warming_up' });
  }
  return { status: 'ok' };
});
```

In `config/routes.rb`:

```ruby
# Override Rails 7.1+'s built-in /up route to add the renderer TCP check.
```
/up route override may silently break an existing custom health endpoint.
If a Rails app already defines a custom `GET /up` action (common for Rails < 7.1 or apps that added their own), this `get "up", to: "health#show"` in `routes.rb` will override it without warning. Consider noting that the user should check for an existing `/up` route before adding this, or suggest using a distinct path (e.g., `/health` or `/healthz`) to avoid the collision:
```suggestion
# Override Rails 7.1+'s built-in /up route to add the renderer TCP check.
# If you already have a custom /up or are on Rails < 7.1, use a distinct
# path such as /healthz to avoid silently replacing existing health logic.
get "up", to: "health#show"
```
> The 10-second initial delay is a conservative starting point for images that take a moment to boot before opening the
> port; reduce it, or omit it, if your renderer consistently starts in under 5 seconds.

> Readiness and liveness omit `initialDelaySeconds` here because Kubernetes 1.20+ (startup probe GA) defers them until …
The `initialDelaySeconds: 10` note is slightly misleading for the startup probe.

The note says "reduce it, or omit it, if your renderer consistently starts in under 5 seconds" — but this is the startup probe's own `initialDelaySeconds`. Setting it to 0 (or omitting it, which defaults to 0 in Kubernetes) means checks begin immediately, and the 30-second grace window still exists via `failureThreshold: 6` × `periodSeconds: 5`. The note could be clearer that removing `initialDelaySeconds` doesn't reduce the total window, just shifts when the first check fires:
> The 10-second initial delay means the first check fires at 10 s. Omitting it (Kubernetes defaults to 0) starts checks immediately — the total allowed startup window stays the same (`failureThreshold × periodSeconds` = 30 s). Reduce `initialDelaySeconds` if your renderer reliably opens the port within 1–2 seconds; keep it to avoid noisy early-failure log entries.
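The timing arithmetic in that note can be sanity-checked in a few lines, using the values from the startup-probe YAML discussed above:

```ruby
# Probe timing from the startupProbe example: the grace window is
# failureThreshold consecutive failures spaced periodSeconds apart, and
# initialDelaySeconds only shifts when that window starts.
initial_delay     = 10 # initialDelaySeconds
period            = 5  # periodSeconds
failure_threshold = 6  # failureThreshold

grace_window = failure_threshold * period
puts grace_window                 # 30 (same window with or without the delay)
puts initial_delay + grace_window # 40 (worst-case wall-clock time with the delay)
```

Dropping the delay moves the first check earlier without shrinking the 30-second allowance itself.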
**Code Review — PR #3241: Document node renderer health probes**

Overall: This is a well-structured, thorough documentation addition. The technical content is accurate, the probe-timing rationale (…). Four issues worth addressing before merge:

1. Silent wrong-port in …
**Code Review: PR #3241 — Document node renderer health probes**

Documentation-only PR that adds solid, actionable guidance. The content is technically accurate and fills a real gap. A few issues worth addressing before merge.

What this PR does well: …

Issues to address:

1. If …
2. The regex …
3. Minor — repeated curl HTTP/2 verification note
4. Minor — The …

No issues with …
```ruby
renderer_port = renderer_url.match?(/:\d+(?:[\/?#]|$)/) ? renderer_uri.port : 3800
Socket.tcp("localhost", renderer_port, connect_timeout: 1) {}
head :ok
rescue SocketError, SystemCallError, URI::Error, TypeError
```
The rescue clause misses `NoMethodError`. If `ReactOnRailsPro.configuration.renderer_url` returns `nil`, the call to `renderer_url.match?(...)` on the line above raises `NoMethodError` — a subclass of `NameError`, not of `SocketError`, `SystemCallError`, `URI::Error`, or `TypeError`. The endpoint would respond with 500 instead of the intended 503.
Add a nil guard before the URI parse, or widen the rescue:
```diff
- rescue SocketError, SystemCallError, URI::Error, TypeError
+ rescue SocketError, SystemCallError, URI::Error, TypeError, ArgumentError
```
Or, more defensively, guard the nil case explicitly:
```ruby
renderer_url = ReactOnRailsPro.configuration.renderer_url
raise ArgumentError, "renderer_url not configured" if renderer_url.nil?
```

Either approach prevents an unhandled exception from masking a misconfiguration as a 500.
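The class-hierarchy claim behind this comment can be checked directly. This is a standalone sketch, not project code; it only demonstrates that `NoMethodError` falls outside the rescued classes:

```ruby
require "socket" # SocketError
require "uri"    # URI::Error

# NoMethodError (what nil.match? raises) descends from NameError, which is
# outside every class in the health check's rescue list, so a nil
# renderer_url would surface as a 500 rather than a 503.
rescued = [SocketError, SystemCallError, URI::Error, TypeError]

puts NoMethodError.ancestors.include?(NameError)     # true
puts rescued.any? { |klass| NoMethodError <= klass } # false

begin
  nil.match?(/:\d+/)
rescue *rescued
  puts "caught by the original rescue list" # not reached
rescue NoMethodError
  puts "escapes the original rescue list"
end
```

A nil guard before the parse avoids relying on exception taxonomy at all.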
```ruby
# connect_timeout is supported by the Ruby versions in this guide's prerequisites.
renderer_url = ReactOnRailsPro.configuration.renderer_url
renderer_uri = URI.parse(renderer_url)
renderer_port = renderer_url.match?(/:\d+(?:[\/?#]|$)/) ? renderer_uri.port : 3800
```
The regex re-parses the URL string to detect whether an explicit port was set, because URI.parse("http://localhost").port returns 80 (HTTP default) rather than nil. A cleaner way to express the same intent:
```diff
- renderer_port = renderer_url.match?(/:\d+(?:[\/?#]|$)/) ? renderer_uri.port : 3800
+ renderer_port = renderer_uri.port if renderer_url.to_s.match?(/:\d+/)
+ renderer_port ||= 3800
```
Or even more explicitly, relying on the fact that the Node Renderer default (3800) differs from every scheme default:
```ruby
renderer_port = [80, 443].include?(renderer_uri.port) && !renderer_url.to_s.include?(":#{renderer_uri.port}") ? 3800 : renderer_uri.port
```

The current regex (`/:\d+(?:[\/?#]|$)/`) is correct in practice, but its intent (distinguishing "no port in URL" from "port 80 in URL") is not immediately obvious. A brief inline comment on that line explaining why a regex is needed (rather than just calling `.port`) would help future readers.
```ruby
raise ArgumentError, "renderer_url not configured" if renderer_url.nil? || renderer_url.empty?

renderer_uri = URI.parse(renderer_url)
renderer_port = renderer_url.match?(/:\d+(?:[\/?#]|$)/) ? renderer_uri.port : 3800
```
The port-detection regex is fragile. URI.parse("http://localhost").port returns 80 (HTTP default), not nil, so you can compare against URI::HTTP.default_port instead of pattern-matching the raw string:
```diff
- renderer_port = renderer_url.match?(/:\d+(?:[\/?#]|$)/) ? renderer_uri.port : 3800
+ renderer_port = (renderer_uri.port == renderer_uri.class.default_port) ? 3800 : renderer_uri.port
```
This is more self-documenting, though note the tradeoff on edge cases: an explicit `http://renderer:80` is handled correctly by the regex (it uses port 80), but the default-port comparison misclassifies it (80 equals the scheme default, so it silently falls back to 3800).
There's also a subtle issue: if someone passes https://…, renderer_uri.class.default_port is 443, so the fallback would kick in unintentionally. Since the renderer only speaks h2c (plain HTTP), you could add an explicit scheme guard or just use 3800 as the hardcoded fallback and document the assumption.
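The two detection strategies can be compared side by side. This is a standalone sketch; `port_via_regex` and `port_via_default` are illustrative helper names, not project code:

```ruby
require "uri"

PORT_REGEX = /:\d+(?:[\/?#]|$)/

# Strategy 1: look for an explicit ":<digits>" in the raw string.
def port_via_regex(url)
  url.match?(PORT_REGEX) ? URI.parse(url).port : 3800
end

# Strategy 2: compare the parsed port against the scheme default.
def port_via_default(url)
  uri = URI.parse(url)
  uri.port == uri.class.default_port ? 3800 : uri.port
end

puts port_via_regex("http://localhost")     # 3800
puts port_via_default("http://localhost")   # 3800
puts port_via_regex("http://renderer:80")   # 80   (explicit port honored)
puts port_via_default("http://renderer:80") # 3800 (explicit :80 mistaken for the default)
```

Both agree on the common cases; they diverge only when someone deliberately runs the renderer on a scheme-default port.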
```ruby
renderer_port = renderer_url.match?(/:\d+(?:[\/?#]|$)/) ? renderer_uri.port : 3800
Socket.tcp("localhost", renderer_port, connect_timeout: 1) {}
head :ok
rescue ArgumentError, SocketError, SystemCallError, URI::Error, TypeError
```
Rescuing ArgumentError alongside connection errors means a misconfigured renderer_url (the raise on line 185) silently returns 503 in production — indistinguishable from a genuinely unreachable renderer. The configuration error will keep returning 503 on every probe and causing cascading availability alerts without surfacing the real cause.
Consider re-raising (or logging + re-raising) ArgumentError so it surfaces as a 500 rather than a 503:
```ruby
rescue ArgumentError
  raise # surface misconfiguration as 500, not a health-check failure
rescue SocketError, SystemCallError, URI::Error, TypeError
  head :service_unavailable
end
```

This keeps "renderer is down" vs "deployment is misconfigured" distinguishable in logs and alerting.
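A Rails-free sketch of this pattern, runnable on its own: `probe_renderer` is a hypothetical helper (not ReactOnRailsPro API), and plain status integers stand in for Rails' `head :ok` / `head :service_unavailable`:

```ruby
require "socket"
require "uri"

# Misconfiguration re-raises (a 500 in a real controller); reachability
# failures map to 503. `probe_renderer` is an illustrative name only.
def probe_renderer(renderer_url)
  raise ArgumentError, "renderer_url not configured" if renderer_url.nil? || renderer_url.empty?

  port = renderer_url.match?(/:\d+(?:[\/?#]|$)/) ? URI.parse(renderer_url).port : 3800
  Socket.tcp("localhost", port, connect_timeout: 1) {}
  200 # renderer reachable
rescue ArgumentError
  raise # misconfiguration: let it surface, don't mask it as "renderer down"
rescue SocketError, SystemCallError, URI::Error, TypeError
  503 # renderer down or unreachable
end

# Misconfiguration is not reported as "renderer down":
begin
  probe_renderer(nil)
rescue ArgumentError => e
  puts e.message # "renderer_url not configured"
end
```

Probing a closed port returns 503, while a nil or empty URL raises, so the two failure modes stay distinguishable.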
```ruby
renderer_uri = URI.parse(renderer_url)
renderer_port = renderer_url.match?(/:\d+(?:[\/?#]|$)/) ? renderer_uri.port : 3800
Socket.tcp("localhost", renderer_port, connect_timeout: 1) {}
```
The hardcoded "localhost" here is intentional for the same-container topology, but this is subtle and easy to miss for readers adapting the snippet to other shapes. The renderer_url is used only for the port, while the host is silently overridden.
Worth adding a brief inline callout (or a blockquote below the snippet) like:
> Topology-specific: This code always probes `localhost` regardless of `renderer_url`'s host — it only borrows the port. This is correct for same-container deployments where Rails and the renderer share a network namespace. Do not reuse this snippet as-is for sidecar or separate-workload topologies where `renderer_url` points to a different host.
> - **Liveness**: Prefer `exec` with
>   `curl -sf --max-time 3 --http2-prior-knowledge http://localhost:3800/info` when curl with HTTP/2 support is available.
>   If you are upgrading from a `tcpSocket` liveness probe, verify curl has HTTP/2 support in the image before switching.
>   Start with `timeoutSeconds: 5`, `periodSeconds: 10`, and `failureThreshold: 3`, matching the Container Deployment
>   examples. Use `tcpSocket` only if curl is unavailable. Raise `failureThreshold`, and optionally `periodSeconds`, if
>   heavy CPU bursts or frequent transient pauses trigger false-positive restarts.
The liveness probe recommendation (exec curl /info) is more aggressive than is often appropriate. A liveness probe failure restarts the container, while a readiness failure only removes it from the load-balancer rotation. Under sustained CPU/GC pressure, an h2c round-trip can time out even when the process is functional — triggering a restart that makes things worse (the "liveness probe death spiral" pattern).
The recommendation and the table already mention tcpSocket as a fallback, but the prose buries it with "only if curl is unavailable." It's worth flipping the default recommendation: tcpSocket for liveness (cheap, restarts only on hard failures), exec curl /info for readiness (application-level gatekeeping). The current guidance was changed from tcpSocket liveness → exec liveness, which reverses the safer default.
**Code Review**

This is a documentation-only PR that adds health probe guidance for the Node Renderer. The content is thorough and the security callouts (unauthenticated …) … Code correctness issues in the …
**Code Review - PR 3241 (Node renderer health probes docs)**

OVERVIEW: Documentation-only PR adding comprehensive health probe guidance for the Node Renderer. Technical content is accurate: three-tier probe strategy (startup/readiness/liveness), h2c incompatibility with `httpGet` probes, the `--max-time` / `timeoutSeconds` relationship, and Control Plane topology shapes are all correct.

WHAT WORKS WELL: …

ISSUE 1 - Unrescued `URI::InvalidURIError` in HealthController example: …
ISSUE 2 - Port-detection regex is fragile: …
ISSUE 3 - Content duplication: …
ISSUE 4 - Backwards `initialDelaySeconds` comment phrasing: …

VERDICT: Content is technically sound and a genuine improvement. The main fix worth making before merge is clarifying the `URI::InvalidURIError` behaviour in the HealthController example. Regex and phrasing issues are nice-to-haves.
```ruby
renderer_port = renderer_url.match?(/:\d+(?:[\/?#]|$)/) ? renderer_uri.port : 3800
Socket.tcp("localhost", renderer_port, connect_timeout: 1) {}
head :ok
rescue SocketError, SystemCallError
```
The rescue clause catches `SocketError` and `SystemCallError` but not `URI::InvalidURIError`, which `URI.parse` raises for a malformed URL. A non-empty but syntactically invalid `renderer_url` (e.g. `"http://bad host"`, where the space makes parsing fail) would therefore raise an unhandled exception and produce a 500 — consistent with the "reachability failures become 503, config errors become 500" contract described in the callout below, but only by accident rather than by an explicit, documented choice.
If the intent is for all config errors to surface as 500s, add a comment here making that explicit:
```diff
  rescue SocketError, SystemCallError
+   # URI::InvalidURIError (malformed renderer_url) is intentionally not rescued here;
+   # it surfaces as a 500 so misconfiguration is visible in logs and alerting.
    head :service_unavailable
```
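A quick standalone check (not project code) confirms that a malformed URL escapes the narrowed rescue list, matching the "misconfiguration surfaces as a 500" intent:

```ruby
require "socket"
require "uri"

# A malformed URL (here, a space in the host) raises URI::InvalidURIError,
# which `rescue SocketError, SystemCallError` deliberately lets through.
begin
  URI.parse("http://bad host")
rescue SocketError, SystemCallError
  puts "treated as renderer-down (503)" # not reached
rescue URI::InvalidURIError => e
  puts "misconfiguration surfaces: #{e.class}"
end
```

The inline comment in the suggestion above documents that this escape is intentional rather than an oversight.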
```ruby
renderer_uri = URI.parse(renderer_url)
renderer_port = renderer_url.match?(/:\d+(?:[\/?#]|$)/) ? renderer_uri.port : 3800
```
The regex approach is correct for typical renderer URLs but is hard to reason about at a glance. Since URI.parse already ran, you can compare renderer_uri.port against the HTTP/HTTPS scheme defaults directly, which is more self-documenting and handles an unrecognised scheme visibly:
```diff
- renderer_uri = URI.parse(renderer_url)
- renderer_port = renderer_url.match?(/:\d+(?:[\/?#]|$)/) ? renderer_uri.port : 3800
+ renderer_uri = URI.parse(renderer_url)
+ default_scheme_port = { 'http' => 80, 'https' => 443 }.fetch(renderer_uri.scheme, nil)
+ renderer_port = renderer_uri.port != default_scheme_port ? renderer_uri.port : 3800
```
```yaml
  failureThreshold: 6
  timeoutSeconds: 1
readinessProbe:
  # Omit initialDelaySeconds only if the startupProbe above is configured.
```
The phrasing "Omit … only if" reads as a restriction (when it is safe to leave it out) rather than a trigger (when it is required). Readers who skip the startup probe may not notice they need to add initialDelaySeconds. Suggested rewording:
```diff
- # Omit initialDelaySeconds only if the startupProbe above is configured.
+ # Add initialDelaySeconds here if no startupProbe is configured.
+ # Kubernetes 1.20+ defers readiness/liveness until the startup probe succeeds.
```
```yaml
  periodSeconds: 5
  failureThreshold: 3
livenessProbe:
  # Omit initialDelaySeconds only if the startupProbe above is configured.
```
Same phrasing issue as the readiness probe comment above — "Omit … only if" reads as a restriction rather than a trigger. Suggest the same rewording:
```diff
- # Omit initialDelaySeconds only if the startupProbe above is configured.
+ # Add initialDelaySeconds here if no startupProbe is configured.
+ # Kubernetes 1.20+ defers readiness/liveness until the startup probe succeeds.
```
```diff
- app.get('/health', (request, reply) => {
-   reply.send({ status: 'ok' });
+ app.get('/health', () => {
+   // Return a Promise or use async/await if warm-up checks involve async operations.
```
Minor: the comment says "Return a Promise or use `async`/`await`" but these amount to the same thing — an `async` function always returns a Promise. The useful distinction is between returning a plain object (synchronous, as shown here) vs. returning a Promise / using `async`/`await` for async warm-up checks:
```diff
- // Return a Promise or use async/await if warm-up checks involve async operations.
+ // For async warm-up checks, use an async handler: async () => { await ...; return { status: 'ok' }; }
```
Summary
Adds node renderer readiness and liveness probe guidance to the OSS node renderer JavaScript configuration docs.
The new section covers:
- `/health` or `/info` for shallow HTTP probes

Validation

```shell
pnpm --dir /Users/justin/codex/react_on_rails exec prettier --check /Users/justin/codex/react_on_rails-pr-probes/docs/oss/building-features/node-renderer/js-configuration.md
```

Note
Low Risk
Low risk documentation-only change; no runtime code paths are modified. Main risk is users misconfiguring probes/host binding based on the new guidance.
Overview
Adds detailed guidance for configuring Node Renderer startup/readiness/liveness probes in container environments, including recommended defaults (h2c-aware `exec` readiness via `curl --http2-prior-knowledge`, `tcpSocket` liveness) and timing/timeout advice (e.g., `--max-time` buffers, `timeoutSeconds: 1` for TCP probes).

Documents the unauthenticated built-in `GET /info` endpoint (version exposure) vs a custom `/health` route, and expands deployment-shape guidance for Control Plane (same Rails container, sidecar container, separate renderer workload) with corresponding `renderer_url`, host binding, and probe targeting recommendations. Updates Compose/Kubernetes examples accordingly.

Reviewed by Cursor Bugbot for commit d03f3df. Bugbot is set up for automated code reviews on this repo. Configure here.
Summary by CodeRabbit