Changes from 75 commits
79 commits
497a18a
Document node renderer health probes
justin808 May 3, 2026
37d2ffc
Address node renderer probe review comments
justin808 May 4, 2026
9e8ad39
Clarify Control Plane GVC probe URL
justin808 May 4, 2026
585f916
Clarify node renderer probe guidance
justin808 May 4, 2026
2784693
Correct node renderer probe guidance
justin808 May 4, 2026
27795d8
Link Control Plane service DNS guidance
justin808 May 4, 2026
0a8bb35
Emphasize h2c probe requirements
justin808 May 4, 2026
a1df2bc
Align probe thresholds with container docs
justin808 May 4, 2026
b0235c8
Inline probe caveats in starting values
justin808 May 4, 2026
a6499ed
Refine node renderer probe docs
justin808 May 4, 2026
1454d3e
Mention startup probes in renderer docs
justin808 May 4, 2026
84c7a3d
Clarify renderer probe setup scope
justin808 May 4, 2026
1ebe204
Align probe examples with Kubernetes exec semantics
justin808 May 4, 2026
0a1c3bc
Tune readiness probe example
justin808 May 4, 2026
f43269d
Finalize renderer probe examples
justin808 May 4, 2026
0464273
Keep node renderer probe docs consistent
justin808 May 4, 2026
372da23
Polish node renderer probe docs
justin808 May 4, 2026
402ebcd
Tighten health endpoint wording
justin808 May 4, 2026
db58028
Clarify startup probe dependency
justin808 May 4, 2026
eee5047
Address renderer probe review followups
justin808 May 4, 2026
42334e6
Use explicit health endpoint anchor
justin808 May 4, 2026
423d336
Align probe wording with recommendations
justin808 May 4, 2026
b7169ce
Clarify startup probe gating
justin808 May 4, 2026
6f84b4e
Explain probe readiness gating
justin808 May 4, 2026
97511ba
Document built-in renderer info endpoint
justin808 May 4, 2026
0c75acf
Cross-link renderer info security note
justin808 May 4, 2026
18a0cb2
Refine renderer info probe notes
justin808 May 4, 2026
c3ce264
Use canonical renderer source link
justin808 May 4, 2026
3ffbf7c
Align renderer probe guidance wording
justin808 May 4, 2026
cd54398
Clarify renderer probe caveats
justin808 May 4, 2026
d03d191
Address final renderer probe review notes
justin808 May 5, 2026
b7547a6
Clarify renderer probe docs for public readers
justin808 May 5, 2026
b414b71
Correct same workload probe host guidance
justin808 May 5, 2026
5d26c67
Document renderer probe edge cases
justin808 May 5, 2026
e9849c2
Clarify renderer probe tuning notes
justin808 May 5, 2026
c4f1645
Fix readiness probe YAML indentation
justin808 May 5, 2026
950e22c
Polish renderer probe documentation notes
justin808 May 5, 2026
c064a19
Align readiness probe YAML indentation
justin808 May 5, 2026
c9b53e5
Clarify custom renderer probe auth
justin808 May 5, 2026
2cfebbb
Polish renderer probe review notes
justin808 May 5, 2026
0d17e5b
Clarify node renderer probe guidance
justin808 May 5, 2026
dbd13cb
Clarify final renderer probe review notes
justin808 May 5, 2026
faab393
Polish node renderer probe notes
justin808 May 5, 2026
0c7b507
Clarify renderer probe startup guidance
justin808 May 5, 2026
ae3764e
Improve renderer probe guidance readability
justin808 May 5, 2026
9ffa0bf
Refine renderer probe defaults
justin808 May 5, 2026
4e3b683
Polish renderer probe notes
justin808 May 5, 2026
f39a8c0
Clarify renderer liveness probe requirements
justin808 May 5, 2026
8edb703
Document liveness probe migration caveats
justin808 May 5, 2026
4548bf1
Polish renderer probe review followups
justin808 May 5, 2026
449931e
Clarify renderer probe upgrade caveats
justin808 May 5, 2026
9065b1f
Polish renderer probe warning notes
justin808 May 5, 2026
cfd9a8d
Move Control Plane probe guidance
justin808 May 5, 2026
f138857
Consolidate renderer probe notes
justin808 May 5, 2026
6d19040
Refine renderer probe fallback docs
justin808 May 5, 2026
1050661
Clarify renderer probe fallback security
justin808 May 5, 2026
5e22ec5
Polish renderer probe docs
justin808 May 5, 2026
eca7b83
Clarify renderer probe deployment notes
justin808 May 5, 2026
5ad9e6f
Clarify renderer liveness probe defaults
justin808 May 5, 2026
549863b
Polish renderer probe review comments
justin808 May 5, 2026
120183c
Address renderer probe documentation polish
justin808 May 5, 2026
f49df1e
Clarify Control Plane probe snippets
justin808 May 5, 2026
ae73c7e
Clarify health probe examples
justin808 May 5, 2026
6ff4f43
Clarify renderer health probe endpoints
justin808 May 5, 2026
e48a3ca
Use copy-paste safe readiness endpoint
justin808 May 5, 2026
7ebde39
Clarify probe timing and port notes
justin808 May 5, 2026
3e68181
Polish health probe examples
justin808 May 5, 2026
dc34dc9
Clarify liveness upgrade guidance
justin808 May 5, 2026
7c29f56
Refine probe edge case guidance
justin808 May 5, 2026
29a62fa
Fix probe snippet indentation
justin808 May 5, 2026
e740956
Derive Rails health probe port from config
justin808 May 5, 2026
5c0fa3e
Address renderer probe review followups
justin808 May 5, 2026
bc201b3
Tighten renderer probe review notes
justin808 May 5, 2026
25fde2e
Polish renderer probe review feedback
justin808 May 5, 2026
f9ea8c9
Clarify renderer probe failure semantics
justin808 May 5, 2026
fa38423
Harden renderer probe examples
justin808 May 5, 2026
f037560
Guard renderer health probe config
justin808 May 5, 2026
d03f3df
Clarify renderer probe liveness defaults
justin808 May 8, 2026
e8a88a5
Clarify renderer probe guidance followups
justin808 May 9, 2026
206 changes: 198 additions & 8 deletions docs/oss/building-features/node-renderer/container-deployment.md
@@ -129,6 +129,91 @@ end

> **Recommendation:** Start with a single container. Move to sidecar containers if you need per-process memory/CPU visibility (e.g., to diagnose OOM restarts). Separate workloads are rarely justified unless you have a specific need for independent scaling at high replica counts.

## Control Plane Deployment Shapes

For Control Plane deployments, choose the probe target based on where the node renderer runs. Control Plane configures
probes per container. Renderer probe targets below mean `tcpSocket` or h2c-aware `exec` probes, not HTTP/1.1 `httpGet`
probes directly against the renderer.

[Control Plane Flow](https://github.com/shakacode/control-plane-flow)'s default `rails` template models Rails as a
single-container standard workload. If you follow that template and run the renderer inside the Rails container,
configure the Rails workload's probes rather than looking for a separate node-renderer container. If you split the
renderer into its own container or workload, add renderer-specific probes there.

### Same Rails Container Or Process Supervisor
Contributor

The heading mentions "Process Supervisor" (Foreman, Overmind, Honcho) but the body and code example only cover the single-Rails-container case. A reader using a process supervisor to co-locate Rails and the renderer in the same container will have the same localhost network topology, but the heading implies broader coverage that the section doesn't deliver.

Suggest narrowing the heading:

Suggested change
### Same Rails Container Or Process Supervisor
### Same Container (Rails and Renderer Co-Located)

And adding a sentence in the body: "If you use a process supervisor such as Foreman or Overmind inside the same container, the network topology is identical — both processes share localhost."


Set the Rails `renderer_url` to `http://localhost:3800`. The renderer can keep the default `localhost` host binding.
Probe the `rails` container's Rails health endpoint, such as `/up` on port `3000` in Rails 7.1+ or a custom endpoint in
earlier Rails versions.

When Rails and the renderer share one container, use one combined Rails health endpoint if you need to check both
Contributor

This advice is correct but abstract — users familiar with Kubernetes probes but not Rails may not know how to implement "a short TCP connection check" inside a Rails health endpoint. A minimal example would make this actionable:

# config/routes.rb (Rails 7.1+ with built-in /up)
# Override or supplement /up with a renderer reachability check
Rails.application.routes.draw do
  get "/up", to: proc { |env|
    begin
      TCPSocket.new("localhost", 3800).close
      [200, {}, ["OK"]]
    rescue Errno::ECONNREFUSED, Errno::ETIMEDOUT
      [503, {}, ["renderer unavailable"]]
    end
  }
end

Or at minimum, link to the Rails health controller docs so readers know where to hook this in.

processes. For example, make the Rails readiness endpoint perform a short TCP connection check to `localhost:3800` and
return `503` if the renderer is unreachable.

Because this guide covers React on Rails Pro's Node Renderer, the Rails endpoint below reads the same
`ReactOnRailsPro.configuration.renderer_url` value used for SSR requests rather than requiring a second port environment
variable.

`config/routes.rb`:

```ruby
# Override Rails 7.1+'s built-in /up route to add the renderer TCP check.
Contributor

/up route override may silently break an existing custom health endpoint.

If a Rails app already defines a custom GET /up action (common for Rails < 7.1 or apps that added their own), this get "up", to: "health#show" in routes.rb will override it without warning. Consider noting that the user should check for an existing /up route before adding this, or suggest using a distinct path (e.g., /health or /healthz) to avoid the collision:

Suggested change
# Override Rails 7.1+'s built-in /up route to add the renderer TCP check.
# Override Rails 7.1+'s built-in /up route to add the renderer TCP check.
# If you already have a custom /up or are on Rails < 7.1, use a distinct
# path such as /healthz to avoid silently replacing existing health logic.
get "up", to: "health#show"

get "up", to: "health#show"
```

`app/controllers/health_controller.rb`:

Contributor

Missing a comment explaining why this inherits from ActionController::Base instead of ApplicationController. Without it, readers will likely "fix" this to ApplicationController, which would pull in the app's authentication callbacks and break the health endpoint.

Suggested change
# Inherits from ActionController::Base (not ApplicationController) to bypass
# authentication and other application-level callbacks; health endpoints must be
# reachable before and independent of the app's auth layer.
class HealthController < ActionController::Base

```ruby
# Ruby stdlib; loaded explicitly for the URI/Socket readiness check.
require "socket"
Contributor

Placing require "socket" directly in a Rails controller file is unconventional. Ruby's socket stdlib is universally available but Rails convention is to require stdlib files in an initializer (e.g., config/initializers/socket.rb).

Consider either:

  1. Moving the require to an initializer and noting that in the comment, or
  2. Adding a brief inline comment like # Ruby stdlib; available everywhere but loaded explicitly for clarity. so readers don't copy the pattern and wonder why their style guide flags it.

require "uri"

# Inherits from ActionController::Base (not ApplicationController) to avoid
# app-level authentication callbacks on unauthenticated probe requests.
class HealthController < ActionController::Base
  def show
    # Opens and immediately closes; raises if the renderer port is unreachable.
    # A successful TCP connection means the h2c listener is bound, not that
    # cluster workers are ready. Pair with the startup probe to shield liveness.
    # In this same-container topology, Rails and the renderer share a network namespace.
    # Probe localhost even if other deployment shapes use a service host.
    # connect_timeout is supported by the Ruby versions in this guide's prerequisites.
    renderer_port = URI.parse(ReactOnRailsPro.configuration.renderer_url).port
Contributor

Bug: Pro-only class in OSS docs. ReactOnRailsPro.configuration is only available when the Pro gem is loaded. Users following these OSS docs will hit NameError: uninitialized constant ReactOnRailsPro.

Suggestion — use an env var or hardcoded default instead:

Suggested change
renderer_port = URI.parse(ReactOnRailsPro.configuration.renderer_url).port
renderer_port = URI.parse(ENV.fetch('RENDERER_URL', 'http://localhost:3800')).port

If this entire section is intentionally Pro-only, add a callout at the top of the "Same Rails Container" subsection noting that ReactOnRailsPro must be configured.

Contributor

Silent wrong-port bug when renderer_url lacks an explicit port.

URI.parse("http://localhost").port returns 80 (the URI default for HTTP), not nil. If a user configures renderer_url as http://localhost (no :3800), the TCP check silently probes port 80 instead of the renderer — returning 200 if anything listens on port 80, 503 if nothing does. Neither URI::Error nor TypeError is raised, so the rescue doesn't catch this case.

Consider either validating the port or falling back explicitly:

Suggested change
renderer_port = URI.parse(ReactOnRailsPro.configuration.renderer_url).port
renderer_port = URI.parse(ReactOnRailsPro.configuration.renderer_url).port
raise URI::InvalidURIError, "renderer_url must include an explicit port (got: #{ReactOnRailsPro.configuration.renderer_url})" if renderer_port == URI::HTTP.default_port

Or more pragmatically:

renderer_uri  = URI.parse(ReactOnRailsPro.configuration.renderer_url)
renderer_port = renderer_uri.port if renderer_uri.to_s.match?(/:\d+/)
renderer_port ||= 3800  # fall back to renderer default

At minimum, a prose note alongside this snippet warning that renderer_url must include the port would prevent silent misconfiguration.

    Socket.tcp("localhost", renderer_port, connect_timeout: 1) {}
Contributor

connect_timeout: has been accepted by Socket.tcp since well before Ruby 2.7, and Rails 7.1+ requires Ruby 2.7+, so this is safe. A brief comment acknowledging the minimum Ruby version would still prevent confusion for anyone grepping this snippet in isolation.

Contributor

The TCP target is hardcoded to "localhost" rather than extracted from renderer_url. For the "Same Rails Container" scenario this is always correct (both processes share a network namespace), but a reader who copy-pastes this controller into a different topology (e.g. after having set renderer_url to a Docker Compose service name like http://renderer:3800) would silently probe the wrong host — they'd get :ok when the local port happens to be listening for something else.

A short inline comment would prevent this confusion:

Suggested change
Socket.tcp("localhost", renderer_port, connect_timeout: 1) {}
Socket.tcp("localhost", renderer_port, connect_timeout: 1) {} # Same container: localhost is always correct here.

Contributor

The hardcoded "localhost" here is intentional for the same-container topology, but this is subtle and easy to miss for readers adapting the snippet to other shapes. The renderer_url is used only for the port, while the host is silently overridden.

Worth adding a brief inline callout (or a blockquote below the snippet) like:

Topology-specific: This code always probes localhost regardless of renderer_url's host — it only borrows the port. This is correct for same-container deployments where Rails and the renderer share a network namespace. Do not reuse this snippet as-is for sidecar or separate-workload topologies where renderer_url points to a different host.

Contributor

The empty block {} is the correct Ruby idiom — Socket.tcp with a block opens the connection, yields the socket, and closes it on block exit. Without the block it returns an open Socket that is never closed. The existing comment above the method body says "Opens and immediately closes" but it's several lines away from the call. Moving a short inline comment would help readers scanning the code:

Suggested change
Socket.tcp("localhost", renderer_port, connect_timeout: 1) {}
Socket.tcp("localhost", renderer_port, connect_timeout: 1) {} # open + immediately close

    head :ok
  rescue SocketError, SystemCallError, URI::Error, TypeError
Contributor

The rescue clause does not cover a nil renderer_url. If ReactOnRailsPro.configuration.renderer_url returns nil, a call such as renderer_url.match?(...) raises NoMethodError, which is a subclass of NameError and not of SocketError, SystemCallError, URI::Error, or TypeError. The endpoint would respond with 500 instead of the intended 503.

Guard the nil case explicitly before the URI parse:

renderer_url = ReactOnRailsPro.configuration.renderer_url
raise ArgumentError, "renderer_url not configured" if renderer_url.nil?

Then widen the rescue so the guard's error maps to 503:

Suggested change
rescue SocketError, SystemCallError, URI::Error, TypeError
rescue SocketError, SystemCallError, URI::Error, TypeError, ArgumentError

Together, these keep a misconfiguration from surfacing as an unhandled 500.

    head :service_unavailable
  end
end
Contributor

The Socket.tcp(...) {} empty-block idiom is valid Ruby (the block causes the socket to be closed when it exits), and the inline comment explains it well. One small addition worth making: this TCP check confirms the renderer has bound to the port, but cluster workers may still be warming up. The existing probe docs in js-configuration.md cover this distinction, but a one-liner here would save readers from a context switch:

Suggested change
end
# Opens and immediately closes; raises if the renderer port is unreachable.
# Note: a successful TCP connection means the h2c listener is bound, not that
# cluster workers are ready. Pair with the startup probe to shield liveness.
# connect_timeout is supported by the Ruby versions in this guide's prerequisites.
renderer_port = URI.parse(ReactOnRailsPro.configuration.renderer_url).port
Socket.tcp("localhost", renderer_port, connect_timeout: 1) {}

```
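
The review notes above observe that `URI.parse` falls back to port 80 when `renderer_url` omits the port, and that a `nil` URL needs a guard. One way to make both cases fail loudly is sketched below. `explicit_renderer_port` is a hypothetical helper name, not part of React on Rails Pro, and the regular expression is only one assumption about what counts as an explicit port.

```ruby
require "uri"

# Hypothetical helper (not part of React on Rails Pro): returns the port from
# a renderer URL only when the URL spells it out, and raises otherwise.
def explicit_renderer_port(renderer_url)
  raise ArgumentError, "renderer_url is not configured" if renderer_url.nil?

  uri = URI.parse(renderer_url) # raises URI::InvalidURIError on malformed input

  # URI#port falls back to the scheme default (80 for http), so inspect the
  # raw string to tell "http://localhost" apart from "http://localhost:3800".
  explicit = renderer_url.match?(%r{\A[a-z][a-z0-9+.-]*://[^/?#]*:\d+(?:[/?#]|\z)}i)
  raise ArgumentError, "renderer_url must include an explicit port (got: #{renderer_url})" unless explicit

  uri.port
end
```

If a helper like this were used in the controller, `ArgumentError` would need to join the rescue list so that these failures return `503` and not `500`.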

### Separate Container In The Same Workload

Keep the Rails `renderer_url` as `http://localhost:3800`. Use `0.0.0.0` for the renderer `host` when you rely on
`tcpSocket` probes; `localhost` is fine for `exec`-only probes.

Add h2c-aware `exec` probes against `localhost:3800` or `tcpSocket` probes on the renderer port. For `tcpSocket`, bind the
renderer to `0.0.0.0` because Kubernetes and platform TCP probes originate from outside the container and connect to the
pod or workload IP, not container-local loopback. `exec` probes run a command inside the container, so `localhost` works
there.
Contributor

This section (and "Separate Node-Renderer Workload" below) gives correct prose guidance but no YAML example, whereas the Kubernetes sidecar section earlier in the file provides a full manifest. A brief callout that the same exec/tcpSocket YAML patterns from the Kubernetes section apply here — just targeting the Control Plane workload container instead of a Kubernetes pod spec — would round out the guidance for Control Plane users without duplicating the full YAML.


> **Probe YAML:** For Control Plane readiness and liveness fields, reuse the individual `exec` or `tcpSocket` probe blocks
> from [Kubernetes Sidecar Manifest](#kubernetes-sidecar-manifest). Attach them to the node-renderer container in this
> workload instead of to a separate Kubernetes pod spec.
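
As a rough illustration of the distinction above, a `tcpSocket` probe only learns whether a TCP handshake completes. The sketch below performs the same kind of check from Ruby; `port_reachable?` is a hypothetical helper name chosen for this example. Run from inside the container it can target `localhost`, as an `exec` probe does. A platform TCP probe connects to the pod or workload IP instead, which is why the renderer must bind to `0.0.0.0` in that case.

```ruby
require "socket"

# Hypothetical helper: true when a TCP handshake to host:port completes within
# the timeout. Like a tcpSocket probe, it says nothing about application-level
# readiness, only that something is accepting connections on the port.
def port_reachable?(host, port, timeout: 1)
  Socket.tcp(host, port, connect_timeout: timeout) {} # open, then close on block exit
  true
rescue SocketError, SystemCallError
  false
end
```

For example, `port_reachable?("localhost", 3800)` would report whether the renderer's listener is bound, assuming the default port.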

### Separate Node-Renderer Workload

Set the Rails `renderer_url` to `http://<WORKLOAD_NAME>.<GVC_NAME>.cpln.local:3800`, use `0.0.0.0` for the renderer
`host`, and add `tcpSocket` or h2c-aware `exec` probes to the node-renderer workload container. Expose the renderer port
internally, not publicly, unless required.

Use the same Control Plane probe fields as the same-workload case, but attach them to the separate node-renderer workload
container.

Replace `<WORKLOAD_NAME>` with the renderer workload name and `<GVC_NAME>` with your Control Plane Global Virtual Cloud
name. Use your actual renderer port if it is not `3800`; see Control Plane's
[service-to-service endpoint format](https://docs.controlplane.com/guides/service-to-service).
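
A small sketch of how the placeholder substitution might look in Ruby follows. `cpln_renderer_url` and the example workload and GVC names are hypothetical; only the `<WORKLOAD_NAME>.<GVC_NAME>.cpln.local` host format and the default port `3800` come from the text above.

```ruby
# Hypothetical helper: builds the internal renderer URL from the Control Plane
# workload and GVC names, following the host format quoted above.
def cpln_renderer_url(workload:, gvc:, port: 3800)
  "http://#{workload}.#{gvc}.cpln.local:#{port}"
end

# Example names are placeholders, not values from any real deployment.
cpln_renderer_url(workload: "node-renderer", gvc: "my-app-production")
```

Keeping the port as an explicit argument avoids relying on a URL that omits it.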

## Dockerfile Example

> **Why the renderer entry point lives in a dedicated `renderer/` directory:** Production Docker builds commonly strip JavaScript sources after the client bundles are built, since the Rails app no longer needs them at runtime. Keeping the renderer entry point in its own top-level directory (separate from `client/`) makes it trivial to exclude from that cleanup — the Node Renderer process still needs its entry file and dependencies at runtime.
@@ -208,14 +293,18 @@ services:
      RENDERER_HOST: '0.0.0.0'
      NODE_OPTIONS: '--max-old-space-size=512'
    healthcheck:
Contributor

The comment references Kubernetes probe values to explain the Docker Compose value, which creates a cross-context dependency. Readers of this file in isolation have to mentally connect the two systems. Consider making the comment self-contained:

Suggested change
healthcheck:
# Set --max-time roughly 1 s below the orchestrator's probe timeout so curl exits
# cleanly with a non-zero code rather than being killed mid-request (2 s here with timeout: 3s).

      test: ['CMD', 'curl', '-sf', '--http2-prior-knowledge', 'http://localhost:3800/info']
      # --max-time 2 leaves a 1 s buffer below the 3 s orchestrator timeout so curl exits
      # cleanly with a non-zero code rather than being killed mid-request.
      test: ['CMD', 'curl', '-sf', '--max-time', '2', '--http2-prior-knowledge', 'http://localhost:3800/info']
Comment thread
justin808 marked this conversation as resolved.
      interval: 5s
      timeout: 3s
      retries: 5
      start_period: 10s
```

> **Note:** In Docker Compose, the containers do not share a network namespace (unlike Kubernetes sidecars), so the renderer must bind to `0.0.0.0` and Rails must connect via the service name (`renderer`).
> The Compose example uses `--max-time 2` with `timeout: 3s` for fast local feedback; the Kubernetes examples use
> `--max-time 3` with `timeoutSeconds: 5` to allow more scheduler and node-load jitter.

## Host Binding for Container Environments

@@ -388,7 +477,20 @@ During container startup, you may see `ERR_STREAM_PREMATURE_CLOSE` errors from F

**Mitigation:**

1. **Health check endpoint** — The Node Renderer exposes a built-in `/info` endpoint that returns the node version and renderer version. Because the renderer uses cleartext HTTP/2, Kubernetes `httpGet` probes (HTTP/1.1) are incompatible with this listener. Use a TCP probe, an `exec` probe (for example with `curl --http2-prior-knowledge`, which requires curl with HTTP/2 support in your container image), or a dedicated HTTP/1.1 sidecar/port for probes. For a custom `/health` route with more granular checks, use the `configureFastify()` option (see [JS Configuration: Custom Fastify Configuration](./js-configuration.md#custom-fastify-configuration)). Configure your container orchestrator to wait for it before routing traffic.
> [!NOTE]
> **Probe command notes:** `exec` probes require curl with HTTP/2 support in your image. Verify with
> `curl --version | grep -i http2`; if unavailable, use `tcpSocket` as a fallback. Set curl `--max-time` shorter than the
> orchestrator timeout so curl returns a clean non-zero exit code before Kubernetes terminates the probe process. These
> examples use `--max-time 3` with `timeoutSeconds: 5`, leaving a 2-second buffer. Readiness and liveness omit
> `initialDelaySeconds` because Kubernetes 1.20+ (startup probe GA) defers
> them until the startup probe succeeds. If you skip the startup probe or run an older cluster without startup probe
> support, add an appropriate `initialDelaySeconds`.
Contributor

This callout uses a plain > blockquote, but the manifest section just below uses > [!WARNING]. GitHub renders > [!NOTE] as a styled callout box — using it here would give this important pre-flight checklist the same visual weight as the warning and make it stand out in rendered docs.

Suggested change
> support, add an appropriate `initialDelaySeconds`.
> [!NOTE]
> **Probe command notes:** `exec` probes require curl with HTTP/2 support in your image. Verify with
> `curl --version | grep -i http2`; if unavailable, use `tcpSocket` as a fallback. Set curl `--max-time` shorter than the
> orchestrator timeout so curl returns a clean non-zero exit code before Kubernetes terminates the probe process. These
> examples use `--max-time 3` with `timeoutSeconds: 5`, leaving a 2-second buffer. Readiness and liveness omit
> `initialDelaySeconds` because Kubernetes 1.20+ (startup probe GA) defers
> them until the startup probe succeeds. If you skip the startup probe or run an older cluster without startup probe
> support, add an appropriate `initialDelaySeconds`.


> **Security:** `/info` is unauthenticated even when `password` is configured. Keep the renderer on `localhost` or
> private networking if exposing node and renderer version details is a concern; see
> [Built-in Endpoints](./js-configuration.md#built-in-endpoints).
Contributor

The security note is buried inside the "Probe command notes" callout alongside the curl/HTTP2 and --max-time guidance. A reader skimming for probe YAML will likely parse this as "probe mechanics" and miss the security implication.

Consider separating it into its own > **Security:** callout block directly after the numbered list (or immediately before the /info-referencing readiness probe example), where it's harder to overlook:

> **Security:** `/info` is unauthenticated even when `password` is configured. Keep the renderer
> on `localhost` or private networking if exposing node/renderer version details is a concern;
> see [Built-in Endpoints](./js-configuration.md#built-in-endpoints).

Contributor

Placing the > [!NOTE] callout and the security blockquote between **Mitigation:** and the numbered list visually disconnects the list from its heading in some Markdown renderers (and in raw-text reading). The numbered items look like they follow the Note's content rather than the "Mitigation" section.

Consider moving these two callouts after item 4 (or at the end of the section) so the numbered list flows directly under "Mitigation:". This is non-blocking but would improve scannability for operators copy-pasting the probe YAML.
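
The timing rule in the probe command notes is simple arithmetic: curl's `--max-time` should be shorter than the orchestrator's probe timeout, and the difference is the buffer. A minimal sketch, using the hypothetical helper name `probe_buffer_seconds` and the values quoted in this document:

```ruby
# Hypothetical helper: seconds left between curl giving up (--max-time) and
# the orchestrator terminating the probe process (timeoutSeconds / timeout).
def probe_buffer_seconds(curl_max_time:, probe_timeout:)
  buffer = probe_timeout - curl_max_time
  raise ArgumentError, "--max-time must be shorter than the probe timeout" unless buffer.positive?

  buffer
end

probe_buffer_seconds(curl_max_time: 3, probe_timeout: 5) # Kubernetes examples => 2
probe_buffer_seconds(curl_max_time: 2, probe_timeout: 3) # Docker Compose example => 1
```

A check like this could run in CI against the deployment manifests, although how the values are extracted from YAML is left out here.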


1. **Health check endpoint** — The Node Renderer exposes a built-in `/info` endpoint that returns the node version and renderer version. Because the renderer uses cleartext HTTP/2, Kubernetes `httpGet` probes (HTTP/1.1) are incompatible with this listener. Use a TCP probe, an `exec` probe with an h2c-aware client such as `curl --http2-prior-knowledge`, or a dedicated HTTP/1.1 sidecar/port for probes. For a custom `/health` route with more granular checks, use the `configureFastify()` option (see [JS Configuration: Adding a Health Check Endpoint](./js-configuration.md#adding-a-health-check-endpoint)). Configure your container orchestrator to wait for it before routing traffic.
Contributor

The /info endpoint bypasses password authentication — it remains reachable without the renderer password even when password is configured. This is documented in js-configuration.md's Built-in Endpoints section, but users who only read this page won't see it.

Suggest adding a brief security note here, e.g.:

Security: /info is unauthenticated even when password is configured. Keep the renderer on localhost or private networking if exposing node/renderer version details is a concern; see Built-in Endpoints.

2. **Startup probe** — Configure a startup probe with a generous `initialDelaySeconds`:
```yaml
startupProbe:
@@ -397,30 +499,79 @@ During container startup, you may see `ERR_STREAM_PREMATURE_CLOSE` errors from F
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 6
  timeoutSeconds: 1
```
3. **Readiness probe** — Ensure traffic is only routed to the renderer when it's ready to accept requests. Prefer an `exec` probe with an h2c-aware client for application-level readiness. Use `tcpSocket` only as a minimal fallback that confirms the port is accepting connections:

```yaml
readinessProbe:
  exec:
    command:
      - curl
      - -sf
      - --max-time
      - '3'
      - --http2-prior-knowledge
      - http://localhost:3800/info
Comment thread
justin808 marked this conversation as resolved.
  timeoutSeconds: 5
  periodSeconds: 5
  failureThreshold: 3
```
> **Note:** The `exec` probe requires curl with HTTP/2 support in your image. Verify with `curl --version | grep HTTP2`. If curl is unavailable, use `tcpSocket` as a fallback.
4. **Liveness probe** — Ensure the renderer is restarted if it becomes unresponsive:

> **Notes:**
>
> - The YAML uses `/info` so it works before custom Fastify routes exist. Replace `/info` with `/health` after
> registering that route via `configureFastify` if readiness should wait for renderer-specific warm-up checks.
> - Before upgrading an existing readiness probe, keep curl's `--max-time` lower than `timeoutSeconds`. If switching
> from `tcpSocket` to `exec`, verify curl HTTP/2 support in the image first.
> - See the probe command notes above for curl HTTP/2 support, `--max-time`, loaded-node buffers, and
> `initialDelaySeconds` guidance.

> **Readiness fallback option:** If curl lacks HTTP/2 support in your image, replace that `readinessProbe` with this
> `tcpSocket` block. This checks port reachability, not application-level readiness:
>
> ```yaml
> readinessProbe:
> tcpSocket:
> port: 3800
> # TCP handshakes should complete quickly; exec/H2 uses timeoutSeconds: 5.
> timeoutSeconds: 1
> periodSeconds: 5
> failureThreshold: 3
> ```

4. **Liveness probe** — Ensure the renderer is restarted if it becomes unresponsive. The probe below changes the
Contributor

The upgrade warning ("verify curl HTTP/2 support first") is important but easy to miss when reading quickly. This sentence is in the surrounding prose, but the YAML immediately follows with the exec probe already in place. Consider adding a blockquote admonition directly above the YAML block so it's visible to readers who skim to the code:

> **Before upgrading:** Run `curl --version | grep -i http2` inside your container image. If HTTP/2 support is absent, use the `tcpSocket` fallback shown below instead of this `exec` block.

This mirrors the pattern already used in the Kubernetes Sidecar Manifest section (line ~568) and makes the prerequisite impossible to miss.

liveness check from `tcpSocket` to `exec`; if you are upgrading an existing deployment, verify curl HTTP/2 support
first:

> **Before upgrading:** Run `curl --version | grep -i http2` inside your container image. If HTTP/2 support is absent,
> use the `tcpSocket` fallback shown below instead of this `exec` block.

```yaml
livenessProbe:
tcpSocket:
port: 3800
# Omit initialDelaySeconds only if the startupProbe above is configured.
# Requires curl with HTTP/2 support (verify: curl --version | grep -i http2).
# If unavailable, replace this exec probe with a tcpSocket probe on port 3800.
exec:
command:
- curl
- -sf
- --max-time
- '3'
- --http2-prior-knowledge
- http://localhost:3800/info
timeoutSeconds: 5
periodSeconds: 10
failureThreshold: 3
```

> **Notes:**
>
> - Use `/info` by default. Only substitute `/health` for liveness if that route avoids external dependency checks and
> readiness gates.
> - See the probe command notes above for curl HTTP/2 support, `--max-time`, loaded-node buffers, and
> `initialDelaySeconds` guidance.
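
The timing values in the probes above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: `startup_budget_seconds` and `probe_margin_seconds` are hypothetical helper names, not part of React on Rails or Kubernetes. The startup budget uses the common approximation `initialDelaySeconds + periodSeconds * failureThreshold` and ignores per-attempt timeouts and scheduling jitter.

```shell
#!/bin/sh
# Hypothetical helpers for reasoning about probe timing; not part of any tool.

# Approximate time (seconds) a container gets to start before the startup
# probe gives up: initialDelaySeconds + periodSeconds * failureThreshold.
startup_budget_seconds() {
  initial_delay=$1
  period=$2
  failure_threshold=$3
  echo $(( initial_delay + period * failure_threshold ))
}

# Margin (seconds) between the probe's timeoutSeconds and curl's --max-time.
# A positive margin lets curl exit on its own before the probe timeout is hit.
probe_margin_seconds() {
  timeout_seconds=$1
  max_time=$2
  echo $(( timeout_seconds - max_time ))
}
```

With the values shown in this section, `startup_budget_seconds 10 5 6` gives 40 and `probe_margin_seconds 5 3` gives 2. On heavily loaded nodes a larger margin may be a safer choice.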

### OOM Tracking

Distinguish between Rails and Node Renderer OOM kills by checking container-level exit codes:
@@ -451,6 +602,26 @@ In production, `logLevel: 'warn'` is sufficient unless actively debugging.

A complete pod spec for the sidecar pattern:

> [!WARNING]
> The `exec` liveness probe in this copy-paste manifest requires curl with HTTP/2 support. Run
> `curl --version | grep -i http2` in your container image before replacing an existing `tcpSocket` probe. If curl lacks
> HTTP/2 support, keep `tcpSocket` or add HTTP/2-capable curl support. If you cannot verify curl before rollout, use the
> `tcpSocket` fallback block below and upgrade to `exec` later.
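
The check described in the warning above could be scripted roughly as follows. This is a minimal sketch, assuming `curl --version` prints a `Features:` line that lists `HTTP2` when HTTP/2 support is compiled in. The function names `curl_version_has_http2` and `suggest_probe_type` are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `curl --version` output on stdin and succeeds
# (exit status 0) only if the Features line mentions HTTP2.
curl_version_has_http2() {
  grep -i '^Features:' | grep -qi 'http2'
}

# Suggest a probe type from `curl --version` output on stdin.
# Empty input (for example, curl is not installed) falls back to tcpSocket.
suggest_probe_type() {
  if curl_version_has_http2; then
    echo exec
  else
    echo tcpSocket
  fi
}
```

Inside the container image, something like `curl --version 2>/dev/null | suggest_probe_type` would print `exec` or `tcpSocket`. It only inspects how curl was built; it does not confirm that the renderer answers h2c requests on port 3800.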

> **Liveness fallback option:** The manifest below uses `exec` (preferred). If curl lacks HTTP/2 support in your image,
> replace the manifest's `livenessProbe` with this `tcpSocket` block before applying it:
>
> ```yaml
> livenessProbe:
> # Omit initialDelaySeconds only if the startupProbe above is configured.
> tcpSocket:
> port: 3800
> # TCP handshakes should complete quickly; exec/H2 uses timeoutSeconds: 5.
> timeoutSeconds: 1
> periodSeconds: 10
> failureThreshold: 3
> ```

```yaml
apiVersion: apps/v1
kind: Deployment
@@ -513,23 +684,42 @@ spec:
initialDelaySeconds: 10
periodSeconds: 5
failureThreshold: 6
timeoutSeconds: 1
readinessProbe:
# Omit initialDelaySeconds only if the startupProbe above is configured.
Contributor

The phrasing "Omit … only if" reads as a restriction (when it is safe to leave it out) rather than a trigger (when it is required). Readers who skip the startup probe may not notice they need to add `initialDelaySeconds`. Suggested rewording:

Suggested change
# Omit initialDelaySeconds only if the startupProbe above is configured.
# Add initialDelaySeconds here if no startupProbe is configured.
# Kubernetes 1.20+ defers readiness/liveness until the startup probe succeeds.

exec:
command:
- curl
- -sf
- --max-time
- '3'
- --http2-prior-knowledge
- http://localhost:3800/info
timeoutSeconds: 5
periodSeconds: 5
failureThreshold: 3
Contributor

The liveness probe block below (line 660) has a `# Omit initialDelaySeconds only if the startupProbe above is configured.` comment, but the readiness probe block here does not. A user who skips the startup probe and scans the YAML top-to-bottom will know to add `initialDelaySeconds` to liveness but not to readiness. Consider adding the same comment to this block:

Suggested change
failureThreshold: 3
failureThreshold: 3
# Omit initialDelaySeconds only if the startupProbe above is configured.

livenessProbe:
Contributor

This switches the liveness probe from a passive `tcpSocket` check to an active HTTP/2 request. That is a meaningful behavioral upgrade, but also a migration hazard: minimal base images (Debian slim, Alpine) do not include curl by default, and a given curl build is not guaranteed to have HTTP/2 (nghttp2) support compiled in.

Consider adding a one-line callout here (before the YAML block) like:

Upgrading from an existing `tcpSocket` liveness probe? Run `curl --version | grep -i http2` in your container to verify HTTP/2 support before switching. If it's absent, keep the `tcpSocket` probe or install a curl build with HTTP/2 (nghttp2) support in your image.

The in-YAML comment is easy to miss for users who skim to the code block.

Contributor

The liveness probe switches from tcpSocket to exec/curl here without an inline warning at the YAML level. Users who copy the full manifest directly — skipping the step-by-step section above where the [!WARNING] block lives — will silently break their liveness probe if their image doesn't ship curl with HTTP/2 support.

Suggested addition directly above `livenessProbe:`:

Suggested change
livenessProbe:
# WARNING: exec probe requires curl with HTTP/2 support in this image.
# Verify: curl --version | grep -i http2
# If unavailable, replace exec/command below with: tcpSocket: { port: 3800 }
livenessProbe:

Contributor

Users who copy-paste from a previous version of this manifest won't know this changed from tcpSocket. Consider adding a comment that makes the upgrade explicit:

Suggested change
livenessProbe:
livenessProbe:
# CHANGED from tcpSocket to exec. If upgrading an existing deployment,
# verify curl --http2-prior-knowledge works in your image before switching.

tcpSocket:
port: 3800
# UPGRADE WARNING: verify curl HTTP/2 support before replacing an existing tcpSocket probe.
# Requires curl with HTTP/2 support (verify: curl --version | grep -i http2).
# If unavailable, replace this exec probe with a tcpSocket probe on port 3800.
# Omit initialDelaySeconds only if the startupProbe above is configured.
Contributor

Same phrasing issue as the readiness probe comment above — "Omit … only if" reads as a restriction rather than a trigger. Suggest the same rewording:

Suggested change
# Omit initialDelaySeconds only if the startupProbe above is configured.
# Add initialDelaySeconds here if no startupProbe is configured.
# Kubernetes 1.20+ defers readiness/liveness until the startup probe succeeds.

exec:
Contributor

The manifest now ships exec as the default liveness probe, with tcpSocket demoted to a blockquote fallback below the YAML. That's the reverse of the previous default.

A user who runs `kubectl apply` on this manifest without reading the surrounding prose will immediately get an `exec` liveness probe, and if their image's curl lacks HTTP/2 support, every container will fail its liveness check and restart in a loop.

Consider keeping tcpSocket as the main-manifest default (universally safe) and elevating exec to the "upgrade" block. The upgrade path is already well-documented; it just doesn't need to be the default for copy-paste safety.

If the current order is intentional (i.e., new deployments should prefer `exec`), the warning comments are correct but could be a named `> [!WARNING]` GitHub admonition rather than plain YAML comments, since YAML comments are not preserved in the running Kubernetes cluster.

command:
- curl
- -sf
- --max-time
- '3'
- --http2-prior-knowledge
- http://localhost:3800/info
timeoutSeconds: 5
Contributor

The 1-second margin between --max-time 4 and timeoutSeconds: 5 is tight on loaded nodes. Kernel scheduling jitter can put the curl process slightly over --max-time after Kubernetes starts its own timer, leading to occasional false-positive liveness kills.

Consider --max-time 3 (2-second margin) for a more robust default, or at least add a note: "On heavily loaded nodes, increase the margin (e.g., --max-time 3) if you see occasional unexpected restarts."

Suggested change
timeoutSeconds: 5
timeoutSeconds: 5

(No change needed here — just flagging the --max-time value two lines above.)

periodSeconds: 10
failureThreshold: 3
```
Comment on lines 714 to 723
Contributor

The liveness probe was silently changed from a zero-dependency tcpSocket to an exec probe that requires curl compiled with HTTP/2 support. Users who copy this manifest into images that don't include curl+HTTP/2 will get immediate probe failures with no obvious error message.

Consider adding a YAML comment directly in the manifest so the requirement is visible at copy time:

Suggested change
livenessProbe:
tcpSocket:
port: 3800
exec:
command:
- curl
- -sf
- --max-time
- '4'
- --http2-prior-knowledge
- http://localhost:3800/info
timeoutSeconds: 5
periodSeconds: 10
failureThreshold: 3
```
livenessProbe:
# Requires curl with HTTP/2 support (verify: curl --version | grep -i http2).
# If unavailable, replace exec with: tcpSocket: { port: 3800 }
exec:
command:
- curl
- -sf
- --max-time
- '4'
- --http2-prior-knowledge
- http://localhost:3800/info
timeoutSeconds: 5
periodSeconds: 10
failureThreshold: 3

Contributor

Breaking change for copy-paste users — suggest a commented-out fallback in the YAML.

The manifest now shows exec as the liveness probe, with a prose warning before the block and a YAML comment saying "If unavailable, replace exec below with a tcpSocket probe on port 3800." That's helpful, but a user who doesn't have curl --http2-prior-knowledge available will get stuck looking up the equivalent tcpSocket YAML themselves.

Consider adding the fallback as commented-out lines directly in the spec, e.g.:

          livenessProbe:
            # Omit initialDelaySeconds only if the startupProbe above is configured.
            # Requires curl with HTTP/2 support (verify: curl --version | grep -i http2).
            # If unavailable, uncomment the tcpSocket block below instead:
            # tcpSocket:
            #   port: 3800
            exec:
              command:
                - curl
                - -sf
                - --max-time
                - '4'
                - --http2-prior-knowledge
                - http://localhost:3800/info
            timeoutSeconds: 5
            periodSeconds: 10
            failureThreshold: 3

This keeps the "safe default" (exec) as the active config while making the fallback a one-line uncomment rather than a from-memory rewrite, significantly reducing the risk of misapplication.

Contributor

The tcpSocket fallback block appears in a callout after this YAML, which means users who copy just the YAML block (common behaviour) will deploy an exec liveness probe that silently fails if their image lacks HTTP/2-capable curl. Consider embedding the fallback as commented-out YAML directly below the livenessProbe block, e.g.:

          # --- tcpSocket fallback (if curl lacks HTTP/2 support) ---
          # livenessProbe:
          #   tcpSocket:
          #     port: 3800
          #   timeoutSeconds: 1
          #   periodSeconds: 10
          #   failureThreshold: 3

This keeps both options visible in the one artifact users are most likely to copy.


> **Readiness endpoint:** The manifest uses `/info` for copy-paste safety because that endpoint is built in. Replace
Contributor

This "Readiness endpoint" callout is formatted as a > continuation of the "Note: Both containers use the same Docker image…" note immediately below it, making them render as a single blockquote paragraph. They're separate points and should be split into separate blockquotes with a blank line between them, or the "Readiness endpoint" text should be moved before the "Note" line.

> `/info` with `/health` in the readiness probe after registering that route via `configureFastify` if readiness should
> wait for renderer-specific warm-up checks.

> **Note:** Both containers use the same Docker image, ensuring the React on Rails gem and Node Renderer package versions are always aligned.

## Troubleshooting