Deploying To Replicate

This document describes how this repo is currently deployed to Replicate.

It covers:

  • staging pushes to nelsonjchen/op-replay-clipper-beta
  • production pushes to nelsonjchen/op-replay-clipper
  • local testing before a push
  • how stock cog 0.17.2+ fits into the current flow
  • how to verify a pushed version before and after promotion

Overview

This repo now deploys with a normal stock cog push, but it still relies on repo-generated Cog artifacts and a substantial bootstrap script.

The deploy path depends on:

  1. generated Cog build artifacts from cog/render_artifacts.sh
  2. stock cog 0.17.2+
  3. a normal cog push

Upstream Cog 0.17.2 fixed the earlier raw-URL coercion bug for plain str inputs, so hosted Replicate can again accept normal https://connect.comma.ai/... route URLs without a custom runtime patch.
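As an illustration of what "a normal route URL" means here, a shape check like the sketch below can catch malformed URLs before a job is submitted. The regex and helper name are hypothetical, not part of this repo:

```python
import re

# Hypothetical sanity check for the connect.comma.ai route URL shape
# that cog 0.17.2+ accepts as a plain str input:
#   https://connect.comma.ai/<dongle>/<route>/<start>/<end>
ROUTE_URL = re.compile(
    r"^https://connect\.comma\.ai/"
    r"(?P<dongle>[0-9a-f]+)/"
    r"(?P<route>[0-9a-f-]+)/"
    r"(?P<start>\d+)/"
    r"(?P<end>\d+)$"
)

def is_route_url(url: str) -> bool:
    return ROUTE_URL.match(url) is not None

print(is_route_url(
    "https://connect.comma.ai/5beb9b58bd12b691/0000010a--a51155e496/90/105"
))  # True
```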

Models

The two Replicate targets are:

  • staging: r8.im/nelsonjchen/op-replay-clipper-beta
  • production: r8.im/nelsonjchen/op-replay-clipper

The intended workflow is:

  1. push current work to the staging model
  2. run the staging smoke matrix
  3. if staging looks good, push the same repo state to the production model
  4. run a smaller post-promotion smoke set on production

Prerequisites

Before deploying:

  • have a valid REPLICATE_API_TOKEN
  • have Docker working locally
  • have a working cog CLI installed
  • be on the repo state you actually want to push

Common setup:

uv sync
set -a
source .env
set +a

Manual GitHub Actions deploy

There is also a manual Actions workflow at ../.github/workflows/replicate-deploy.yml.

The main Build workflow, by contrast, is intentionally build-only: it validates that the hosted Cog image can be built, but it does not auto-push beta or production deployments.

The manual deploy workflow is meant for two cases:

  • build-only validation of the generated Cog image on a GitHub-hosted runner
  • manually pushing a specific generated Cog config to a Replicate model ref

Workflow notes:

  • it always runs ./cog/render_artifacts.sh first, because this repo does not commit the rendered Cog requirements/config as the source of truth
  • it accepts either cog.yaml or cog-rfdetr-repro.yaml
  • it expects either REPLICATE_CLI_AUTH_TOKEN or REPLICATE_API_TOKEN to be present in repository secrets for push runs
  • it frees extra disk space on the hosted runner first, because the main model image is large enough that the default runner layout is tight without cleanup

For the normal beta deploy path, there is also a dedicated Actions button at ../.github/workflows/replicate-deploy-beta.yml.

That workflow:

  • shows up in the Actions tab as Replicate Deploy Beta
  • always pushes cog.yaml to r8.im/nelsonjchen/op-replay-clipper-beta
  • reuses the same hosted deploy logic as the generic workflow

For the normal production promotion path, there is also a dedicated Actions button at ../.github/workflows/replicate-deploy-prod.yml.

That workflow:

  • shows up in the Actions tab as Replicate Deploy Prod
  • always pushes cog.yaml to r8.im/nelsonjchen/op-replay-clipper
  • reuses the same hosted deploy logic as the generic workflow
  • requires typing deploy-prod before it will run, to reduce accidental prod pushes

Local testing before a push

The cheapest and fastest validation path is still local-first:

  1. run the local Python/uv path
  2. optionally run local cog predict
  3. use GCE when you want Linux/NVIDIA behavior without paying the Replicate startup tax
  4. push to the staging Replicate model only after that looks good

Local Python path

For most behavior checks, use the repo CLI directly:

uv sync
uv run python clip.py ui 'https://connect.comma.ai/<dongle>/<route>/<start>/<end>'

For hosted-model testing from your machine without building a local container:

uv run python replicate_run.py \
  --model 'nelsonjchen/op-replay-clipper-beta:<version>' \
  --url 'https://connect.comma.ai/<dongle>/<route>/<start>/<end>' \
  --render-type ui \
  --output ./shared/local-hosted-smoke.mp4

Local cog predict

If you want to exercise the local Cog/container path:

./cog/render_artifacts.sh
cog predict -i renderType=ui -i route='https://connect.comma.ai/<dongle>/<route>/<start>/<end>'

Notes:

  • stock cog 0.17.2+ should accept plain connect URLs for this repo's route: str input
  • the local parser still accepts literal:https://... as a backwards-compatible form if you happen to have an older local helper flow lying around
  • local cog predict is useful for image/runtime validation, but the hosted Replicate smokes are still the source of truth before promotion

GCE testing

GCE is no longer required for routine beta deploys. Prefer the normal Cog/GitHub Actions/Replicate path for one-off beta pushes and hosted smoke tests. Use GCE when you expect multiple fix-and-retest iterations that need:

  • Linux behavior
  • NVIDIA rendering/encoding behavior
  • faster iteration than repeated Replicate cold starts

The typical GCE flow is:

  1. create or start a GPU VM
  2. sync the repo there
  3. run the local CLI or local cog predict
  4. copy the output artifact back
  5. stop the VM when you are done

Host-side local CLI example on the VM:

uv sync
uv run python clip.py ui 'https://connect.comma.ai/<dongle>/<route>/<start>/<end>'

Hosted-model smoke from the VM:

uv run python replicate_run.py \
  --model 'nelsonjchen/op-replay-clipper-beta:<version>' \
  --url 'https://connect.comma.ai/<dongle>/<route>/<start>/<end>' \
  --render-type ui \
  --output ./shared/gce-hosted-smoke.mp4

Local Cog/container smoke on the VM:

./cog/render_artifacts.sh
cog predict --gpus all -i renderType=ui -i route='https://connect.comma.ai/<dongle>/<route>/<start>/<end>'

GCE is especially useful for checking:

  • null/EGL rendering on Linux
  • NVENC behavior
  • whether a new runtime or bootstrap change behaves correctly before pushing to Replicate
  • the passenger redaction product path via ./scripts/smoke_driver_redaction.sh --backend local --accel nvidia

For this repo's current Linux/NVIDIA validation target, use your Cowboy project GPU VM and keep the details in local environment config rather than committing them into the repo.

Add these to your local .env:

export GCE_PROJECT=your-gcp-project
export GCE_ZONE=your-gce-zone
export GCE_INSTANCE=your-gpu-vm-name

Typical start/stop flow:

set -a
source .env
set +a

gcloud compute instances start "$GCE_INSTANCE" \
  --project "$GCE_PROJECT" \
  --zone "$GCE_ZONE"

gcloud compute ssh "$GCE_INSTANCE" \
  --project "$GCE_PROJECT" \
  --zone "$GCE_ZONE"

If the zone is temporarily out of L4 capacity, use the retry helper instead of manually re-running gcloud compute instances start:

set -a
source .env
set +a

./scripts/wait_for_gce_instance_start.sh

It retries through ZONE_RESOURCE_POOL_EXHAUSTED / stockout errors every 10 minutes by default and exits as soon as the instance reaches RUNNING.
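That retry behavior can be sketched roughly as the loop below. The `start_instance` callable, the error strings, and the 600-second interval are illustrative stand-ins for what the shell script actually does:

```python
import time

# Illustrative sketch of the wait_for_gce_instance_start.sh retry loop:
# keep retrying capacity/stockout errors until the instance reaches RUNNING.
RETRYABLE = ("ZONE_RESOURCE_POOL_EXHAUSTED", "stockout")

def wait_for_start(start_instance, interval_s=600, sleep=time.sleep):
    """start_instance() returns "RUNNING" or raises RuntimeError(reason)."""
    while True:
        try:
            return start_instance()
        except RuntimeError as err:
            if not any(tag in str(err) for tag in RETRYABLE):
                raise  # non-capacity errors are fatal, not retried
            sleep(interval_s)  # wait out the stockout, then try again
```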

For the RF-DETR Linux/CUDA debugging track, use the dedicated T4 acquisition helper instead. It creates a temporary T4 VM, tries spot first, falls back to standard if spot capacity stays unavailable, and keeps retrying ordered zones until one succeeds:

set -a
source .env
set +a

./scripts/acquire_t4_gce_instance.sh

It writes the chosen instance name and zone into a local temp state dir so follow-up scripts can reuse the VM. When you are done, delete that temporary VM and clear the state with:

./scripts/delete_t4_gce_instance.sh

Before running Cog or RF-DETR smokes on that VM, bootstrap it into the known-good T4 state:

./scripts/bootstrap_t4_gce_vm.sh

That installs Docker, Cog, nvidia-container-toolkit, and the NVIDIA video driver packages (libnvidia-encode-580-server, libnvidia-decode-580-server) so NVENC/NVCUVID are available inside the Cog container instead of only bare CUDA compute.

Tiny RF-DETR repro path

There is also a tiny RF-DETR-only repro surface for debugging Cog/Replicate GPU issues without the rest of the clipper stack.

It can be exercised three ways:

Local plain-Python repro:

uv sync
./scripts/smoke_rf_detr_repro.sh --backend local-cli

Local Cog repro:

./cog/render_artifacts.sh
./scripts/smoke_rf_detr_repro.sh --backend local-cog --device cuda --require-actual-device cuda

Hosted beta repro:

./cog/render_artifacts.sh
cog push --file cog-rfdetr-repro.yaml r8.im/nelsonjchen/op-replay-clipper-rfdetr-repro-beta

uv run python rf_detr_repro_run.py \
  --model 'nelsonjchen/op-replay-clipper-rfdetr-repro-beta:<version>' \
  --input ./shared/rf-detr-repro-inputs/tiny-clip.mp4 \
  --output ./shared/rf-detr-repro-hosted-artifacts.zip

Use this path when you want to answer questions like:

  • does RF-DETR itself succeed on GPU in plain Python?
  • does local cog predict keep or lose GPU access?
  • does hosted Replicate fail on the same tiny input?

For the current GPU-only debugging track, treat any RF-DETR CPU fallback as a failure. On Linux/NVIDIA, require actual_model_device = "cuda" in the repro report and require the driver redaction selection report to show hidden_redaction.rf_detr_device = "cuda".
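A gate like the sketch below can enforce that check on parsed reports. The key names mirror the fields quoted above, but the surrounding dict layout is an assumption about the report format:

```python
# Hypothetical gate: fail the GPU-only debugging track on any CPU fallback.
# Field names follow the ones quoted above; the dict layout is assumed.
def assert_gpu_only(repro_report: dict, selection_report: dict) -> None:
    device = repro_report.get("actual_model_device")
    if device != "cuda":
        raise AssertionError(f"RF-DETR ran on {device!r}, expected 'cuda'")
    rf_device = selection_report.get("hidden_redaction", {}).get("rf_detr_device")
    if rf_device != "cuda":
        raise AssertionError(f"rf_detr_device={rf_device!r}, expected 'cuda'")
```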

Step 1: Regenerate Cog build artifacts

The hosted deploy workflows run this for you, but the underlying command is:

./cog/render_artifacts.sh

That regenerates:

  • requirements-cog.txt
  • cog.yaml
  • cog-rfdetr-repro.yaml

The rendered cog.yaml embeds the shared bootstrap script so the build stays reproducible inside Replicate/Cog.

Step 2: Push staging

The standard staging deploy is:

./cog/render_artifacts.sh
cog push r8.im/nelsonjchen/op-replay-clipper-beta

That targets:

r8.im/nelsonjchen/op-replay-clipper-beta

Step 3: Identify the new staging version

After the push, get the latest version id:

uv run python - <<'PY'
import os
import replicate

client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])
model = client.models.get("nelsonjchen/op-replay-clipper-beta")
versions = list(model.versions.list())
print(versions[0].id)
PY

Use that exact version id for smoke testing rather than relying on the model alias alone.

Step 4: Run the staging smoke matrix

The current promotion gate is documented separately in this repo.

The main risky surfaces are:

  • UI raw URL handling
  • UI HEVC output
  • forward rendering
  • 360 rendering on newer mici routes
  • 360 forward-upon-wide rendering on newer mici routes
  • JWT-backed UI rendering

The standard route used for the current regression matrix is:

https://connect.comma.ai/5beb9b58bd12b691/0000010a--a51155e496/90/105

That route is useful because it exercises the newer mici camera dimensions that previously broke the 360 path.
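For reference, the pieces of that route URL break down as follows; this split is illustrative only:

```python
# Illustrative breakdown of a connect.comma.ai route URL:
#   https://connect.comma.ai/<dongle>/<route>/<start>/<end>
url = "https://connect.comma.ai/5beb9b58bd12b691/0000010a--a51155e496/90/105"
dongle, route, start, end = url.removeprefix("https://connect.comma.ai/").split("/")
print(dongle)                  # 5beb9b58bd12b691
print(route)                   # 0000010a--a51155e496
print(int(end) - int(start))   # 15 (seconds of clip window)
```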

Step 5: Push production

Once staging is good, push the same repo state to production by pointing cog push at the production model:

cog push r8.im/nelsonjchen/op-replay-clipper

Step 6: Identify the new production version

Get the newest production version id:

uv run python - <<'PY'
import os
import replicate

client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])
model = client.models.get("nelsonjchen/op-replay-clipper")
versions = list(model.versions.list())
print(versions[0].id)
PY

Step 7: Run the post-promotion smoke set

After a production push, rerun at least:

  • UI raw URL
  • UI HEVC
  • one non-UI case, usually 360 or forward

Example:

uv run python replicate_run.py \
  --model 'nelsonjchen/op-replay-clipper:<prod-version>' \
  --url 'https://connect.comma.ai/5beb9b58bd12b691/0000010a--a51155e496/90/105' \
  --render-type ui \
  --file-format auto \
  --output ./shared/prod-live-ui-raw.mp4
uv run python replicate_run.py \
  --model 'nelsonjchen/op-replay-clipper:<prod-version>' \
  --url 'https://connect.comma.ai/5beb9b58bd12b691/0000010a--a51155e496/90/105' \
  --render-type ui \
  --file-format hevc \
  --output ./shared/prod-live-ui-hevc.mp4
uv run python replicate_run.py \
  --model 'nelsonjchen/op-replay-clipper:<prod-version>' \
  --url 'https://connect.comma.ai/5beb9b58bd12b691/0000010a--a51155e496/90/105' \
  --render-type 360 \
  --file-format auto \
  --output ./shared/prod-live-360.mp4

Verify with ffprobe after each run.

For 360 outputs, also verify spherical metadata is still present.
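One way to script both checks is to parse ffprobe's JSON output (ffprobe -v error -show_streams -print_format json out.mp4). The helper below is hypothetical; it operates on the already-parsed dict, and the "Spherical Mapping" side-data label is an assumption about ffprobe's output:

```python
# Hypothetical post-smoke check over parsed ffprobe JSON
# (ffprobe -v error -show_streams -print_format json out.mp4).
def check_output(probe: dict, expect_codec: str, expect_spherical: bool = False) -> bool:
    video = [s for s in probe.get("streams", []) if s.get("codec_type") == "video"]
    if not video or video[0].get("codec_name") != expect_codec:
        return False
    if expect_spherical:
        # 360 outputs should carry spherical side data on the video stream.
        side = video[0].get("side_data_list", [])
        return any("Spherical" in d.get("side_data_type", "") for d in side)
    return True

sample = {"streams": [{"codec_type": "video", "codec_name": "hevc"}]}
print(check_output(sample, "hevc"))  # True
```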

What "good" looks like

A good deploy currently means:

  • hosted Replicate accepts a normal raw https://connect.comma.ai/... route URL
  • UI renders work in both H.264 and HEVC
  • 360 outputs still include spherical metadata
  • newer mici routes render successfully in 360 and 360 forward-upon-wide
  • the pushed version was built from stock cog 0.17.2+ with the current repo bootstrap

Notes and gotchas

  • Local cog predict should work with plain connect URLs on stock cog 0.17.2+.
  • The parser still accepts literal:https://... for backwards compatibility, but that is no longer the recommended path.
  • If a deploy behaves strangely, check the current upstream/Cog patch context documented elsewhere in this repo.