This document describes how this repo is currently deployed to Replicate.
It covers:
- staging pushes to `nelsonjchen/op-replay-clipper-beta`
- production pushes to `nelsonjchen/op-replay-clipper`
- local testing before a push
- how stock `cog` 0.17.2+ fits into the current flow
- how to verify a pushed version before and after promotion
This repo now deploys with a normal stock cog push, but it still relies on
repo-generated Cog artifacts and a substantial bootstrap script.
The deploy path depends on:
- generated Cog build artifacts from `cog/render_artifacts.sh`
- stock `cog` 0.17.2+
- a normal `cog push`
Upstream Cog 0.17.2 fixed the earlier raw-URL coercion bug for plain `str`
inputs, so hosted Replicate can again accept normal
`https://connect.comma.ai/...` route URLs without a custom runtime patch.
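The raw-URL shape described above can be sanity-checked before a smoke run. A hedged illustrative sketch only: the 16-hex-digit dongle pattern is an assumption about connect URLs, and the repo's real input handling lives in its own parser.

```python
import re

# Illustrative sketch of the raw connect route URL shape this repo's
# `route: str` input receives. The 16-hex-digit dongle id is an assumption;
# the repo's actual parser is the source of truth.
CONNECT_ROUTE_RE = re.compile(
    r"^https://connect\.comma\.ai/"
    r"[0-9a-f]{16}/"        # dongle id (assumed 16 hex digits)
    r"[^/]+/"               # route name, e.g. 0000010a--a51155e496
    r"\d+/\d+$"             # clip start/end seconds
)

def looks_like_connect_route_url(url: str) -> bool:
    """Return True if `url` matches the raw connect route URL shape."""
    return CONNECT_ROUTE_RE.match(url) is not None
```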
The two Replicate targets are:
- staging: `r8.im/nelsonjchen/op-replay-clipper-beta`
- production: `r8.im/nelsonjchen/op-replay-clipper`
The intended workflow is:
- push current work to the staging model
- run the staging smoke matrix
- if staging looks good, push the same repo state to the production model
- run a smaller post-promotion smoke set on production
Before deploying:
- have a valid `REPLICATE_API_TOKEN`
- have Docker working locally
- have a working `cog` CLI installed
- be on the repo state you actually want to push
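The prerequisites above can be spot-checked with a small preflight. A hedged convenience sketch, not part of the repo: it only checks presence, so a set token is not necessarily a valid one, and a `docker` binary on PATH does not prove the daemon is running.

```python
import os
import shutil

def deploy_preflight() -> dict:
    """Report which deploy prerequisites are visible on this machine.

    Presence-only checks; this does not validate the token against
    Replicate or talk to the Docker daemon.
    """
    return {
        "docker": shutil.which("docker") is not None,
        "cog": shutil.which("cog") is not None,
        "REPLICATE_API_TOKEN": bool(os.environ.get("REPLICATE_API_TOKEN")),
    }

for name, ok in deploy_preflight().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```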
Common setup:

```shell
uv sync
set -a
source .env
set +a
```

There is also a manual Actions workflow at
`../.github/workflows/replicate-deploy.yml`.
The main Build workflow is intentionally
build-only. It validates that the hosted Cog image can be built, but it does
not auto-push beta or production deployments.
It is meant for two cases:
- build-only validation of the generated Cog image on a GitHub-hosted runner
- manually pushing a specific generated Cog config to a Replicate model ref
Workflow notes:
- it always runs `./cog/render_artifacts.sh` first, because this repo does not
  commit the rendered Cog requirements/config as the source of truth
- it accepts either `cog.yaml` or `cog-rfdetr-repro.yaml`
- it expects either `REPLICATE_CLI_AUTH_TOKEN` or `REPLICATE_API_TOKEN` to be
  present in repository secrets for push runs
- it frees extra disk space on the hosted runner first, because the main model
  image is large enough that the default runner layout is tight without cleanup
For the normal beta deploy path, there is also a dedicated Actions button at
`../.github/workflows/replicate-deploy-beta.yml`.
That workflow:

- shows up in the Actions tab as `Replicate Deploy Beta`
- always pushes `cog.yaml` to `r8.im/nelsonjchen/op-replay-clipper-beta`
- reuses the same hosted deploy logic as the generic workflow
For the normal production promotion path, there is also a dedicated Actions
button at
`../.github/workflows/replicate-deploy-prod.yml`.
That workflow:

- shows up in the Actions tab as `Replicate Deploy Prod`
- always pushes `cog.yaml` to `r8.im/nelsonjchen/op-replay-clipper`
- reuses the same hosted deploy logic as the generic workflow
- requires typing `deploy-prod` before it will run, to reduce accidental prod
  pushes
The cheapest and fastest validation path is still local-first:
- run the local Python/uv path
- optionally run local `cog predict`
- use GCE when you want Linux/NVIDIA behavior without paying the Replicate
  startup tax
- push to the staging Replicate model only after that looks good
For most behavior checks, use the repo CLI directly:
```shell
uv sync
uv run python clip.py ui 'https://connect.comma.ai/<dongle>/<route>/<start>/<end>'
```

For hosted-model testing from your machine without building a local container:

```shell
uv run python replicate_run.py \
  --model 'nelsonjchen/op-replay-clipper-beta:<version>' \
  --url 'https://connect.comma.ai/<dongle>/<route>/<start>/<end>' \
  --render-type ui \
  --output ./shared/local-hosted-smoke.mp4
```

If you want to exercise the local Cog/container path:

```shell
./cog/render_artifacts.sh
cog predict -i renderType=ui -i route='https://connect.comma.ai/<dongle>/<route>/<start>/<end>'
```

Notes:
- stock `cog` 0.17.2+ should accept plain connect URLs for this repo's
  `route: str` input
- the local parser still accepts `literal:https://...` as a
  backwards-compatible form if you happen to have an older local helper flow
  lying around
- local `cog predict` is useful for image/runtime validation, but the hosted
  Replicate smokes are still the source of truth before promotion
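The `literal:` compatibility behavior described above amounts to stripping an optional prefix. This is a sketch of the described behavior only, not the repo's actual parser, which may do more validation:

```python
LITERAL_PREFIX = "literal:"

def normalize_route_input(value: str) -> str:
    """Strip the optional backwards-compatible `literal:` prefix.

    Illustration of the behavior described above; the repo's real parser
    is the source of truth.
    """
    if value.startswith(LITERAL_PREFIX):
        return value[len(LITERAL_PREFIX):]
    return value
```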
GCE is no longer required for routine beta deploys. Prefer the normal Cog/GitHub Actions/Replicate path for one-off beta pushes and hosted smoke tests. Use GCE when you expect multiple fix-and-retest iterations that need:
- Linux behavior
- NVIDIA rendering/encoding behavior
- faster iteration than repeated Replicate cold starts
The typical GCE flow is:
- create or start a GPU VM
- sync the repo there
- run the local CLI or local `cog predict`
- copy the output artifact back
- stop the VM when you are done
Host-side local CLI example on the VM:
```shell
uv sync
uv run python clip.py ui 'https://connect.comma.ai/<dongle>/<route>/<start>/<end>'
```

Hosted-model smoke from the VM:

```shell
uv run python replicate_run.py \
  --model 'nelsonjchen/op-replay-clipper-beta:<version>' \
  --url 'https://connect.comma.ai/<dongle>/<route>/<start>/<end>' \
  --render-type ui \
  --output ./shared/gce-hosted-smoke.mp4
```

Local Cog/container smoke on the VM:

```shell
./cog/render_artifacts.sh
cog predict --gpus all -i renderType=ui -i route='https://connect.comma.ai/<dongle>/<route>/<start>/<end>'
```

GCE is especially useful for checking:
- null/EGL rendering on Linux
- NVENC behavior
- whether a new runtime or bootstrap change behaves correctly before pushing to Replicate
- the passenger redaction product path via
  `./scripts/smoke_driver_redaction.sh --backend local --accel nvidia`
For this repo's current Linux/NVIDIA validation target, use your Cowboy project GPU VM and keep the details in local environment config rather than committing them into the repo.
Add these to your local `.env`:

```shell
export GCE_PROJECT=your-gcp-project
export GCE_ZONE=your-gce-zone
export GCE_INSTANCE=your-gpu-vm-name
```

Typical start/stop flow:
```shell
set -a
source .env
set +a
gcloud compute instances start "$GCE_INSTANCE" \
  --project "$GCE_PROJECT" \
  --zone "$GCE_ZONE"
gcloud compute ssh "$GCE_INSTANCE" \
  --project "$GCE_PROJECT" \
  --zone "$GCE_ZONE"
```

If the zone is temporarily out of L4 capacity, use the retry helper instead of
manually re-running `gcloud compute instances start`:

```shell
set -a
source .env
set +a
./scripts/wait_for_gce_instance_start.sh
```

It retries through `ZONE_RESOURCE_POOL_EXHAUSTED` / stockout errors every 10
minutes by default and exits as soon as the instance reaches `RUNNING`.
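Conceptually, the retry helper's loop looks like the sketch below. The real helper is a shell script wrapping `gcloud`; `try_start` here is a hypothetical stand-in for one start attempt, and the 600-second default mirrors the 10-minute interval described above.

```python
import time

def wait_for_instance_start(try_start, interval_s: int = 600,
                            max_attempts: int = 1000) -> bool:
    """Retry start attempts until one succeeds or attempts run out.

    `try_start` is a hypothetical callable standing in for one
    `gcloud compute instances start` attempt: it returns True once the
    instance reaches RUNNING and False on a stockout-style failure.
    """
    for _attempt in range(1, max_attempts + 1):
        if try_start():
            return True
        time.sleep(interval_s)  # back off before the next attempt
    return False
```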
For the RF-DETR Linux/CUDA debugging track, use the dedicated T4 acquisition helper instead. It creates a temporary T4 VM, tries spot first, falls back to standard if spot capacity stays unavailable, and keeps retrying ordered zones until one succeeds:
```shell
set -a
source .env
set +a
./scripts/acquire_t4_gce_instance.sh
```

It writes the chosen instance name and zone into a local temp state dir so
follow-up scripts can reuse the VM. When you are done, delete that temporary
VM and clear the state with:

```shell
./scripts/delete_t4_gce_instance.sh
```

Before running Cog or RF-DETR smokes on that VM, bootstrap it into the
known-good T4 state:

```shell
./scripts/bootstrap_t4_gce_vm.sh
```

That installs Docker, Cog, nvidia-container-toolkit, and the NVIDIA video
driver packages (`libnvidia-encode-580-server`,
`libnvidia-decode-580-server`) so NVENC/NVCUVID are available inside the Cog
container instead of only bare CUDA compute.
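To spot-check that the bootstrap actually made NVENC visible, you can scan `ffmpeg -encoders` output for the NVENC entries. A hedged sketch to run on the VM or inside the Cog container; it returns an empty list when ffmpeg is missing or built without NVENC:

```python
import shutil
import subprocess

def visible_nvenc_encoders() -> list:
    """List NVENC encoder names that ffmpeg reports, if ffmpeg is present."""
    if shutil.which("ffmpeg") is None:
        return []
    out = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=False,
    ).stdout
    # Look for the standard NVENC encoder names in the encoder listing.
    return [name for name in ("h264_nvenc", "hevc_nvenc") if name in out]

print(visible_nvenc_encoders() or "no NVENC encoders visible (ffmpeg missing or no NVENC build)")
```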
There is also a tiny RF-DETR-only repro surface for debugging Cog/Replicate GPU issues without the rest of the clipper stack.
It uses:

- `scripts/rf_detr_repro.py` for plain local Python
- `cog_rfdetr_repro_predictor.py` for local/hosted Cog
- `rf_detr_repro_run.py` for hosted Replicate smoke runs
- `scripts/smoke_rf_detr_repro.sh` to prepare a tiny still and tiny clip from
  an existing local source
Local plain-Python repro:
```shell
uv sync
./scripts/smoke_rf_detr_repro.sh --backend local-cli
```

Local Cog repro:

```shell
./cog/render_artifacts.sh
./scripts/smoke_rf_detr_repro.sh --backend local-cog --device cuda --require-actual-device cuda
```

Hosted beta repro:

```shell
./cog/render_artifacts.sh
cog push --file cog-rfdetr-repro.yaml r8.im/nelsonjchen/op-replay-clipper-rfdetr-repro-beta
uv run python rf_detr_repro_run.py \
  --model 'nelsonjchen/op-replay-clipper-rfdetr-repro-beta:<version>' \
  --input ./shared/rf-detr-repro-inputs/tiny-clip.mp4 \
  --output ./shared/rf-detr-repro-hosted-artifacts.zip
```

Use this path when you want to answer questions like:
- does RF-DETR itself succeed on GPU in plain Python?
- does local `cog predict` keep or lose GPU access?
- does hosted Replicate fail on the same tiny input?
For the current GPU-only debugging track, treat any RF-DETR CPU fallback as a
failure. On Linux/NVIDIA, require `actual_model_device = "cuda"` in the repro
report and require the driver redaction selection report to show
`hidden_redaction.rf_detr_device = "cuda"`.
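If you script this gate, you can fail fast on any CPU fallback. A sketch that assumes the reports parse to a dict with the flat key names quoted above; the real report layout may nest them differently:

```python
def assert_gpu_only(report: dict) -> None:
    """Raise if the repro/redaction reports show anything but cuda.

    Assumes the key names quoted in the text above; adjust if the real
    report nests them differently.
    """
    device = report.get("actual_model_device")
    if device != "cuda":
        raise AssertionError(
            f"RF-DETR ran on {device!r}; CPU fallback is a failure here"
        )
    redaction = report.get("hidden_redaction", {})
    if redaction.get("rf_detr_device") != "cuda":
        raise AssertionError(
            "driver redaction selection did not run RF-DETR on cuda"
        )
```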
The push helper does this for you, but the underlying command is:
```shell
./cog/render_artifacts.sh
```

That regenerates:

- `requirements-cog.txt`
- `cog.yaml`
- `cog-rfdetr-repro.yaml`
The rendered `cog.yaml` embeds the shared bootstrap script so the build stays
reproducible inside Replicate/Cog.
The standard staging deploy is:
```shell
./cog/render_artifacts.sh
cog push r8.im/nelsonjchen/op-replay-clipper-beta
```

That targets `r8.im/nelsonjchen/op-replay-clipper-beta`.
After the push, get the latest version id:
```shell
uv run python - <<'PY'
import os

import replicate

client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])
model = client.models.get("nelsonjchen/op-replay-clipper-beta")
versions = list(model.versions.list())
print(versions[0].id)
PY
```

Use that exact version id for smoke testing rather than relying on the model
alias alone.
The current promotion gate is documented in:
The main risky surfaces are:
- UI raw URL handling
- UI HEVC output
- forward rendering
- 360 rendering on newer mici routes
- 360 forward-upon-wide rendering on newer mici routes
- JWT-backed UI rendering
The standard route used for the current regression matrix is
`https://connect.comma.ai/5beb9b58bd12b691/0000010a--a51155e496/90/105`.
That route is useful because it exercises the newer mici camera dimensions that previously broke the 360 path.
Once staging is good, push the same repo state to production by overriding the target model:
```shell
cog push r8.im/nelsonjchen/op-replay-clipper
```

Get the newest production version id:

```shell
uv run python - <<'PY'
import os

import replicate

client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])
model = client.models.get("nelsonjchen/op-replay-clipper")
versions = list(model.versions.list())
print(versions[0].id)
PY
```

After a production push, rerun at least:
- UI raw URL
- UI HEVC
- one non-UI case, usually `360` or `forward`
Example:
```shell
uv run python replicate_run.py \
  --model 'nelsonjchen/op-replay-clipper:<prod-version>' \
  --url 'https://connect.comma.ai/5beb9b58bd12b691/0000010a--a51155e496/90/105' \
  --render-type ui \
  --file-format auto \
  --output ./shared/prod-live-ui-raw.mp4
```

```shell
uv run python replicate_run.py \
  --model 'nelsonjchen/op-replay-clipper:<prod-version>' \
  --url 'https://connect.comma.ai/5beb9b58bd12b691/0000010a--a51155e496/90/105' \
  --render-type ui \
  --file-format hevc \
  --output ./shared/prod-live-ui-hevc.mp4
```

```shell
uv run python replicate_run.py \
  --model 'nelsonjchen/op-replay-clipper:<prod-version>' \
  --url 'https://connect.comma.ai/5beb9b58bd12b691/0000010a--a51155e496/90/105' \
  --render-type 360 \
  --file-format auto \
  --output ./shared/prod-live-360.mp4
```

Verify with `ffprobe` after each run.
For 360 outputs, also verify spherical metadata is still present.
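One way to script the spherical-metadata check is via ffprobe's JSON output: spherical video normally appears as per-stream side data. A sketch; the exact `side_data_type` string can vary between ffprobe versions, so this matches loosely on "spherical":

```python
import json
import subprocess

def streams_have_spherical(probe: dict) -> bool:
    """True if any stream's side data mentions spherical mapping."""
    for stream in probe.get("streams", []):
        for side in stream.get("side_data_list", []):
            if "spherical" in side.get("side_data_type", "").lower():
                return True
    return False

def has_spherical_metadata(path: str) -> bool:
    """Probe a file with ffprobe and check for spherical side data."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-print_format", "json",
         "-show_streams", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return streams_have_spherical(json.loads(out))
```

For example, `has_spherical_metadata("./shared/prod-live-360.mp4")` should stay `True` after a 360 smoke.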
A good deploy currently means:
- hosted Replicate accepts a normal raw `https://connect.comma.ai/...` route
  URL
- UI renders work in both H.264 and HEVC
- 360 outputs still include spherical metadata
- newer mici routes render successfully in 360 and 360 forward-upon-wide
- the pushed version was built from stock `cog` 0.17.2+ with the current repo
  bootstrap
- Local `cog predict` should work with plain connect URLs on stock
  `cog` 0.17.2+.
- The parser still accepts `literal:https://...` for backwards compatibility,
  but that is no longer the recommended path.
- If a deploy behaves strangely, check the current upstream/Cog patch context
  in: