
Commit 18b5a6d

fix(deploy): rewrite workspace:* to file: refs in Dockerfile + record Lesson #55
The npm install in the container failed with EUNSUPPORTEDPROTOCOL on the previous deploy attempt: npm 10.8.2 and 11.x both reject pnpm's `workspace:*` protocol, even with the workspaces field present in root package.json. Switching source files to `*` would break pnpm locally (it tries to fetch @pushflip/client from the npm registry), so source stays pnpm-native and the rewrite happens at Dockerfile build time only.

What changed:

- Both Dockerfiles add a `node -e ...` step right after copying the workspace package.json files. The script walks the app/, faucet/, and scripts/ package.json files (the three workspaces with cross-workspace deps), maps known workspace names (@pushflip/client, @pushflip/dealer) to relative paths, and rewrites each mapped `"workspace:*"` spec to `"file:../<path>"`. npm installs file: refs to sibling directories as symlinks.
- The rewrite happens *before* npm install, so npm sees a tree of npm-compatible specifiers and resolves cleanly. Subsequent COPYs of source bring the actual workspace files into place, at which point the symlinks created by the install point at real code.

Lesson #55 records:

- The full 4-step migration to npm (workspaces field, source stays on `workspace:*`, file: rewrite at build time, npm scripts in CMD).
- The "switch source to `*`" dead end (pnpm fetches from the registry).
- Why npm 11 on tucker doesn't help (npm runs inside the container).
- Meta-lesson: when 2+ workarounds have stacked and the bug recurs, switch tools rather than adding a third patch.
1 parent 98f8733 commit 18b5a6d

3 files changed

Lines changed: 56 additions & 0 deletions


app/Dockerfile

Lines changed: 27 additions & 0 deletions
```diff
@@ -47,6 +47,33 @@ COPY dealer/package.json ./dealer/
 COPY house-ai/package.json ./house-ai/
 COPY zk-circuits/package.json ./zk-circuits/
 
+# Rewrite pnpm's `workspace:*` protocol into npm-readable relative
+# `file:` paths. pnpm-only syntax breaks `npm install` with
+# "Unsupported URL Type" (npm 10 + 11 both reject it). We can't just
+# use `"*"` because pnpm interprets that as "fetch from registry"
+# locally (kills `pnpm install` at dev time). Keep `workspace:*` in
+# source for pnpm dev; rewrite per-Dockerfile for the npm container
+# build. The map matches every cross-workspace dep in this monorepo;
+# adding a new one means extending this list.
+RUN node -e "\
+const fs = require('fs'); \
+const map = { \
+'@pushflip/client': '../clients/js', \
+'@pushflip/dealer': '../dealer' \
+}; \
+for (const pkgPath of ['app/package.json', 'faucet/package.json', 'scripts/package.json']) { \
+const pkg = JSON.parse(fs.readFileSync(pkgPath, 'utf-8')); \
+for (const section of ['dependencies', 'devDependencies']) { \
+if (!pkg[section]) continue; \
+for (const [name, spec] of Object.entries(pkg[section])) { \
+if (spec === 'workspace:*' && map[name]) { \
+pkg[section][name] = 'file:' + map[name]; \
+} \
+} \
+} \
+fs.writeFileSync(pkgPath, JSON.stringify(pkg, null, 2)); \
+}"
+
 # --ignore-scripts: skip biome/ultracite postinstalls that aren't
 # needed for the production bundle build. Vite + tsc come via npm's
 # optionalDependencies (esbuild, lightningcss, rollup-linux-*) which
```
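The inline `node -e` script in the Dockerfile reads and writes files directly, which makes it awkward to test. A minimal sketch of the same transformation as a pure function, assuming the same name→path map; `rewriteWorkspaceSpecs` and `WORKSPACE_MAP` are hypothetical names, not part of this commit. Like the Dockerfile script, it only touches the exact spec `workspace:*` in `dependencies` and `devDependencies`, and leaves names missing from the map unchanged.

```javascript
// Hypothetical pure-function version of the Dockerfile's inline rewrite.
// Same map as the Dockerfile: workspace package name -> path relative to
// each dependent workspace directory (app/, faucet/, scripts/).
const WORKSPACE_MAP = {
  '@pushflip/client': '../clients/js',
  '@pushflip/dealer': '../dealer',
};

const SECTIONS = ['dependencies', 'devDependencies'];

// Returns a rewritten copy of a parsed package.json; the input is not mutated.
function rewriteWorkspaceSpecs(pkg, map = WORKSPACE_MAP) {
  const out = { ...pkg };
  for (const section of SECTIONS) {
    if (!pkg[section]) continue;
    out[section] = { ...pkg[section] };
    for (const [name, spec] of Object.entries(pkg[section])) {
      // Only the exact `workspace:*` form is handled, mirroring the Dockerfile;
      // unmapped names keep their original spec.
      if (spec === 'workspace:*' && map[name]) {
        out[section][name] = 'file:' + map[name];
      }
    }
  }
  return out;
}
```

With a helper like this, the Dockerfile loop body reduces to parse, `rewriteWorkspaceSpecs(pkg)`, and write, and the mapping logic can be unit-tested without a container build.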

docs/EXECUTION_PLAN.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -2078,6 +2078,8 @@ day 1 if we built this again.
 
 54. **First public deploy (2026-04-26/27) — pnpm@10 + libuv worker threads deadlock containerized Node installs at ~733/734 packages, regardless of base image.** Shipping the Pre-Mainnet 5.0.7 faucet to https://play.pushflip.xyz/ on tucker (Panmoni's production VPS, podman + systemd quadlets) burned ~6 hours instead of the planned ~4–5, almost entirely on container build pathology. The single highest-leverage fix was pinning `pnpm@9` in both Dockerfiles (`e6ce2d4`). Going down the diagnostic tree in order: (1) the install hung first at biome's postinstall (fixed with `--ignore-scripts` since we don't need biome inside the deploy image), (2) it then hung at 733/734 with a clear futex pattern in `strace`, (3) we initially blamed musl and switched `node:20-alpine` → `node:20-slim` — same hang, ruling out libc, (4) downgrading `corepack prepare pnpm@10 --activate` → `pnpm@9 --activate` cleared the deadlock cleanly, ~9.5 s install. **What I should have remembered**: yapbay-vite, on this same host, hit this exact pathology months ago and resolved it by switching to `npm ci`. The institutional memory wasn't in front of me at deploy time. Two operational discoveries also shipped in the same script: (a) `systemctl --user restart` on a quadlet container with `Pod=` directive triggers `ExecStopPost=podman pod rm` on the pod, fully removing it; the containers' `BindsTo=pushflip-pod.service` then fails on start. Fix: include `pushflip-pod` in the script's `SERVICES` restart array so all three are in one dependency-resolved restart group. (b) `systemctl is-active` returns the moment the container starts, but the in-container process needs another 1–3 s to bind its listening port. Without a retry, the smoke check sees nginx upstream-502 even though the deploy is fine. Fix: retry-with-backoff (2s/4s/6s/8s, max 5 attempts). The same race re-bit the rollback-command printer until that loop also got embedded. **Meta-lesson, the operational equivalent of "run the exploit" from #53**: the deploy script should self-verify, not trust subprocess return codes. A "dry-run twice + watch the rollback work end-to-end" discipline before declaring redeploy ergonomics done caught the rollback-race that would otherwise have first manifested during a real outage. Full commit-by-commit fix history in [docs/DEPLOYMENT_PLAN.md](DEPLOYMENT_PLAN.md).
 
+55. **Pre-Mainnet 5.0.10 redeploy (2026-04-27): the `pnpm@9 + node:20-slim + --ignore-scripts` workaround stack from Lesson #54 collapsed the moment the lockfile changed. Switching to `npm install` (the actual yapbay escape hatch) was a 4-step migration, not a 1-line fix.** Adding `better-sqlite3@^11.7.0` + transitive deps (`bindings`, `node-addon-api`, `prebuild-install`, …) to the faucet workspace was the trigger; the same 7.8% CPU + zero forward progress + all-threads-`S(sleeping)` signature returned, this time hanging at "added 734 of 735" in the *vite* image build (which doesn't even depend on better-sqlite3: pnpm's filtered install still synchronizes the workspace virtual store, which materializes the new lockfile entries even with `--filter @pushflip/app --filter @pushflip/client`). Three workarounds had stacked up across Lesson #54 (pnpm@9, node:20-slim, --ignore-scripts) and the bug returned anyway, which is the strongest possible evidence that the right fix is *replacing* the pnpm-in-container path entirely, not patching it further. **The npm migration in detail (4 steps, all required):** (a) **Add a `workspaces` field to root `package.json`.** pnpm prefers `pnpm-workspace.yaml` when both exist (verified: pnpm-lock.yaml didn't churn after the addition), so local pnpm dev is unaffected; npm reads the field exclusively. Without this, npm doesn't discover the workspace structure and sibling deps fail to resolve. (b) **Don't switch source files from `workspace:*` to `*`.** We tried this first; pnpm interprets `*` as "fetch from npm registry" and the local install died with `ERR_PNPM_FETCH_404 GET https://registry.npmjs.org/@pushflip%2Fclient`. pnpm requires the explicit `workspace:` marker to prefer the local workspace over the registry, so `workspace:*` MUST stay in source for pnpm dev. (c) **Rewrite `workspace:*` → `file:../<path>` at Dockerfile build time, not in source.** An inline Node script in the Dockerfile reads each dependent workspace's package.json after copy, swaps `workspace:*` for npm-readable `file:` paths in dependencies + devDependencies, and writes it back. npm installs a `file:` dependency on a sibling workspace folder as a symlink. The source repo stays pnpm-native; the rewrite is confined to the container build. (d) **Update build/CMD invocations.** `pnpm --filter @pushflip/app build` → `npm run build --workspace=@pushflip/app`; `CMD ["pnpm", "start"]` → `CMD ["npm", "start"]`. Both packages expose `start`/`build` scripts that don't invoke the package manager binary directly, so this was mechanical. **The first failed deploy attempt (98f8733) confirmed step (c) is non-negotiable**: `npm install` failed loudly with `EUNSUPPORTEDPROTOCOL: Unsupported URL Type "workspace:": workspace:*`. npm 10.8.2 and 11.13.0 both reject the protocol, even when the workspace itself is listed in the `workspaces` array; `workspace:` is pnpm-specific. The user's "update npm to 11 on tucker" reflex didn't help, and couldn't have, because the npm executing the install runs *inside* the `node:20-slim` container, not on the host. **Meta-lesson on workaround-stacking**: each Lesson #54 patch (pnpm@9 pin, slim base, --ignore-scripts) addressed a *symptom* of the same underlying pnpm/libuv bug. Stacking three symptom-fixes feels like progress but creates a brittle stack: a small lockfile change collapses all three at once. The yapbay escape hatch ("just use npm") was always available; we held off because it required ~80 lines of Dockerfile change + a workspace-protocol rewrite. That cost was higher than any individual workaround, but lower than the cumulative cost of stacking them. **Heuristic going forward**: once two or more workarounds have been stacked for the same bug and it recurs, switch tools rather than adding another patch.
+
 ---
 
 #### Task 3.0: Devnet Smoke Test (Poseidon Stack Verification) — COMPLETED 2026-04-09
```
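Lesson #54's fix for the post-restart port-binding race is "retry-with-backoff (2s/4s/6s/8s, max 5 attempts)". The deploy script itself is not part of this commit, so the following is only a JavaScript sketch of that schedule, not its actual implementation; `smokeCheckWithRetry`, `check`, and the option names are hypothetical. Note the schedule is linear backoff: the wait after attempt *n* is 2·*n* seconds, and there is no wait after the last attempt, so 5 attempts produce exactly 4 waits.

```javascript
// Hypothetical sketch of Lesson #54's smoke-check retry loop.
const sleepMs = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `check` resolves truthy on success (e.g. HTTP 200 from the upstream),
// and resolves falsy or throws on failure (e.g. nginx 502 while the
// in-container process is still binding its port).
async function smokeCheckWithRetry(check, { maxAttempts = 5, stepMs = 2000, sleep = sleepMs } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      if (await check()) return { ok: true, attempts: attempt };
    } catch (err) {
      lastError = err;
    }
    // Linear backoff: 2s, 4s, 6s, 8s; skip the wait after the final attempt.
    if (attempt < maxAttempts) await sleep(stepMs * attempt);
  }
  return { ok: false, attempts: maxAttempts, lastError };
}

// Example use (Node 18+ provides a global fetch):
// const result = await smokeCheckWithRetry(
//   async () => (await fetch('https://play.pushflip.xyz/')).ok,
// );
```

Injecting `sleep` keeps the schedule testable without real delays; the same helper could wrap the rollback-command printer, which the lesson says hit the same race.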

faucet/Dockerfile

Lines changed: 27 additions & 0 deletions
```diff
@@ -39,6 +39,33 @@ COPY dealer/package.json ./dealer/
 COPY house-ai/package.json ./house-ai/
 COPY zk-circuits/package.json ./zk-circuits/
 
+# Rewrite pnpm's `workspace:*` protocol into npm-readable relative
+# `file:` paths. pnpm-only syntax breaks `npm install` with
+# "Unsupported URL Type" (npm 10 + 11 both reject it). We can't just
+# use `"*"` because pnpm interprets that as "fetch from registry"
+# locally (kills `pnpm install` at dev time). Keep `workspace:*` in
+# source for pnpm dev; rewrite per-Dockerfile for the npm container
+# build. The map matches every cross-workspace dep in this monorepo;
+# adding a new one means extending this list.
+RUN node -e "\
+const fs = require('fs'); \
+const map = { \
+'@pushflip/client': '../clients/js', \
+'@pushflip/dealer': '../dealer' \
+}; \
+for (const pkgPath of ['app/package.json', 'faucet/package.json', 'scripts/package.json']) { \
+const pkg = JSON.parse(fs.readFileSync(pkgPath, 'utf-8')); \
+for (const section of ['dependencies', 'devDependencies']) { \
+if (!pkg[section]) continue; \
+for (const [name, spec] of Object.entries(pkg[section])) { \
+if (spec === 'workspace:*' && map[name]) { \
+pkg[section][name] = 'file:' + map[name]; \
+} \
+} \
+} \
+fs.writeFileSync(pkgPath, JSON.stringify(pkg, null, 2)); \
+}"
+
 # --ignore-scripts: skip postinstall hooks that previously triggered
 # the same hang (biome/ultracite postinstalls do disk-heavy work). We
 # explicitly run `npm rebuild better-sqlite3` after to fetch its
```
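The Dockerfile comment notes that "adding a new one means extending this list", and an unmapped `workspace:*` dep is silently left in place until `npm install` rejects it with EUNSUPPORTEDPROTOCOL. A hypothetical alternative, not what this commit does, is to derive the map from the root `package.json` `workspaces` field that Lesson #55 step (a) already requires. The sketch below assumes `workspaces` lists literal directories (no globs); `buildNameToDir`, `rewriteFromWorkspaces`, and the `readName` callback are illustrative names. Unlike the committed script, it rewrites any `workspace:` range (not only `workspace:*`) and reports names it cannot resolve.

```javascript
// Hypothetical alternative to the hard-coded map: derive `file:` specs from
// the root package.json `workspaces` list. Assumes plain directories, no globs.
const path = require('path');

// readName(dir) should return the `name` field of `<dir>/package.json`,
// e.g. via fs.readFileSync + JSON.parse inside the container build.
function buildNameToDir(workspaceDirs, readName) {
  const nameToDir = {};
  for (const dir of workspaceDirs) nameToDir[readName(dir)] = dir;
  return nameToDir;
}

// Rewrites every `workspace:` spec in a dependent workspace's package.json to
// a `file:` path relative to that workspace's own directory. Returns the new
// package object plus any specs that could not be resolved.
function rewriteFromWorkspaces(pkg, dependentDir, nameToDir) {
  const out = { ...pkg };
  const unresolved = [];
  for (const section of ['dependencies', 'devDependencies']) {
    if (!pkg[section]) continue;
    out[section] = { ...pkg[section] };
    for (const [name, spec] of Object.entries(pkg[section])) {
      if (typeof spec !== 'string' || !spec.startsWith('workspace:')) continue;
      const targetDir = nameToDir[name];
      if (!targetDir) {
        // Left unchanged; the caller decides whether to fail the build.
        unresolved.push(`${section}.${name}`);
        continue;
      }
      // posix.relative keeps forward slashes and handles nested workspaces
      // (e.g. clients/js) and dependents that are not top-level directories.
      out[section][name] = 'file:' + path.posix.relative(dependentDir, targetDir);
    }
  }
  return { pkg: out, unresolved };
}
```

In a Dockerfile step, a non-empty `unresolved` list would be the cue to print the names and `process.exit(1)`, so the build fails at the rewrite step with the offending package named rather than later inside `npm install`.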
