Commit 98f8733

fix(deploy): switch Dockerfiles from pnpm install to npm install (Lesson #54 escape hatch)
The `pnpm install --filter ... --frozen-lockfile --ignore-scripts` step
deadlocks deterministically at ~734 packages on tucker (libuv
worker-thread futex hang). Pinning pnpm@9, node:20-slim, and
--ignore-scripts stacked three workaround layers, but the bug surfaced
again the moment the lockfile changed (better-sqlite3 + transitive deps
added in the 5.0.10 PR). The yapbay-vite escape hatch (switch to npm) is
the established fix; applying it here.

What changed:

- Root package.json: add `workspaces` array. npm reads this field; pnpm
  prefers pnpm-workspace.yaml when both exist, so local pnpm dev is
  unaffected (verified: pnpm-lock.yaml didn't churn after the addition).
- faucet/Dockerfile: drop corepack/pnpm@9 setup; install via
  `npm install --no-audit --no-fund --ignore-scripts --omit=optional` at
  the root. The existing `npm rebuild better-sqlite3` +
  `node -e "require('better-sqlite3')"` smoke check stays (npm rebuild
  handles the prebuild fetch the same way pnpm rebuild did). CMD is now
  `npm start`.
- app/Dockerfile: same install switch. Build invocation changes from
  `pnpm --filter @pushflip/app build` to
  `npm run build --workspace=@pushflip/app`.

Both Dockerfiles now copy ALL workspace package.json files (not just the
filtered ones) because npm install at the root reads every workspace's
manifest. This adds a few hundred bytes to the build context per file
and stays cache-friendly, because the layer is invalidated only when a
package.json changes.

Lockfile decision: deliberately not committing a package-lock.json. The
Dockerfile uses `npm install` (not `npm ci`) so the build doesn't
require one; npm resolves fresh from the package.json semver ranges. For
a containerized build that's tested locally before deploy, this is
acceptable; if reproducibility becomes a concern, a future commit can
run `npm install --package-lock-only` at the root and check the result
in alongside pnpm-lock.yaml. Tracked separately.
1 parent 3b4eb27 commit 98f8733

3 files changed

Lines changed: 79 additions & 47 deletions

app/Dockerfile

Lines changed: 29 additions & 19 deletions
@@ -22,36 +22,45 @@
 # on the deploy host. Image carries NO secrets — Helius URLs include
 # an API key but it's a public devnet key with no economic value.
 #
-# Build base: node:20-slim (Debian, glibc), NOT node:20-alpine.
-# pnpm install hangs deterministically on Alpine in rootless podman
-# (worker-thread futex deadlock; see faucet/Dockerfile header). Serve
-# stage stays alpine because `serve` is pure JS with no native deps,
-# so the runtime image stays small.
+# Base: node:20-slim (Debian, glibc) for build, node:20-alpine for
+# the serve stage (pure JS, no native deps so musl is fine).
+#
+# Install: `npm install` (NOT pnpm). pnpm install in containerized
+# Node deadlocks deterministically at ~734 packages — libuv worker-
+# thread futex hang. Documented as Lesson #54; re-hit during the
+# 5.0.10 deploy with better-sqlite3 in the lockfile. yapbay-vite hit
+# this same bug on the same host and resolved it by switching to
+# npm; we do the same.
 
 # --- build stage ---------------------------------------------------
 FROM node:20-slim AS build
 WORKDIR /app
-RUN corepack enable && corepack prepare pnpm@9 --activate
 
-COPY pnpm-workspace.yaml package.json pnpm-lock.yaml ./
+# Copy workspace package.json files first so this layer caches across
+# source edits.
+COPY package.json ./
 COPY app/package.json ./app/
 COPY clients/js/package.json ./clients/js/
-# --ignore-scripts: workaround for pnpm install hanging on postinstall
-# scripts inside rootless podman on Alpine. Same issue caused yapbay's
-# pnpm-test image to hang for 13 days (see ~/repos/yapbay/Containerfile
-# which works around it by switching to npm ci entirely). Platform-
-# specific binaries that vite/tsc need (esbuild, lightningcss, the
-# rollup-linux-* native binding) are declared as optionalDependencies
-# and install regardless of --ignore-scripts. Biome's postinstall
-# (the likely culprit for the 10-min hang we saw) is dev-only and not
-# needed for the production bundle build.
-RUN pnpm install --filter @pushflip/app --filter @pushflip/client --frozen-lockfile --ignore-scripts
+COPY faucet/package.json ./faucet/
+COPY scripts/package.json ./scripts/
+COPY dealer/package.json ./dealer/
+COPY house-ai/package.json ./house-ai/
+COPY zk-circuits/package.json ./zk-circuits/
+
+# --ignore-scripts: skip biome/ultracite postinstalls that aren't
+# needed for the production bundle build. Vite + tsc come via npm's
+# optionalDependencies (esbuild, lightningcss, rollup-linux-*) which
+# install regardless of --ignore-scripts.
+# --no-audit --no-fund: less noise + slightly faster.
+# --omit=optional: drop platform-specific binaries we don't need on
+# this Linux x64 build.
+RUN npm install --no-audit --no-fund --ignore-scripts --omit=optional
 
 COPY clients/js ./clients/js
 COPY app ./app
 
 # Defaults match the dev expectation but production builds MUST pass
-# all three via --build-arg. The deploy script does this.
+# all four via --build-arg. The deploy script does this.
 ARG VITE_FAUCET_URL=/api/faucet
 ARG VITE_NICKNAME_URL=/api/nickname
 ARG VITE_RPC_ENDPOINT=https://api.devnet.solana.com
@@ -61,7 +70,8 @@ ENV VITE_NICKNAME_URL=${VITE_NICKNAME_URL}
 ENV VITE_RPC_ENDPOINT=${VITE_RPC_ENDPOINT}
 ENV VITE_RPC_WS_ENDPOINT=${VITE_RPC_WS_ENDPOINT}
 
-RUN pnpm --filter @pushflip/app build
+# Run the workspace's build script via npm.
+RUN npm run build --workspace=@pushflip/app
 
 # --- serve stage ---------------------------------------------------
 # Serves the built dist/ on port 5175. Matches the yapbay-vite pattern
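
The Dockerfile comment above says production builds must pass all four `VITE_*` values via `--build-arg`. The deploy script itself is not part of this commit, so what follows is only a minimal sketch of how such a script might enforce that rule before invoking the builder. The function name `vite_build_args` is hypothetical; the four variable names are the ones visible in the `ARG`/`ENV` lines.

```shell
#!/bin/sh
# Hypothetical helper (not from the repo): print one
# "--build-arg NAME=value" line per required VITE_* variable,
# or fail if any of the four is unset or empty.
vite_build_args() {
    for name in VITE_FAUCET_URL VITE_NICKNAME_URL VITE_RPC_ENDPOINT VITE_RPC_WS_ENDPOINT; do
        # Indirect lookup of the variable named by $name (POSIX sh).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required build arg: $name" >&2
            return 1
        fi
        printf '%s %s=%s\n' --build-arg "$name" "$value"
    done
}
```

A caller could splice the output into the build command, for example `podman build $(vite_build_args) -f app/Dockerfile .`. That relies on shell word splitting, so it assumes the values contain no whitespace, which holds for plain URLs and paths.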

faucet/Dockerfile

Lines changed: 40 additions & 27 deletions
@@ -9,57 +9,69 @@
 # and the keypair is bind-mounted read-only from the host filesystem.
 # This image carries NO secrets.
 #
-# Base: node:20-slim (Debian, glibc), NOT node:20-alpine (musl).
-# pnpm install hangs deterministically on Alpine inside rootless
-# podman due to a worker-thread/futex deadlock — see the matching
-# yapbay-pushflip session log + the abandoned ~/repos/yapbay/Containerfile
-# (renamed to .pnpm-test) that hung for 13 days on the same issue.
-# Debian's glibc is the minimal-change fix; image size goes from ~50MB
-# to ~150MB which is irrelevant on tucker (49 GB free).
+# Base: node:20-slim (Debian, glibc).
+#
+# Install: `npm install` (NOT pnpm). pnpm install in containerized Node
+# deadlocks deterministically at ~734 packages — libuv worker-thread
+# futex hang. Documented as Lesson #54 in EXECUTION_PLAN.md and re-hit
+# during the 5.0.10 deploy after better-sqlite3 was added to the
+# lockfile (the package count changed but the deadlock signature did
+# not). yapbay-vite hit this same bug months ago on this same host
+# and resolved it by switching to npm; we do the same.
+#
+# Workspaces: root package.json declares a `workspaces` array that npm
+# reads (pnpm prefers pnpm-workspace.yaml when both exist, so local
+# pnpm dev is unaffected by the field).
 
 FROM node:20-slim AS base
 WORKDIR /app
-RUN corepack enable && corepack prepare pnpm@9 --activate
 
-# Install workspace deps. Copy the lockfile + the package.json files
-# of the workspaces we need first, so this layer caches across source
-# edits. @pushflip/client is needed because faucet imports U64_MAX +
-# TEST_FLIP_MINT + TOKEN_PROGRAM_ID + FLIP_DECIMALS from it.
-COPY pnpm-workspace.yaml package.json pnpm-lock.yaml ./
+# Install workspace deps. Copy the workspace package.json files first
+# so this layer caches across source edits. @pushflip/client is needed
+# because faucet imports U64_MAX + TEST_FLIP_MINT + TOKEN_PROGRAM_ID +
+# FLIP_DECIMALS from it.
+COPY package.json ./
 COPY faucet/package.json ./faucet/
 COPY clients/js/package.json ./clients/js/
-# --ignore-scripts: workaround for pnpm install hanging on postinstall
-# scripts inside rootless podman on Alpine. Same issue caused yapbay's
-# pnpm-test image to hang for 13 days. Platform-specific binaries
-# (esbuild, lightningcss, etc.) come via optionalDependencies and
-# install regardless of --ignore-scripts. The faucet has no runtime
-# binary deps, so this is strictly safe here.
-RUN pnpm install --filter @pushflip/faucet --filter @pushflip/client --frozen-lockfile --ignore-scripts
+COPY app/package.json ./app/
+COPY scripts/package.json ./scripts/
+COPY dealer/package.json ./dealer/
+COPY house-ai/package.json ./house-ai/
+COPY zk-circuits/package.json ./zk-circuits/
+
+# --ignore-scripts: skip postinstall hooks that previously triggered
+# the same hang (biome/ultracite postinstalls do disk-heavy work). We
+# explicitly run `npm rebuild better-sqlite3` after to fetch its
+# native binding.
+# --no-audit --no-fund: less noise + slightly faster.
+# --omit=optional: drop platform-specific optional binaries we don't
+# need (e.g. esbuild's macOS binaries on a Linux build).
+RUN npm install --no-audit --no-fund --ignore-scripts --omit=optional
 
 # `--ignore-scripts` blocked better-sqlite3's prebuild-install, which
 # normally downloads the precompiled native binding for the running
-# Node version. `pnpm rebuild better-sqlite3` re-runs ONLY that
+# Node version. `npm rebuild better-sqlite3` re-runs ONLY that
 # package's install script in place, fetching the prebuild for our
 # (node-20, linux-x64, glibc) target. better-sqlite3 11.x ships
 # prebuilds for that target so the fetch path always wins; the
 # build-from-source fallback (which would need python3/make/g++) is
 # never exercised here.
 #
 # The `node -e "require('better-sqlite3')"` smoke is a load-bearing
-# guard, not a paranoia check: if upstream ever drops the prebuild for
-# our target tuple, `pnpm rebuild` silently succeeds (it tries the
-# from-source fallback, which fails to find `python3`/`make`/`g++` and
+# guard, not a paranoia check: if upstream ever drops the prebuild
+# for our target tuple, npm rebuild silently succeeds (it tries the
+# from-source fallback, which fails to find python3/make/g++ and
 # leaves the package in a broken state without a non-zero exit). The
 # require() call fails loudly with "could not locate the bindings
 # file", giving the operator a clear pointer to add the build deps.
 # (Pre-Mainnet 5.0.10 — globally-unique-nickname registry needs the
 # native sqlite binding at runtime.)
-RUN pnpm rebuild better-sqlite3 \
+RUN npm rebuild better-sqlite3 \
 && node -e "require('better-sqlite3')" \
 || (echo "[faucet] FATAL: better-sqlite3 native binding failed to load. If this happened after a node/better-sqlite3 version bump, the prebuild may be missing for the target tuple — add 'apt-get install -y python3 make g++' before this RUN step to enable the from-source fallback." && exit 1)
 
 # Source. clients/js is consumed in source form by @pushflip/faucet
-# (no build step — pnpm workspace alias resolves to clients/js/src/index.ts).
+# (no build step — the workspace symlink resolves to clients/js/src/index.ts).
 COPY clients/js ./clients/js
 COPY faucet ./faucet
 
@@ -70,4 +82,5 @@ WORKDIR /app/faucet
 # FAUCET_KEYPAIR_PATH points at.
 EXPOSE 3001
 
-CMD ["pnpm", "start"]
+# `npm start` runs the workspace's "start" script (tsx src/index.ts).
+CMD ["npm", "start"]
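
The `npm rebuild ... && node -e ... || (echo ... && exit 1)` step in the first hunk depends on shell list semantics: a rebuild that exits 0 without producing a binding is still caught, because the smoke test fails and the subshell's `exit 1` becomes the status of the whole step. The sketch below reproduces that shape with stand-ins so it can run without npm or Node. `fake_rebuild`, `fake_require`, and `guarded_rebuild` are hypothetical names, and the binding is simulated by a plain file path.

```shell
#!/bin/sh
# Stand-in for `npm rebuild better-sqlite3` in the failure mode the
# Dockerfile comment describes: it exits 0 whether or not a binding exists.
fake_rebuild() { return 0; }

# Stand-in for `node -e "require('better-sqlite3')"`: succeeds only if
# the (simulated) binding file is present.
fake_require() { [ -f "$1" ]; }

# Same shape as the RUN step: rebuild, then smoke-test, else fail loudly.
# The subshell's `exit 1` sets the status of the whole list.
guarded_rebuild() {
    fake_rebuild \
        && fake_require "$1" \
        || (echo "FATAL: native binding failed to load" >&2 && exit 1)
}
```

Without the smoke test, the status of the step would be that of the rebuild alone, which is exactly the silent-success case the comment warns about.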

package.json

Lines changed: 10 additions & 1 deletion
@@ -7,5 +7,14 @@
     "build": "just build",
     "test": "just test",
     "lint": "just lint"
-  }
+  },
+  "workspaces": [
+    "app",
+    "faucet",
+    "clients/js",
+    "scripts",
+    "dealer",
+    "house-ai",
+    "zk-circuits"
+  ]
 }
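
Because `npm install` at the root reads every workspace's manifest, each entry in the `workspaces` array needs a matching `COPY <dir>/package.json` line in both Dockerfiles. A minimal sketch of a check for that invariant follows. The function name `missing_manifest_copies` is hypothetical, and the workspace list is passed by hand rather than parsed from package.json.

```shell
#!/bin/sh
# Hypothetical check (not from the repo): print every workspace directory
# that has no matching "COPY <dir>/package.json" line in the Dockerfile.
# Usage: missing_manifest_copies <Dockerfile> <workspace-dir>...
missing_manifest_copies() {
    dockerfile=$1
    shift
    for ws in "$@"; do
        # -F: fixed-string match, -q: no output; the trailing space
        # anchors the end of the COPY source path.
        if ! grep -qF "COPY $ws/package.json " "$dockerfile"; then
            echo "$ws"
        fi
    done
}
```

Running `missing_manifest_copies faucet/Dockerfile app faucet clients/js scripts dealer house-ai zk-circuits` should print nothing for the Dockerfile in this commit. A workspace added to package.json later without a `COPY` line would be listed.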
