
Commit 598ce98

fix(deploy): pass --network=host to podman build (netavark breaks npm sockets)
ROOT CAUSE FOUND. Tested tucker→registry.npmjs.org from the host directly:

  curl -sS https://registry.npmjs.org/react → 200 in 0.37s, 6.6 MB
  curl -sS https://registry.npmjs.org/@rolldown/binding-...tgz → 200 in 0.37s, 7.8 MB
  10 parallel manifest fetches → all 200 OK

Host network is FINE. The EIDLETIMEOUT failures across 8 consecutive deploy attempts were happening INSIDE the build container, not on tucker's network. Podman's rootless netavark networking mangles TCP keep-alives in a way that breaks npm's HTTP-agent connection pool: the same registry, same network, same package set, but routed through netavark, idle-times-out at ~60s with no progress.

Fix: pass `--network=host` to both `podman build` invocations in scripts/deploy-tucker.sh. Build containers now use the host's network stack directly (same as how the runtime pushflip pod already runs `Network=host` per its quadlet), bypassing netavark entirely for the install phase.

Why we didn't hit this on the original 5.0.7 deploy: pnpm install's network behavior is different (different HTTP agent, different connection lifetimes). pnpm's deadlock surfaced as a libuv-worker deadlock at ~734 packages. npm's deadlock surfaces as EIDLETIMEOUT. Different bug, same root cause area (rootless container networking).

This fix should also let us drop the retry loop (since the underlying TCP issue is gone), but keeping it as belt-and-suspenders for now: it costs nothing on success and provides resilience against transient registry hiccups.
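The retry loop mentioned in the message is not part of this diff, so its actual form is unknown. As a rough illustration only, a wrapper of that kind could look like the sketch below. The function name `retry`, its argument order, and the fixed delay are all assumptions made for the example, not the code in scripts/deploy-tucker.sh.

```shell
#!/usr/bin/env bash
# Hypothetical retry wrapper (illustrative; not the loop from deploy-tucker.sh).
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while true; do
    # Succeed as soon as the wrapped command succeeds; no extra cost on success.
    if "$@"; then
      return 0
    fi
    # Stop once the attempt budget is used up.
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry: giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "retry: attempt ${attempt} failed, sleeping ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}
```

A call such as `retry 3 10 podman build --network=host -t localhost/pushflip-faucet:latest -f faucet/Dockerfile .` would run the build up to three times with a 10 second pause between failures. On a first-try success the wrapper adds only one function call, which matches the "costs nothing on success" reasoning in the message.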
1 parent dfc0024 commit 598ce98

1 file changed

Lines changed: 10 additions & 1 deletion


scripts/deploy-tucker.sh

@@ -131,8 +131,17 @@ ssh "$REMOTE_HOST" "cd $REMOTE_REPO && git fetch origin main && git checkout mai
 ok "main checked out + workspace installed"
 
 # --- Rebuild images ---
+# `--network=host`: bypass podman's rootless netavark stack during the
+# build. From the host, `curl https://registry.npmjs.org/...` returns
+# 200 in 0.37 s; from inside a default-network rootless container,
+# the same fetch idle-times-out at ~60 s under EIDLETIMEOUT (verified
+# 2026-04-27 after 8 consecutive deploy attempts hit the same
+# pattern). Netavark mangles the TCP keep-alives that npm's HTTP
+# agent relies on. Using host networking for the BUILD only —
+# runtime containers still use the pod's Network=host already.
 step "rebuilding pushflip-vite (~2-15 min depending on lockfile churn)"
 ssh "$REMOTE_HOST" "set -a; source $PROD_ENV; set +a; cd $REMOTE_REPO && podman build \
+--network=host \
 -t localhost/pushflip-vite:latest \
 --build-arg VITE_FAUCET_URL=/api/faucet \
 --build-arg VITE_NICKNAME_URL=/api/nickname \
@@ -143,7 +152,7 @@ ssh "$REMOTE_HOST" "set -a; source $PROD_ENV; set +a; cd $REMOTE_REPO && podman
 ok "pushflip-vite:latest built"
 
 step "rebuilding pushflip-faucet (~1-3 min)"
-ssh "$REMOTE_HOST" "cd $REMOTE_REPO && podman build -t localhost/pushflip-faucet:latest -f faucet/Dockerfile ." \
+ssh "$REMOTE_HOST" "cd $REMOTE_REPO && podman build --network=host -t localhost/pushflip-faucet:latest -f faucet/Dockerfile ." \
 || { fail "pushflip-faucet build failed"; print_rollback_cmd; exit 1; }
 ok "pushflip-faucet:latest built"
 
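The diff adds `--network=host` separately to each `podman build` invocation. One possible way to keep the flag from being forgotten on a future image is to route builds through a shared helper. The sketch below is an assumption-laden illustration: `build_image` and the `PODMAN` override variable are hypothetical names, and the real script runs its builds over `ssh` inside a quoted string, so it would need adapting before use there.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in deploy-tucker.sh). PODMAN can be overridden,
# e.g. PODMAN=echo, to print the command instead of running it.
PODMAN="${PODMAN:-podman}"

# build_image <tag> <dockerfile> [extra podman-build args...]
# Always passes --network=host so the build uses the host's network stack
# rather than the default rootless container network.
build_image() {
  local tag="$1" dockerfile="$2"
  shift 2
  "$PODMAN" build --network=host -t "$tag" -f "$dockerfile" "$@" .
}
```

For example, `PODMAN=echo build_image localhost/pushflip-faucet:latest faucet/Dockerfile` prints the command line that would be executed, which gives a quick way to check the flag is present without starting a build.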
