Fix dependabot auto-merge (#151)

wallstop · web-flow · commit 7d43975df8aa · 2026-04-25T14:16:54.000-07:00
# Fix Dependabot auto-merge

## Summary

Fixes the Dependabot auto-merge workflow, which previously made a single
attempt under a 45-second hard timeout instead of waiting for required
CI checks to pass.

## Changes

### Dependabot auto-merge overhaul (`scripts/ci/enable-dependabot-automerge.sh`, `.github/workflows/dependabot-auto-merge.yml`)

- **Removed one-shot mode.** The old workflow ran the script with a
45-second hard timeout (`DEPENDABOT_AUTOMERGE_ONE_SHOT=true`), causing
it to bail before CI checks could settle. The script now always polls
until checks complete.
- **Increased job timeout** from 1 minute to 40 minutes to give the
polling loop enough time to observe all required checks.
- **Added `SIGTERM` handler** so the job can emit a clear error message
if the runner kills it at the job timeout boundary.
- **Broadened failure detection** (allowlist instead of denylist):
checks and statuses now count as failed unless their conclusion/state
is `success`, `skipped`, `neutral`, or `pending`. Only the first three
satisfy the gate; `pending` keeps the loop polling. This catches
conclusions like `startup_failure` and `stale` that the old explicit
list missed.
- **Added fallback when every required check belongs to the current job**:
if the only required check reported is the auto-merge job itself, the
script now gates on all non-self check-runs/statuses instead of
treating the gate as already satisfied.
- **Removed `jq` conditional guard**: `jq` is now always required (it
was already required in practice).
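
The allow-list classification and the self-only fallback described above can be sketched as follows. This is a minimal illustration in Python, not the shell script itself: the dictionary shape (`state`, `link`), the function names, and the choice to keep waiting when no non-self checks exist at all are assumptions made for the example.

```python
from typing import Iterable, Mapping, Optional

# States that satisfy the gate, per the description above.
ACCEPTED = frozenset({"success", "skipped", "neutral"})
# State that neither passes nor fails; the loop keeps polling.
WAITING = frozenset({"pending"})

Check = Mapping[str, Optional[str]]  # hypothetical shape: {"state": ..., "link": ...}


def classify(state: Optional[str]) -> str:
    """Return 'pass', 'wait', or 'fail' for one conclusion/state."""
    if state in ACCEPTED:
        return "pass"
    if state in WAITING:
        return "wait"
    # Allow-list semantics: unknown, missing, or future states all fail.
    return "fail"


def is_self(check: Check, run_id: str) -> bool:
    """Hypothetical self-filter: the check's link contains this run id as a path segment."""
    return run_id in (check.get("link") or "").split("/")


def gate(required: Iterable[Check], all_checks: Iterable[Check], run_id: str) -> str:
    """Decide the gate from required checks, falling back to all non-self checks."""
    candidates = [c for c in required if not is_self(c, run_id)]
    if not candidates:
        # Only the auto-merge job itself was required: gate on everything else.
        candidates = [c for c in all_checks if not is_self(c, run_id)]
    outcomes = {classify(c.get("state")) for c in candidates}
    if "fail" in outcomes:
        return "fail"
    if "wait" in outcomes or not outcomes:
        # Assumption: with nothing to gate on yet, keep polling rather than pass.
        return "wait"
    return "pass"
```

Because the fallback never treats an empty candidate list as success, a PR whose only required check is the auto-merge job cannot pass the gate on that check alone.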

### New agent preflight script (`scripts/ci/agent-preflight.py`, `scripts/tests/test_agent_preflight.py`)

Adds a changed-file-aware preflight script for agentic workflows. It
runs a targeted set of validations (version-sync checks, LLM skill
linting, etc.) based on which files are staged, so agents catch issues
before hitting pre-commit hooks. Ships with a full test suite (`248`
test cases).
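
As a rough illustration of what "changed-file-aware" selection might look like, the sketch below maps path patterns to check names and picks the checks whose patterns match any staged file. The rule table, check names, and `select_checks` function are hypothetical and are not taken from `agent-preflight.py`; the `None` case loosely mirrors the documented fallback to all checks when git state cannot be read.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical rule table: check name -> path patterns that trigger it.
RULES: Dict[str, Tuple[str, ...]] = {
    "version-sync": ("Cargo.toml", "CHANGELOG.md"),
    "llm-lint": (".llm/*",),
    "workflow-lint": (".github/workflows/*.yml",),
}


def select_checks(changed_files: Optional[Sequence[str]]) -> List[str]:
    """Return the checks to run for the given staged files.

    Passing None stands for "changed files could not be determined",
    in which case every check is selected.
    """
    if changed_files is None:
        return list(RULES)
    selected = []
    for name, patterns in RULES.items():
        # fnmatchcase's `*` also matches `/`, so `.llm/*` covers nested files.
        if any(fnmatchcase(path, pat) for path in changed_files for pat in patterns):
            selected.append(name)
    return selected
```

Under these assumed rules, staging only `src/lib.rs` selects nothing, while staging `CHANGELOG.md` and `.llm/context.md` selects `version-sync` and `llm-lint`.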

### Dependency pin rollback (all CI workflows)

Rolls back `mozilla-actions/sccache-action` from `v0.0.10` to `v0.0.9`
across all workflows (`ci-rust.yml`, `ci-benchmarks.yml`,
`ci-network.yml`, `ci-security.yml`, `ci-verification.yml`,
`publish.yml`).
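
A rollback spread across several workflow files is easy to leave half-done. The sketch below shows one way a pin could be checked, assuming the workflows reference the action by tag in a `uses:` line. If they pin by commit SHA instead, the comparison would need to change. The helper names are hypothetical.

```python
import re
from typing import List

# Captures the ref after `@` in a `uses: mozilla-actions/sccache-action@<ref>` line.
PIN = re.compile(r"uses:\s*mozilla-actions/sccache-action@([^\s#]+)")


def find_pins(workflow_text: str) -> List[str]:
    """Return every sccache-action ref found in one workflow file's text."""
    return PIN.findall(workflow_text)


def off_pin(workflow_text: str, expected: str) -> List[str]:
    """Return the refs that differ from the expected pin."""
    return [ref for ref in find_pins(workflow_text) if ref != expected]
```

Running `off_pin` over the text of each listed workflow with `expected="v0.0.9"` should return an empty list everywhere once the rollback is complete.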
diff --git a/.github/workflows/dependabot-auto-merge.yml b/.github/workflows/dependabot-auto-merge.yml
@@ -8,11 +8,16 @@ permissions:
   contents: read
   pull-requests: read
 
+# Dependabot auto-merge policy is defined in .llm/context.md and locked in by
+# scripts/tests/test_enable_dependabot_automerge.py. Do NOT add a one-shot
+# bypass, do NOT wrap the script in a sub-job-level `timeout`, and do NOT
+# lower `timeout-minutes` below 32: the polling loop may need ~30 minutes
+# to settle, plus a 2-minute buffer for runner overhead.
 jobs:
   dependabot:
     name: Enable auto-merge for Dependabot PRs
     if: github.event.pull_request.user.login == 'dependabot[bot]' && !github.event.pull_request.draft && github.event.pull_request.head.repo.full_name == github.repository
-    timeout-minutes: 1
+    timeout-minutes: 40
     concurrency:
       group: dependabot-automerge-${{ github.event.pull_request.number }}
       cancel-in-progress: true
@@ -36,22 +41,8 @@ jobs:
         with:
           ref: ${{ github.event.pull_request.base.sha }}
       - name: Enable auto-merge
-        run: |
-          # Keep a small buffer between command timeout and job timeout for error handling/cleanup.
-          # Job runs on ubuntu-latest, where GNU timeout is available.
-          timeout --signal=TERM --kill-after=5s 45s bash ./scripts/ci/enable-dependabot-automerge.sh
-          rc=$?
-          if [ "$rc" -ne 0 ]; then
-            # GNU timeout exits with 124 when the command exceeded the limit.
-            if [ "$rc" -eq 124 ]; then
-              echo "::error::Automerge one-shot timed out after 45s; refusing long wait."
-            else
-              echo "::error::Automerge script failed with exit code ${rc}."
-            fi
-            exit "$rc"
-          fi
+        run: bash ./scripts/ci/enable-dependabot-automerge.sh
         env:
           PR_URL: ${{ github.event.pull_request.html_url }}
           PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }}
-          DEPENDABOT_AUTOMERGE_ONE_SHOT: "true"
           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
diff --git a/.llm/context.md b/.llm/context.md
@@ -159,7 +159,7 @@ Pre-commit validates registration only, NOT that proofs pass. Run affected proof
 
 Also: `ci-rust.yml` (Miri), `ci-security.yml` (cargo-geiger, cargo-deny).
 
-Dependabot auto-merge policy: this repository is squash-only. Use `scripts/ci/enable-dependabot-automerge.sh` (which enforces `--squash`, supports one-shot enable mode for fast auto-merge setup, defaults to waiting for checks with fallback when required-check metadata is unavailable, and checks policy drift) instead of inline merge commands in workflows.
+Dependabot auto-merge policy: this repository is squash-only and the auto-merge job MUST wait for every non-self CI gate on the PR head SHA to reach an explicitly accepted state (`success`, `skipped`, `neutral`) before enabling merge. Anything else -- including `failure`, `timed_out`, `cancelled`, `action_required`, `startup_failure`, `stale`, missing/`null` conclusions, or future GitHub-added states -- refuses the merge (allow-list semantics). Use `scripts/ci/enable-dependabot-automerge.sh` -- never inline `gh pr merge` in workflows, never re-introduce a "one-shot" / bypass path, never wrap the script in a sub-job-level `timeout`. Regression-tested in `scripts/tests/test_enable_dependabot_automerge.py`.
 
 **CI fails on:** unformatted code, clippy warnings, broken doc links, markdown lint errors, workflow syntax errors, unregistered Kani proofs.
 
@@ -220,12 +220,15 @@ For protocol tests that poll in loops (`poll_remote_clients()` / protocol `poll(
 
 **Unreleased code rule:** Never add separate "Fixed" or "Changed" entries for code that has not yet been released. Fixes to unreleased features should be folded into the existing "Added" entry describing that feature. The changelog should describe the final shipped state, not intermediate development history.
 
+**Version sync rule:** If `Cargo.toml` is `X.Y.Z`, the matching changelog header must be `## [X.Y.Z] - YYYY-MM-DD` (ISO date required). Keep `## [Unreleased]` undated. Validate with `bash scripts/sync-version.sh --check`; auto-fix with `bash scripts/sync-version.sh --changelog-only`.
+
 ## Mandatory Linting
 
 - **After Rust changes:** `cargo fmt && cargo clippy --all-targets --features tokio,json` (or `cargo c`)
 - **After workflow changes:** `actionlint` (no exceptions)
 - **After doc changes:** `cargo doc --no-deps`
 - **After markdown changes:** `npx markdownlint 'file.md' --config .markdownlint.json --fix`
+- **Before finalizing agent work:** `python3 scripts/ci/agent-preflight.py --auto-fix` (changed-file-aware early checks; if output includes `Falling back to --all checks.`, resolve git-state issues and rerun)
 - **After shell-script changes:** `bash scripts/ci/check-shell-portability.sh`
 - **After `.llm/` changes:** All `.md` files under `.llm/` must be **300 lines or fewer** (enforced by pre-commit hook `llm-line-limit`)
 - **Link validation:** `./scripts/docs/check-links.sh`
diff --git a/.llm/skills/ci-cd-tooling/github-actions.md b/.llm/skills/ci-cd-tooling/github-actions.md
@@ -230,3 +230,17 @@ env:
 - [ ] Verify `permissions:` block is minimal
 - [ ] Verify `runs-on` uses valid runner labels
 - [ ] Validate matrix combinations
+
+## Dependabot auto-merge gating
+
+Auto-merge polls `gh pr checks --required` for the PR head SHA (the primary path). When required-check metadata is unavailable -- no required checks are configured for the branch, or the only required check is the auto-merge job itself -- it falls back to `repos/{owner}/{repo}/commits/{sha}/check-runs` and `…/status`. Both paths exclude the auto-merge job's own entries by filtering `link`/`details_url`/`target_url` against `GITHUB_RUN_ID`.
+
+Failure classification uses an **allow-list**: only `success`, `skipped`, and `neutral` proceed; everything else (including `failure`, `timed_out`, `cancelled`, `action_required`, `startup_failure`, `stale`, missing/`null`, or future GitHub-added states) blocks the merge. `skipped` is allowed because matrix builds and conditional jobs legitimately produce it; `neutral` is allowed because GitHub-native advisory checks (e.g. `dependency-review-action`) emit it for non-failure findings.
+
+Three regression guardrails in `scripts/tests/test_enable_dependabot_automerge.py` lock in this policy:
+
+1. `test_workflow_does_not_set_one_shot_env` -- fails CI if the bypass env var is reintroduced.
+2. `test_workflow_run_command_is_pure_script_invocation` -- fails CI if the script is wrapped in `timeout`, `xargs`, or any prefix command.
+3. `test_workflow_timeout_is_sufficient_for_polling` -- fails CI if the workflow `timeout-minutes` falls below 32 minutes (the polling settle ceiling + a 2-minute buffer).
+
+Branch protection is a defense layer, not a substitute. Configure `main` to require the relevant CI checks so GitHub's native auto-merge respects them too -- but the script's own gating remains the source of truth.
diff --git a/.llm/skills/publishing-organization/changelog.md b/.llm/skills/publishing-organization/changelog.md
@@ -50,6 +50,14 @@
 
 Sections: Added, Changed, Deprecated, Removed, Fixed, Security.
 
+## Release Header and Version Sync
+
+- Keep `## [Unreleased]` undated.
+- Use ISO dates on release headers: `## [X.Y.Z] - YYYY-MM-DD`.
+- If `Cargo.toml` is `X.Y.Z`, the matching changelog header must be dated.
+- Validate: `bash scripts/sync-version.sh --check`
+- Auto-fix link/date metadata: `bash scripts/sync-version.sh --changelog-only`
+
 ## Writing Guidelines
 
 ### Be User-Focused
@@ -122,6 +130,9 @@ Changing `Display`/`Debug` output is **Breaking** if users might depend on forma
 ## Verification Before Committing
 
 ```bash
+# Validate changelog/version synchronization
+bash scripts/sync-version.sh --check
+
 # Verify derives exist before claiming them
 rg '#\[derive.*Hash' src/lib.rs
 
diff --git a/.llm/skills/workflows/dev-pipeline.md b/.llm/skills/workflows/dev-pipeline.md
@@ -44,15 +44,15 @@ Structured workflow from planning through shipping for fortress-rollback. Each p
 
 ### Scope Decision Rules
 
-| Change Type | Needs Scope Doc? | Needs CHANGELOG? |
-|-------------|-----------------|-----------------|
-| Bug fix (pub-visible) | Yes | Yes |
-| Bug fix (internal) | Yes | No |
-| New public API | Yes | Yes |
-| Refactoring | Brief | No |
-| Dependency update | No | If user-visible |
-| CI/tooling | No | No |
-| Kani proof | Brief | No |
+| Change Type           | Needs Scope Doc? | Needs CHANGELOG? |
+| --------------------- | ---------------- | ---------------- |
+| Bug fix (pub-visible) | Yes              | Yes              |
+| Bug fix (internal)    | Yes              | No               |
+| New public API        | Yes              | Yes              |
+| Refactoring           | Brief            | No               |
+| Dependency update     | No               | If user-visible  |
+| CI/tooling            | No               | No               |
+| Kani proof            | Brief            | No               |
 
 ---
 
@@ -85,14 +85,14 @@ Before running verification/debugging commands in this workflow, confirm:
 
 ### Design Patterns to Follow
 
-| When You Need | Use This Pattern | Example in Codebase |
-|---------------|-----------------|-------------------|
-| Configurable construction | Builder | `SessionBuilder` |
-| Protocol state machine | Type-state or enum | `SessionState`, protocol states |
-| Input prediction | Strategy | `PredictionStrategy` trait |
-| Bounded collections | Circular buffer | `SavedStates` |
-| Request/response | Request enum | `FortressRequest` |
-| Error context | Structured enum | `InternalErrorKind`, `InvalidRequestKind` |
+| When You Need             | Use This Pattern   | Example in Codebase                       |
+| ------------------------- | ------------------ | ----------------------------------------- |
+| Configurable construction | Builder            | `SessionBuilder`                          |
+| Protocol state machine    | Type-state or enum | `SessionState`, protocol states           |
+| Input prediction          | Strategy           | `PredictionStrategy` trait                |
+| Bounded collections       | Circular buffer    | `SavedStates`                             |
+| Request/response          | Request enum       | `FortressRequest`                         |
+| Error context             | Structured enum    | `InternalErrorKind`, `InvalidRequestKind` |
 
 ---
 
@@ -143,11 +143,11 @@ cargo nextest run module_name --no-capture
 
 ### Commit Granularity
 
-| Good Commit | Bad Commit |
-|-------------|-----------|
-| "Add `InputDelayTooLarge` variant to `InvalidRequestKind`" | "Various changes" |
-| "Validate input delay in `SessionBuilder::with_input_delay`" | "Fix stuff" |
-| "Add regression test for issue #42" | "WIP" |
+| Good Commit                                                  | Bad Commit        |
+| ------------------------------------------------------------ | ----------------- |
+| "Add `InputDelayTooLarge` variant to `InvalidRequestKind`"   | "Various changes" |
+| "Validate input delay in `SessionBuilder::with_input_delay`" | "Fix stuff"       |
+| "Add regression test for issue #42"                          | "WIP"             |
 
 Each commit should pass `cargo c && cargo t` independently.
 
@@ -168,6 +168,9 @@ rg '\.unwrap\(\)|\.expect\(|panic!\(|todo!\(' --type rust src/
 # Determinism scan
 rg 'HashMap|HashSet|Instant::now|thread_rng' --type rust src/
 
+# Agent preflight (catches version sync/.llm/workflow issues early)
+python3 scripts/ci/agent-preflight.py --auto-fix
+
 # Full quality gate (see context.md "Mandatory Linting" for details)
 cargo c && cargo t
 cargo doc --no-deps
@@ -260,10 +263,10 @@ For production-blocking bugs, compress the pipeline:
 
 ## Anti-Patterns
 
-| Anti-Pattern | Why It Hurts | Do Instead |
-|-------------|-------------|------------|
-| Code first, think later | Rework, wrong abstraction | Scope and design first |
-| Skip tests | Bugs ship, regression debt | Write tests before code |
-| Mega-PR (>500 lines) | Hard to review, risky merge | Split into stacked PRs |
-| Fix + refactor in one PR | Hard to review, bisect-breaking | Separate commits/PRs |
-| Skip self-review | Obvious issues waste reviewer time | Review your own diff first |
+| Anti-Pattern             | Why It Hurts                       | Do Instead                 |
+| ------------------------ | ---------------------------------- | -------------------------- |
+| Code first, think later  | Rework, wrong abstraction          | Scope and design first     |
+| Skip tests               | Bugs ship, regression debt         | Write tests before code    |
+| Mega-PR (>500 lines)     | Hard to review, risky merge        | Split into stacked PRs     |
+| Fix + refactor in one PR | Hard to review, bisect-breaking    | Separate commits/PRs       |
+| Skip self-review         | Obvious issues waste reviewer time | Review your own diff first |
diff --git a/.llm/skills/workflows/review-readiness.md b/.llm/skills/workflows/review-readiness.md
@@ -16,6 +16,7 @@ Concrete gate between implementation and external review. Use this after `dev-pi
 - [ ] Tests cover happy + error paths for changed behavior
 - [ ] Design decision log reviewed for major architecture choices
 - [ ] CHANGELOG decision applied for user-observable/public changes
+- [ ] Agent preflight passes (`python3 scripts/ci/agent-preflight.py --auto-fix`)
 
 If two or more checks fail, return to design and reduce scope before requesting review.
 
@@ -32,6 +33,9 @@ rg '\.unwrap\(\)|\.expect\(|panic!\(|todo!\(|unimplemented!\(' --type rust src/
 
 # Determinism scan
 rg 'HashMap|HashSet|Instant::now|SystemTime|thread_rng|random\(\)' --type rust src/
+
+# Changed-file-aware preflight checks (version sync, .llm quality, workflows)
+python3 scripts/ci/agent-preflight.py --auto-fix
 ```
 
 ---
@@ -45,6 +49,7 @@ Review Readiness
 - Build/tests: PASS|FAIL
 - Zero-panic: PASS|FAIL
 - Determinism: PASS|FAIL
+- Agent preflight: PASS|FAIL
 - Error handling: PASS|FAIL
 - Tests breadth: PASS|FAIL
 - Design log reviewed: YES|NO|N/A
diff --git a/AGENTS.md b/AGENTS.md
@@ -3,7 +3,3 @@
 **Read and follow [`.llm/context.md`](.llm/context.md)** — the canonical source of truth for all project context, development policies, testing guidelines, and coding standards. You must read it before making any changes.
 
 When clarifying questions are needed, follow [`.llm/templates/ask-user-question.md`](.llm/templates/ask-user-question.md) to keep questions concise and actionable.
-
-## Critical Rules
-
-- **Test output:** NEVER pipe test output through `tail`/`head` (e.g., `cargo nextest run 2>&1 | tail -40`). Instead, redirect to a temp file and read it: `cargo nextest run --no-capture > /tmp/test-results.txt 2>&1`. For repeated runs, use a for loop.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -14,7 +14,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
-## [0.8.0]
+## [0.8.0] - 2026-04-25
 
 ### Changed
 
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -8,6 +8,7 @@ When clarifying questions are needed, follow [`.llm/templates/ask-user-question.
 
 - **Zero-panic:** No `unwrap()`, `expect()`, `panic!()`, `todo!()` in production code
 - **Pre-commit:** `cargo fmt && cargo clippy --all-targets --features tokio,json && cargo nextest run --no-capture` (or `cargo c && cargo t`)
+- **Agent preflight:** Before finalizing changes, run `python3 scripts/ci/agent-preflight.py --auto-fix`. If output includes `Falling back to --all checks.`, resolve the git-state issue and rerun preflight.
 - **Test output:** NEVER pipe test output through `tail`/`head` (e.g., `cargo nextest run 2>&1 | tail -40`). Instead, redirect to a temp file and read it: `cargo nextest run --no-capture > /tmp/test-results.txt 2>&1`. For repeated runs, use a for loop.
 - **Kani:** Always add `#[kani::unwind(N)]` to proofs; CI uses `--default-unwind 8` via `--quick` mode
 - **Changelog:** Ask "Does this affect `pub` items or user-observable behavior?" — if yes, update `CHANGELOG.md`
diff --git a/scripts/ci/agent-preflight.py b/scripts/ci/agent-preflight.py
diff --git a/scripts/ci/enable-dependabot-automerge.sh b/scripts/ci/enable-dependabot-automerge.sh
diff --git a/scripts/tests/test_agent_preflight.py b/scripts/tests/test_agent_preflight.py
diff --git a/scripts/tests/test_enable_dependabot_automerge.py b/scripts/tests/test_enable_dependabot_automerge.py