Skip to content

chore: harden codebase for AI coding agents (Tier 1 sensors)#123

Open
harrymunro wants to merge 8 commits into
mainfrom
chore/harden-for-agents
Open

chore: harden codebase for AI coding agents (Tier 1 sensors)#123
harrymunro wants to merge 8 commits into
mainfrom
chore/harden-for-agents

Conversation

@harrymunro
Copy link
Copy Markdown
Collaborator

@harrymunro harrymunro commented May 28, 2026

Summary

  • Install Ruff with AI-targeted thresholds in a new pyproject.toml so every lint message becomes self-correction feedback for an agent.
  • Add .pre-commit-config.yaml (Gitleaks + Ruff + standard hygiene) and a new CI job (python-lint + secret-scan) so the same checks fire locally, on commit, and in CI.
  • Document the sensors in CLAUDE.md and AGENTS.md — including the suppress-with-reason convention and the brownfield refactor backlog — so future agent invocations know to run them and how to interpret violations.

Based on the sensors-as-prompts and suppressions-as-review-anchors ideas from Birgitta Böckeler, Maintainability sensors for coding agents (martinfowler.com), via the harden-for-agents skill.

Note on scope

This PR also lands the merge of PR #120 (nelson_data_patterns.py learned-standing-orders pipeline, ~2,400 LOC). That feature is not part of the Tier 1 sensors work — it merged into this branch via 9113af9 before the hardening commit and rides along to main. The Tier 1 sensors changes themselves are confined to pyproject.toml, .pre-commit-config.yaml, .github/workflows/ci.yml, CLAUDE.md, AGENTS.md, docs/project_structure.md, plus the auto-fixes and reasoned noqas the new ruff config required.

Adversarial-review remediation (commits 361a506..82ebd74)

A five-agent adversarial review surfaced three CRITICAL issues plus a long tail of HIGH/MEDIUM follow-ups. Applied in five CI-safe commits:

  1. fix: restore LICENSE and README ownership — Revert LICENSE copyright to "2025 Harry Munro" (matches main); fix the stale aspegio/nelson marketplace path in README quick-start to harrymunro/nelson.
  2. chore: tighten lint config and triage surfaced violations — Drop the bogus PLR0913 global ignore (the previous "covered by PLR0917" comment was wrong — PLR0917 is preview-only). The rule now fires; test/conftest helpers covered via per-file ignore, four production sites get reasoned per-function noqas pointing at the nelson-e6j refactor backlog. Also: anchor test glob to **/test_*.py, drop S108 from test ignore (replaced with one per-line noqa for a single placeholder), tighten the S603 rationale, delete the decorative [tool.mypy] block (roll-out tracked under nelson-8q8), and delete the test_regex*.py scratch files at the repo root (nelson-6bw).
  3. chore: suppression hygiene + safer subprocess test pattern — Add a reason to the bare # type: ignore on hooks/nelson_hooks.py:653 (matches the documented shape on line 625). Refactor the _write_json crash/cleanup tests in test_nelson_data_memory.py to pass paths via env= injection rather than interpolating fixture values into a python -c string — the S603 suppression rationale becomes minimal and accurate.
  4. ci: SHA-pin actions and pre-commit, add parity job, parallel tests — SHA-pin every third-party uses: (tag preserved as trailing comment); pre-commit autoupdate --freeze for the three hook repos; new pre-commit CI job using pre-commit/action@<sha> to close local↔CI parity for trailing-whitespace, end-of-file-fixer, check-{yaml,toml,json}, check-added-large-files, check-merge-conflict, ruff, and gitleaks; pin GITLEAKS_VERSION: 8.30.0 in the secret-scan job to match the pre-commit engine; cache: pip + cache-dependency-path; concurrency.cancel-in-progress scoped to PRs only; test job becomes a matrix of [skills/nelson/scripts, hooks, scripts] with fail-fast: false.
  5. docs: soften brownfield wording, trim AGENTS.md, document sensors — Rewrite the Brownfield complexity backlog paragraph in CLAUDE.md so it no longer creates a perverse incentive to avoid flagged functions; collapse the duplicated Maintainability sensors section in AGENTS.md to a single pointer at CLAUDE.md; add pyproject.toml, .pre-commit-config.yaml, and .github/workflows/ci.yml to docs/project_structure.md; add a Local development section to CONTRIBUTING.md.

What's in the lint config

Setting Value Why
max-args 5 Agents over-stuff signatures; dataclasses/typed dicts are usually the answer
max-branches 10 Caps inflation of cmd_* / main() handlers
max-statements 50 Same
max-complexity 10 Mccabe gate
max-returns 6
line-length 120 Brownfield baseline; codebase comfortable at 110-120, p75 confirms
Per-dir ignores scripts/, hooks/, **/test_*.py, conftest.py Each layer has legitimate uses of subprocess, asserts, late imports, print()

Triage outcome (brownfield)

Started at 183 violations after enabling the rules; ended at 0:

  • Auto-fixed (mechanical): import order, datetime UTC, unused imports.
  • Real bugs fixed: missing check= and strict= kwargs, unused variables, redefined loop variables, three lazy import json calls moved to module top.
  • Suppressed with reason on the largest functions (# noqa: C901, PLR0912, PLR0915 -- ... refactor tracked in nelson-e6j). Brownfield-safe — the rules still fire if anything gets worse, and the suppressions point at a refactor backlog rather than disappearing into noise.
  • Disabled globally with rationale: SIM108 (if/else→ternary often degrades readability), RUF002 (Unicode in docstrings is intentional prose).
  • Format: one-time ruff format baseline applied across 22 files.

What's deferred (filed as follow-up issues)

  • mypy strict mode (nelson-8q8) — ≈82% of non-test functions annotated (178/217 across 12 files; scripts/count-tokens.py has 0/6), but strict mode would drown out the linter signal. Roll out incrementally.
  • Tier 2 (nelson-7rm) — coverage gate, import-linter contracts, Semgrep SAST.
  • Tier 3 — mutation testing, scheduled inferential AI reviews.
  • Unify pytest config (nelson-y2l) — each test dir has its own conftest.py with overlapping fixture names; CI runs them as a matrix for now.
  • Narrow PLR0915/PLR0912 blanket ignore in test files — surface and triage ~5-10 large-test refactor candidates (new follow-up to file post-merge).
  • Replace gitleaks-action@v2 with the direct gitleaks binary — closes the fork-PR fail-open scenario where GITLEAKS_LICENSE is missing (new follow-up to file post-merge).

Test plan

  • ruff check — All checks passed
  • ruff format --check — 25 files already formatted
  • pytest skills/nelson/scripts/ — 353 passed
  • pytest hooks/ — 64 passed
  • pytest scripts/ — 21 passed
  • pre-commit run --all-files — all hooks pass
  • ruff check --select PLR0913 . — exits 0 (sensor actually fires now)
  • CI green on push (new python-lint, pre-commit, and matrix test jobs alongside existing secret-scan, markdown, yaml, links, spelling, references)
  • After merge: pre-commit install once locally so hooks fire on every commit

Install AI-targeted sensors so agent edits stay maintainable as the
codebase evolves. Each violation message becomes feedback the agent
reads and self-corrects against; suppress-with-reason and
threshold-bump become reviewable anchors.

Tier 1 sensors installed:
- pyproject.toml: Ruff with AI-targeted thresholds (max-args=5,
  max-branches=10, max-statements=50, max-complexity=10,
  line-length=120 to match the codebase's actual long-tail) and
  rulesets E F W I B C90 PL RUF S SIM T20 UP. Per-directory ignores
  for scripts/, hooks/, tests/, and conftest.py reflecting how each
  layer legitimately uses subprocess, asserts, late imports, etc.
- .pre-commit-config.yaml: Gitleaks (secret scanning), Ruff
  lint+format, plus standard hygiene (trailing whitespace, EOF,
  merge conflict, large file). Install with: pre-commit install.
- .github/workflows/ci.yml: new python-lint and secret-scan jobs
  alongside the existing markdown/yaml/typos/tests/references checks.
- CLAUDE.md / AGENTS.md: maintainability-sensors section documenting
  the sensors, the suppress-with-reason convention, and the
  brownfield complexity backlog.

Brownfield triage outcome (started at 183 violations, ended at 0):
- Auto-fixed imports, datetime UTC, unused imports.
- Manually fixed real bugs: unused vars, missing check= and strict=
  kwargs, redefined loop names, lazy json imports moved to module top.
- Suppressed complexity violations on the largest functions with
  inline noqa+reason pointing to a follow-up refactor backlog
  (rather than bulk-refactoring, which the source article warns
  produces over-engineering).
- Disabled SIM108 (if/else->ternary) and RUF002 (Unicode in
  docstrings) globally with comments explaining why; they generate
  noise more often than signal in this codebase.
- Applied ruff format across the repo as a one-time baseline.

All 438 tests still pass; ruff lint + format clean.

mypy strict mode and Tier 2/3 sensors (coverage gate, import-linter,
SAST, mutation testing, periodic AI reviews) are deferred to
follow-up tasks so each tier can stabilise before the next is added.
- Autoupdate to pre-commit-hooks v6.0.0, ruff-pre-commit v0.15.14,
  gitleaks v8.30.0 (caught on first `pre-commit run --all-files`).
- end-of-file-fixer added missing trailing newlines to
  `.slim/cartography.json` and `demos/battleships/styles/main.css`.
Revert LICENSE copyright to "2025 Harry Munro" (matches main) and fix
the stale "aspegio/nelson" marketplace path in README quick-start to
"harrymunro/nelson" so every URL/badge in the doc agrees.
Drop PLR0913 from the global ignore (the previous "covered by PLR0917"
note was wrong — PLR0917 is preview-only and never active). PLR0913 now
fires, with two paths:

- Test/conftest fixture helpers — per-file ignore (legitimate fixture
  surface; not a code-quality signal).
- Production sites (4 functions in nelson_data_lifecycle.py and
  nelson_data_patterns.py) — per-function `# noqa: PLR0913 -- …`
  pointing at the relevant schema/CLI shape and backlogging the
  dataclass refactor under nelson-e6j.

Other lint config corrections:
- Anchor test glob to `**/test_*.py` (worked by accident before).
- Drop S108 from test ignore — only one real placeholder string
  remains, and it's documented with a per-line noqa.
- Add S603 callout that the suppression does NOT license `python -c`
  f-string interpolation of fixture-derived values (the C3 refactor).
- Delete the `test_regex*.py` per-file ignore and the two scratch
  files at the repo root (covered nelson-6bw cleanup).
- Drop the decorative `[tool.mypy]` block; the strict-mode roll-out
  lives under nelson-8q8.
Two small hardening fixes:

1. hooks/nelson_hooks.py:653 — the bare `# type: ignore[import-not-found]`
   inside the parens of the second `from nelson_circuit_breakers import …`
   block had no `--` reason. Move it onto the `from` line and merge with
   the existing PLC0415 noqa so it follows the same documented shape as
   the earlier import on line 625.

2. skills/nelson/scripts/test_nelson_data_memory.py — the
   `_write_json` crash/cleanup tests were building `python -c` bodies via
   f-string interpolation of `tmp_path`-derived values. That's exactly
   the pattern the S603 per-file-ignore in pyproject.toml now explicitly
   warns against. Switch to a single literal probe body that reads
   `SCRIPT_DIR` and `TARGET` from `os.environ`. The subprocess argv
   becomes fully literal — the S603 suppression rationale becomes
   minimal and accurate.
Supply-chain hardening + local↔CI parity for the CI workflow:

- SHA-pin every third-party action (`actions/checkout`, `setup-python`,
  `gitleaks-action`, `markdownlint-cli2-action`, `action-yamllint`,
  `lychee-action`, `typos`) with the original tag preserved as a
  trailing comment for human readability and bot upgrade pathways.
- `pre-commit autoupdate --freeze` does the equivalent for the three
  repos in `.pre-commit-config.yaml`; ruff happens to bump v0.15.14
  -> v0.15.15 as a side effect (clean).
- New `pre-commit` job runs `pre-commit/action@<sha>` so every hook
  used locally also runs in CI — closes drift for trailing-whitespace,
  end-of-file-fixer, check-{yaml,toml,json}, check-added-large-files,
  check-merge-conflict, ruff, and gitleaks.
- `secret-scan` job now pins `GITLEAKS_VERSION: 8.30.0` to match the
  pre-commit engine.
- `concurrency.cancel-in-progress` is now scoped to PRs only — pushes
  to main always run to completion.
- `cache: pip` + `cache-dependency-path` on python jobs.
- `test` job becomes a matrix of `[skills/nelson/scripts, hooks,
  scripts]` with `fail-fast: false` so a failure in one suite doesn't
  hide failures in the others.
- CLAUDE.md → *Brownfield complexity backlog*: rewrite the touch-the-
  function-and-refactor-or-keep-the-noqa wording so it removes the
  perverse incentive to avoid flagged functions. The refactor stays
  opportunistic; the noqa stays in place if the edit doesn't reduce
  complexity.
- AGENTS.md → *Maintainability sensors*: collapse the duplicated
  bullet list to a single pointer at CLAUDE.md so the two docs can't
  drift apart again.
- docs/project_structure.md: add `pyproject.toml`,
  `.pre-commit-config.yaml`, and `.github/workflows/ci.yml` to the
  repo-root entries — these are the files CLAUDE.md tells the agent
  to read.
- CONTRIBUTING.md: add a *Local development* section with the
  pre-commit install command, the ruff invocations, and the three
  pytest commands, plus a link back to the sensor reference.
Pre-existing markdownlint MD031 violation on the fenced code block
inside the *Tests* bullet of the Maintainability sensors section.
Failed on every CI run since the Tier 1 sensors landed (27460f9) but
went unnoticed because the markdown job was already red on other
issues for unrelated branches.

Adding a blank line above and below the fence brings markdownlint to
zero errors across all 61 tracked .md files.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant