chore: harden codebase for AI coding agents (Tier 1 sensors)#123
Open
harrymunro wants to merge 8 commits into
Install AI-targeted sensors so agent edits stay maintainable as the codebase evolves. Each violation message becomes feedback the agent reads and self-corrects against; suppress-with-reason and threshold-bump become reviewable anchors.

Tier 1 sensors installed:
- pyproject.toml: Ruff with AI-targeted thresholds (max-args=5, max-branches=10, max-statements=50, max-complexity=10, line-length=120 to match the codebase's actual long-tail) and rulesets E F W I B C90 PL RUF S SIM T20 UP. Per-directory ignores for scripts/, hooks/, tests/, and conftest.py reflecting how each layer legitimately uses subprocess, asserts, late imports, etc.
- .pre-commit-config.yaml: Gitleaks (secret scanning), Ruff lint+format, plus standard hygiene (trailing whitespace, EOF, merge conflict, large file). Install with: pre-commit install.
- .github/workflows/ci.yml: new python-lint and secret-scan jobs alongside the existing markdown/yaml/typos/tests/references checks.
- CLAUDE.md / AGENTS.md: maintainability-sensors section documenting the sensors, the suppress-with-reason convention, and the brownfield complexity backlog.

Brownfield triage outcome (started at 183 violations, ended at 0):
- Auto-fixed imports, datetime UTC, unused imports.
- Manually fixed real bugs: unused vars, missing check= and strict= kwargs, redefined loop names, lazy json imports moved to module top.
- Suppressed complexity violations on the largest functions with inline noqa+reason pointing to a follow-up refactor backlog (rather than bulk-refactoring, which the source article warns produces over-engineering).
- Disabled SIM108 (if/else->ternary) and RUF002 (Unicode in docstrings) globally with comments explaining why; they generate noise more often than signal in this codebase.
- Applied ruff format across the repo as a one-time baseline.

All 438 tests still pass; ruff lint + format clean.

mypy strict mode and Tier 2/3 sensors (coverage gate, import-linter, SAST, mutation testing, periodic AI reviews) are deferred to follow-up tasks so each tier can stabilise before the next is added.

- Autoupdate to pre-commit-hooks v6.0.0, ruff-pre-commit v0.15.14, gitleaks v8.30.0 (caught on first `pre-commit run --all-files`).
- end-of-file-fixer added missing trailing newlines to `.slim/cartography.json` and `demos/battleships/styles/main.css`.

Revert LICENSE copyright to "2025 Harry Munro" (matches main) and fix the stale "aspegio/nelson" marketplace path in README quick-start to "harrymunro/nelson" so every URL/badge in the doc agrees.

Drop PLR0913 from the global ignore (the previous "covered by PLR0917" note was wrong: PLR0917 is preview-only and never active). PLR0913 now fires, with two paths:
- Test/conftest fixture helpers: per-file ignore (legitimate fixture surface; not a code-quality signal).
- Production sites (4 functions in nelson_data_lifecycle.py and nelson_data_patterns.py): per-function `# noqa: PLR0913 -- …` pointing at the relevant schema/CLI shape and backlogging the dataclass refactor under nelson-e6j.

Other lint config corrections:
- Anchor test glob to `**/test_*.py` (worked by accident before).
- Drop S108 from test ignore: only one real placeholder string remains, and it's documented with a per-line noqa.
- Add S603 callout that the suppression does NOT license `python -c` f-string interpolation of fixture-derived values (the C3 refactor).
- Delete the `test_regex*.py` per-file ignore and the two scratch files at the repo root (covered by the nelson-6bw cleanup).
- Drop the decorative `[tool.mypy]` block; the strict-mode roll-out lives under nelson-8q8.

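
As a rough illustration of the two states described above, the sketch below shows a function that exceeds `max-args=5` carrying a reasoned per-function noqa, next to the kind of dataclass refactor the backlog item points at. All names here (`record_pattern`, `PatternRecord`, the fields) are hypothetical; the real signatures in `nelson_data_lifecycle.py` and `nelson_data_patterns.py` are not shown in this PR text.

```python
from dataclasses import asdict, dataclass


# Six parameters exceeds max-args=5, so PLR0913 fires on this definition.
# The suppression carries a reason and a backlog pointer rather than being bare.
def record_pattern(  # noqa: PLR0913 -- mirrors the CLI flags; dataclass refactor tracked in nelson-e6j
    mission_id: str,
    category: str,
    summary: str,
    severity: int,
    source: str,
    dry_run: bool,
) -> dict[str, object]:
    return {
        "mission_id": mission_id,
        "category": category,
        "summary": summary,
        "severity": severity,
        "source": source,
        "dry_run": dry_run,
    }


# One possible shape for the backlogged refactor: group the fields that
# travel together so the function takes two parameters and needs no noqa.
@dataclass(frozen=True)
class PatternRecord:
    mission_id: str
    category: str
    summary: str
    severity: int
    source: str


def record_pattern_v2(record: PatternRecord, *, dry_run: bool = False) -> dict[str, object]:
    return {**asdict(record), "dry_run": dry_run}
```

The noqa sits on the `def` line because that is where the diagnostic is reported; the reason after `--` is what a reviewer reads when deciding whether the suppression is still justified.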

Two small hardening fixes:
1. hooks/nelson_hooks.py:653 (the second `from nelson_circuit_breakers import …` block): the bare `# type: ignore[import-not-found]` inside the parens had no `--` reason. Move it onto the `from` line and merge with the existing PLC0415 noqa so it follows the same documented shape as the earlier import on line 625.
2. skills/nelson/scripts/test_nelson_data_memory.py: the `_write_json` crash/cleanup tests were building `python -c` bodies via f-string interpolation of `tmp_path`-derived values. That's exactly the pattern the S603 per-file-ignore in pyproject.toml now explicitly warns against. Switch to a single literal probe body that reads `SCRIPT_DIR` and `TARGET` from `os.environ`. The subprocess argv becomes fully literal, so the S603 suppression rationale becomes minimal and accurate.

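
The second fix can be sketched as follows. This is not the test code from `test_nelson_data_memory.py`; it only assumes the shape the commit message describes: a literal probe body that reads `SCRIPT_DIR` and `TARGET` from `os.environ`. The `run_probe` helper and the JSON payload are hypothetical, and the probe writes the file directly instead of importing `_write_json`, which is not available here.

```python
import os
import subprocess
import sys
from pathlib import Path

# The probe body is a plain string literal: no fixture-derived value is
# interpolated into it, so the argv passed to subprocess is fully static.
PROBE = """
import json
import os
import sys
from pathlib import Path

sys.path.insert(0, os.environ["SCRIPT_DIR"])  # where the module under test would live
target = Path(os.environ["TARGET"])
target.write_text(json.dumps({"ok": True}), encoding="utf-8")
print(target.name)
"""


def run_probe(script_dir: Path, target: Path) -> subprocess.CompletedProcess[str]:
    """Run the literal probe, passing the variable paths through the environment."""
    env = {**os.environ, "SCRIPT_DIR": str(script_dir), "TARGET": str(target)}
    return subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
```

Because the paths arrive as environment values rather than as source text, a path containing quotes or backslashes cannot change what the child process executes.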
Supply-chain hardening + local↔CI parity for the CI workflow:
- SHA-pin every third-party action (`actions/checkout`, `setup-python`,
`gitleaks-action`, `markdownlint-cli2-action`, `action-yamllint`,
`lychee-action`, `typos`) with the original tag preserved as a
trailing comment for human readability and bot upgrade pathways.
- `pre-commit autoupdate --freeze` does the equivalent for the three
repos in `.pre-commit-config.yaml`; ruff happens to bump v0.15.14
-> v0.15.15 as a side effect (clean).
- New `pre-commit` job runs `pre-commit/action@<sha>` so every hook
used locally also runs in CI — closes drift for trailing-whitespace,
end-of-file-fixer, check-{yaml,toml,json}, check-added-large-files,
check-merge-conflict, ruff, and gitleaks.
- `secret-scan` job now pins `GITLEAKS_VERSION: 8.30.0` to match the
pre-commit engine.
- `concurrency.cancel-in-progress` is now scoped to PRs only — pushes
to main always run to completion.
- `cache: pip` + `cache-dependency-path` on python jobs.
- `test` job becomes a matrix of `[skills/nelson/scripts, hooks,
scripts]` with `fail-fast: false` so a failure in one suite doesn't
hide failures in the others.
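
The SHA-pinning convention in the first bullet (full commit SHA with the original tag kept as a trailing comment) is easy to check mechanically. The sketch below is a hypothetical helper, not part of this PR; it assumes pinned lines look like `uses: owner/repo@<40 hex chars> # <tag>` and treats everything else except local `./` actions as unpinned.

```python
import re

# Any "uses:" line, with or without a leading list dash.
USES = re.compile(r"^\s*-?\s*uses:\s*(?P<ref>\S+)")

# A pinned line: owner/repo@<40 lowercase hex chars>, then a "# tag" comment.
PINNED = re.compile(
    r"^\s*-?\s*uses:\s*[\w./-]+@[0-9a-f]{40}\s+#\s*\S+\s*$"
)


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return every `uses:` reference that is not SHA-pinned with a tag comment."""
    offenders: list[str] = []
    for line in workflow_text.splitlines():
        match = USES.match(line)
        if match is None:
            continue
        ref = match.group("ref")
        if ref.startswith("./"):
            # Local actions live in the repository and have no version to pin.
            continue
        if PINNED.match(line) is None:
            offenders.append(ref)
    return offenders
```

This is a line-based scan rather than a YAML parse, so it would miss `uses:` values split across lines; for a workflow written in the usual one-line style that limitation rarely matters.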

- CLAUDE.md → *Brownfield complexity backlog*: rewrite the touch-the-function-and-refactor-or-keep-the-noqa wording so it removes the perverse incentive to avoid flagged functions. The refactor stays opportunistic; the noqa stays in place if the edit doesn't reduce complexity.
- AGENTS.md → *Maintainability sensors*: collapse the duplicated bullet list to a single pointer at CLAUDE.md so the two docs can't drift apart again.
- docs/project_structure.md: add `pyproject.toml`, `.pre-commit-config.yaml`, and `.github/workflows/ci.yml` to the repo-root entries. These are the files CLAUDE.md tells the agent to read.
- CONTRIBUTING.md: add a *Local development* section with the pre-commit install command, the ruff invocations, and the three pytest commands, plus a link back to the sensor reference.

Pre-existing markdownlint MD031 violation on the fenced code block inside the *Tests* bullet of the Maintainability sensors section. Failed on every CI run since the Tier 1 sensors landed (27460f9) but went unnoticed because the markdown job was already red on other issues for unrelated branches. Adding a blank line above and below the fence brings markdownlint to zero errors across all 61 tracked .md files.
Summary
- Configures Ruff in `pyproject.toml` so every lint message becomes self-correction feedback for an agent.
- Adds `.pre-commit-config.yaml` (Gitleaks + Ruff + standard hygiene) and new CI jobs (`python-lint` + `secret-scan`) so the same checks fire locally, on commit, and in CI.
- Documents the sensors in `CLAUDE.md` and `AGENTS.md`, including the suppress-with-reason convention and the brownfield refactor backlog, so future agent invocations know to run them and how to interpret violations.

Based on the sensors-as-prompts and suppressions-as-review-anchors ideas from Birgitta Böckeler, Maintainability sensors for coding agents (martinfowler.com), via the `harden-for-agents` skill.

Note on scope
This PR also lands the merge of PR #120 (`nelson_data_patterns.py` learned-standing-orders pipeline, ~2,400 LOC). That feature is not part of the Tier 1 sensors work: it merged into this branch via `9113af9` before the hardening commit and rides along to `main`. The Tier 1 sensors changes themselves are confined to `pyproject.toml`, `.pre-commit-config.yaml`, `.github/workflows/ci.yml`, `CLAUDE.md`, `AGENTS.md`, `docs/project_structure.md`, plus the auto-fixes and reasoned noqas the new ruff config required.

Adversarial-review remediation (commits 361a506..82ebd74)
A five-agent adversarial review surfaced three CRITICAL issues plus a long tail of HIGH/MEDIUM follow-ups. Applied in five CI-safe commits:
1. `fix: restore LICENSE and README ownership`: Revert LICENSE copyright to "2025 Harry Munro" (matches `main`); fix the stale `aspegio/nelson` marketplace path in README quick-start to `harrymunro/nelson`.
2. `chore: tighten lint config and triage surfaced violations`: Drop the bogus PLR0913 global ignore (the previous "covered by PLR0917" comment was wrong: PLR0917 is preview-only). The rule now fires; test/conftest helpers are covered via per-file ignore, and four production sites get reasoned per-function noqas pointing at the `nelson-e6j` refactor backlog. Also: anchor test glob to `**/test_*.py`, drop S108 from test ignore (replaced with one per-line noqa for a single placeholder), tighten the S603 rationale, delete the decorative `[tool.mypy]` block (roll-out tracked under `nelson-8q8`), and delete the `test_regex*.py` scratch files at the repo root (`nelson-6bw`).
3. `chore: suppression hygiene + safer subprocess test pattern`: Add a reason to the bare `# type: ignore` on `hooks/nelson_hooks.py:653` (matches the documented shape on line 625). Refactor the `_write_json` crash/cleanup tests in `test_nelson_data_memory.py` to pass paths via `env=` injection rather than interpolating fixture values into a `python -c` string, so the S603 suppression rationale becomes minimal and accurate.
4. `ci: SHA-pin actions and pre-commit, add parity job, parallel tests`: SHA-pin every third-party `uses:` (tag preserved as trailing comment); `pre-commit autoupdate --freeze` for the three hook repos; new `pre-commit` CI job using `pre-commit/action@<sha>` to close local↔CI parity for trailing-whitespace, end-of-file-fixer, check-{yaml,toml,json}, check-added-large-files, check-merge-conflict, ruff, and gitleaks; pin `GITLEAKS_VERSION: 8.30.0` in the `secret-scan` job to match the pre-commit engine; `cache: pip` + `cache-dependency-path`; `concurrency.cancel-in-progress` scoped to PRs only; `test` job becomes a matrix of `[skills/nelson/scripts, hooks, scripts]` with `fail-fast: false`.
5. `docs: soften brownfield wording, trim AGENTS.md, document sensors`: Rewrite the Brownfield complexity backlog paragraph in CLAUDE.md so it no longer creates a perverse incentive to avoid flagged functions; collapse the duplicated Maintainability sensors section in AGENTS.md to a single pointer at CLAUDE.md; add `pyproject.toml`, `.pre-commit-config.yaml`, and `.github/workflows/ci.yml` to `docs/project_structure.md`; add a Local development section to CONTRIBUTING.md.

What's in the lint config
- Thresholds: `max-args`, `max-branches`, `max-statements`, `max-complexity`, `max-returns`, `line-length`.
- Per-directory ignores: `scripts/`, `hooks/`, `**/test_*.py`, `conftest.py`.

Triage outcome (brownfield)
Started at 183 violations after enabling the rules; ended at 0:
- Auto-fixed: imports, datetime UTC, unused imports.
- Manually fixed real bugs: missing `check=` and `strict=` kwargs, unused variables, redefined loop variables, three lazy `import json` calls moved to module top.
- Suppressed complexity violations with inline noqa plus reason (`# noqa: C901, PLR0912, PLR0915 -- ... refactor tracked in nelson-e6j`). Brownfield-safe: the rules still fire if anything gets worse, and the suppressions point at a refactor backlog rather than disappearing into noise.
- Disabled globally: `SIM108` (if/else→ternary often degrades readability), `RUF002` (Unicode in docstrings is intentional prose).
- `ruff format` baseline applied across 22 files.

What's deferred (filed as follow-up issues)
- `nelson-8q8` (mypy strict mode): ≈82% of non-test functions annotated (178/217 across 12 files; `scripts/count-tokens.py` has 0/6), but strict mode would drown out the linter signal. Roll out incrementally.
- `nelson-7rm`: coverage gate, import-linter contracts, Semgrep SAST.
- `nelson-y2l`: each test dir has its own `conftest.py` with overlapping fixture names; CI runs them as a matrix for now.
- `PLR0915`/`PLR0912` blanket ignore in test files: surface and triage ~5-10 large-test refactor candidates (new follow-up to file post-merge).
- Replace `gitleaks-action@v2` with the direct gitleaks binary: closes the fork-PR fail-open scenario where `GITLEAKS_LICENSE` is missing (new follow-up to file post-merge).

Test plan
- `ruff check`: All checks passed
- `ruff format --check`: 25 files already formatted
- `pytest skills/nelson/scripts/`: 353 passed
- `pytest hooks/`: 64 passed
- `pytest scripts/`: 21 passed
- `pre-commit run --all-files`: all hooks pass
- `ruff check --select PLR0913 .`: exits 0 (sensor actually fires now)
- CI: `python-lint`, `pre-commit`, and matrix `test` jobs alongside existing `secret-scan`, `markdown`, `yaml`, `links`, `spelling`, `references`
- Run `pre-commit install` once locally so hooks fire on every commit
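
The suppress-with-reason convention used throughout this PR (`# noqa: CODES -- reason`) could itself be audited with a few lines of Python. The helper below is a hypothetical sketch, not something this PR ships: it scans source text line by line and reports any `# noqa` that lacks a non-empty reason after `--`.

```python
import re

# Capture everything that follows "# noqa" on a line.
NOQA = re.compile(r"#\s*noqa\b(?P<rest>.*)$")


def bare_suppressions(source: str) -> list[int]:
    """Return 1-based line numbers whose `# noqa` has no `-- reason` suffix."""
    flagged: list[int] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = NOQA.search(line)
        if match is None:
            continue
        # Split the trailing text at the first "--"; the reason is what follows.
        _, separator, reason = match.group("rest").partition("--")
        if not separator or not reason.strip():
            flagged.append(lineno)
    return flagged
```

Being a plain text scan, it would also flag the string `# noqa` inside a string literal, so treat it as a reviewing aid rather than a gate.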