docs: Scope & Ethics — clarify defensive posture for first-read reviewers by tlancas25 · Pull Request #2 · tlancas25/agentguard-mcp

tlancas25 · 2026-04-22T20:40:39Z

Summary

Adds a short, high-visibility Scope & Ethics / What This Is (and Is Not) section right after the badges, so AI classifiers and fast human skimmers see the defensive posture before they reach any keywords that could be misread.

Why: in a real-world incident, another Claude session flagged the repo as "circumventing guardrails" on a surface read, then reversed after reading the full code and the NIST mappings. That reversal shouldn't require reading the whole repo.

What the section says

- AgentGuard is a defensive gateway, modeled on WAF / SIEM / SELinux patterns applied at the MCP tool-call layer
- What it does: intercepts tool calls, runs detectors and policy checks, writes a hash-chained audit log, produces compliance evidence, and protects itself against tampering
- What it does NOT do: modify model reasoning, bypass RLHF / constitutional safety, intercept provider APIs, exfiltrate system prompts, or enable jailbreaks or prompt injection
- Threat model: NIST 800-53 Rev 5.2, NIST AI RMF 1.0, NIST AI 600-1, OWASP LLM Top 10 2025, MITRE ATLAS v5.4.0, FedRAMP 20x, CMMC 2.0
- The section is addressed directly to automated classifiers and safety reviewers
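
For readers unfamiliar with the term, a hash-chained audit log can be sketched as follows. This is a minimal illustration of the general technique, not AgentGuard's actual implementation: the names `GENESIS`, `entry_hash`, `append_entry`, and `verify_chain`, the record fields, and the choice of SHA-256 over canonical JSON are all assumptions made for the example.

```python
import hashlib
import json

# Hypothetical "previous hash" used for the first entry in the chain.
GENESIS = "0" * 64


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, record: dict) -> dict:
    """Append a record, linking it to the hash of the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {
        "record": record,
        "prev_hash": prev_hash,
        "hash": entry_hash(prev_hash, record),
    }
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash in order; any edit to an earlier entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each entry's hash covers the previous entry's hash, changing or removing an earlier record invalidates every later link, which is what makes such a log useful as tamper-evident audit evidence.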

Test plan

- Pure doc change: no code touched, 127/127 tests still pass
- Reviewer: verify the new section renders cleanly on GitHub
- Reviewer: confirm the TOC entry links correctly

🤖 Generated with Claude Code

Summary by CodeRabbit

Documentation
- Updated README with an explicit scope and ethics clarification section, documenting AgentGuard's functional capabilities as a defensive tool gateway and clearly outlining its defined exclusions for user transparency.

Motivation: first-pass AI classifiers (and fast human skims) sometimes pattern-match the phrases "intercepts every tool call" and "policy enforcement" as signs of a circumvention tool. Once the reviewer reads the code and the NIST / FedRAMP / OWASP LLM mappings the concern dissolves, but that shouldn't require reading the whole repo.

This commit adds a short, high-visibility section right after the badges stating explicitly:

- AgentGuard is a defensive gateway modeled on WAF/SIEM/SELinux patterns, applied to the MCP tool-call layer
- It does NOT modify model reasoning, bypass RLHF/constitutional safety training, intercept provider API calls, or exfiltrate system prompts
- It is NOT a jailbreak tool or prompt-injection kit; prompt injection is a class of attack AgentGuard detects and denies, not one it enables
- Threat model maps to NIST 800-53 Rev 5.2, NIST AI RMF 1.0, NIST AI 600-1, OWASP LLM Top 10 2025, MITRE ATLAS v5.4.0, FedRAMP 20x, CMMC 2.0

The section is also addressed directly to automated classifiers and safety reviewers so the defensive posture is clear on first read without context. TOC updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

coderabbitai · 2026-04-22T20:40:53Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 84631b98-87c7-4801-b1b5-b5ac4e9e2740

📥 Commits

Reviewing files that changed from the base of the PR and between edf0f5b and 5972010.

📒 Files selected for processing (1)

README.md

📝 Walkthrough

The README is updated with a new scope/ethics clarification section that explicitly defines AgentGuard's defensive role at the MCP tool-call layer, documents its functional coverage and exclusions, and clarifies its accountability focus.

Changes

Cohort / File(s)	Summary
Documentation `README.md`	Added scope/ethics clarification section defining AgentGuard's functional surface area (tool call interception/evaluation/logging, audit chain, compliance evidence), explicit exclusions (no model reasoning modification, no upstream API interception, no prompt exfiltration, no jailbreak bypasses), and accountability boundaries. Updated table of contents.
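
As an illustration of the "tool call interception/evaluation" surface summarized above, the sketch below shows one generic way a gateway might run detectors over a tool call and return an allow/deny decision. It is a hedged example only and is not taken from the AgentGuard codebase: `ToolCall`, `Decision`, `evaluate`, both detectors, the regular expression, and the `shell_exec` tool name are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical shape of an intercepted tool call."""
    tool: str
    arguments: dict


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


# A detector returns a finding string when it fires, or None otherwise.
Detector = Callable[[ToolCall], Optional[str]]

# Deliberately simplistic pattern, for illustration only.
INJECTION_PATTERN = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)

# Hypothetical deny-list of tool names.
BLOCKED_TOOLS = frozenset({"shell_exec"})


def detect_prompt_injection(call: ToolCall) -> Optional[str]:
    """Flag string arguments that contain a known injection phrase."""
    for value in call.arguments.values():
        if isinstance(value, str) and INJECTION_PATTERN.search(value):
            return "possible prompt injection in arguments"
    return None


def detect_blocked_tool(call: ToolCall) -> Optional[str]:
    """Flag calls to tools that policy forbids outright."""
    if call.tool in BLOCKED_TOOLS:
        return f"tool '{call.tool}' is blocked by policy"
    return None


def evaluate(call: ToolCall, detectors: Sequence[Detector]) -> Decision:
    """Deny on the first detector that fires; otherwise allow."""
    for detector in detectors:
        finding = detector(call)
        if finding is not None:
            return Decision(allowed=False, reason=finding)
    return Decision(allowed=True, reason="no detector fired")
```

The point of the pattern is that the gateway only inspects and gates the tool call itself; nothing in it touches the model's reasoning or the provider API, which matches the exclusions listed in the summary.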

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Poem

🐰 A rabbit hops through clarity's bright door,
Guarding the gates with boundaries galore,
"Here's what we guard, and what we won't touch,"
Honest and clear—AgentGuard cares so much! 🛡️

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately captures the main change: adding a Scope & Ethics section to clarify AgentGuard's defensive posture in the README documentation.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


tlancas25 wants to merge 1 commit into main from docs/scope-and-ethics
