docs: Scope & Ethics — clarify defensive posture for first-read reviewers#2
docs: Scope & Ethics — clarify defensive posture for first-read reviewers#2
Conversation
Motivation: first-pass AI classifiers (and fast human skims) sometimes
pattern-match the phrases "intercepts every tool call" and "policy
enforcement" as a circumvention tool. Once the reviewer reads the
code and the NIST / FedRAMP / OWASP LLM mappings the concern
dissolves, but that shouldn't require reading the whole repo.
This commit adds a short, high-visibility section right after the
badges stating explicitly:
- AgentGuard is a defensive gateway modeled on WAF/SIEM/SELinux
patterns, applied to the MCP tool-call layer
- It does NOT modify model reasoning, bypass RLHF/constitutional
safety training, intercept provider API calls, or exfiltrate
system prompts
- It is NOT a jailbreak tool or prompt-injection kit — prompt
injection is a class of attack AgentGuard detects and denies,
not one it enables
- Threat model maps to NIST 800-53 Rev 5.2, NIST AI RMF 1.0,
NIST AI 600-1, OWASP LLM Top 10 2025, MITRE ATLAS v5.4.0,
FedRAMP 20x, CMMC 2.0
Also addressed directly to automated classifiers and safety reviewers
so the defensive posture is clear on first read without context.
TOC updated.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThe README is updated with a new scope/ethics clarification section that explicitly defines AgentGuard's defensive role at the MCP tool-call layer, documents its functional coverage and exclusions, and clarifies its accountability focus. Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~5 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary
Adds a short, high-visibility Scope & Ethics / What This Is (and Is Not) section right after the badges, so AI classifiers and fast human skimmers see the defensive posture before any keywords.
Why: a real-world incident where another Claude session flagged the repo as "circumventing guardrails" on a surface read, then reversed after reading the full code + NIST mappings. That reversal shouldn't require reading the whole repo.
What the section says
Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit