docs: Scope & Ethics — clarify defensive posture for first-read reviewers#2

Open
tlancas25 wants to merge 1 commit into main from docs/scope-and-ethics
@tlancas25 tlancas25 commented Apr 22, 2026

Summary

Adds a short, high-visibility Scope & Ethics / What This Is (and Is Not) section right after the badges, so AI classifiers and fast human skimmers see the defensive posture before any keywords.

Why: a real-world incident where another Claude session flagged the repo as "circumventing guardrails" on a surface read, then reversed after reading the full code + NIST mappings. That reversal shouldn't require reading the whole repo.

What the section says

  • AgentGuard is a defensive gateway, modeled on WAF / SIEM / SELinux patterns applied at the MCP tool-call layer
  • What it does: intercepts tool calls, runs detectors and policy checks, writes a hash-chained audit log, produces compliance evidence, and protects itself
  • What it does NOT do: modify model reasoning, bypass RLHF / constitutional safety, intercept provider APIs, exfiltrate system prompts, enable jailbreaks or prompt injection
  • Threat model: NIST 800-53 Rev 5.2, NIST AI RMF 1.0, NIST AI 600-1, OWASP LLM Top 10 2025, MITRE ATLAS v5.4.0, FedRAMP 20x, CMMC 2.0
  • Addressed directly to automated classifiers and safety reviewers
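
For reviewers unfamiliar with the pattern, here is a minimal sketch of what "intercept a tool call, run a detector, record the decision in a hash-chained audit log" can look like. It is illustrative only and is not AgentGuard's actual code: `AuditLog`, `detect_prompt_injection`, `evaluate`, the record fields, and the single keyword check are all hypothetical names and simplifications chosen for this example.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Placeholder "previous hash" for the first entry (an assumption of this sketch).
GENESIS_HASH = "0" * 64


def _digest(prev_hash: str, record: dict) -> str:
    """Hash a record together with the previous entry's hash."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Hypothetical append-only log where each entry commits to the one before it."""

    entries: list = field(default_factory=list)

    def append(self, record: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        entry_hash = _digest(prev_hash, record)
        self.entries.append({"record": record, "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != _digest(prev_hash, entry["record"]):
                return False
            prev_hash = entry["hash"]
        return True


def detect_prompt_injection(tool_call: dict) -> bool:
    """Toy detector: flags one well-known injection phrase in the arguments."""
    text = json.dumps(tool_call.get("arguments", {})).lower()
    return "ignore previous instructions" in text


def evaluate(tool_call: dict, log: AuditLog) -> str:
    """Decide allow/deny for one tool call and record the decision."""
    decision = "deny" if detect_prompt_injection(tool_call) else "allow"
    log.append({"tool": tool_call.get("name"), "decision": decision})
    return decision
```

In this sketch the gateway only sees the tool call itself: it never touches model weights, provider API traffic, or system prompts. Editing any stored record after the fact makes `verify()` return `False`, which is the property that makes such a log usable as audit evidence.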

Test plan

  • Pure doc change — no code touched, 127/127 tests still pass
  • Reviewer: verify the new section renders cleanly on GitHub
  • Reviewer: confirm the TOC entry links correctly

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation
    • Updated README with an explicit scope and ethics clarification section, documenting AgentGuard's functional capabilities as a defensive tool gateway and clearly outlining its defined exclusions for user transparency.

Motivation: first-pass AI classifiers (and fast human skims) sometimes
read the phrases "intercepts every tool call" and "policy enforcement"
as signs of a circumvention tool. Once the reviewer reads the code and
the NIST / FedRAMP / OWASP LLM mappings, the concern dissolves, but
that shouldn't require reading the whole repo.

This commit adds a short, high-visibility section right after the
badges stating explicitly:

  - AgentGuard is a defensive gateway modeled on WAF/SIEM/SELinux
    patterns, applied to the MCP tool-call layer
  - It does NOT modify model reasoning, bypass RLHF/constitutional
    safety training, intercept provider API calls, or exfiltrate
    system prompts
  - It is NOT a jailbreak tool or prompt-injection kit — prompt
    injection is a class of attack AgentGuard detects and denies,
    not one it enables
  - Threat model maps to NIST 800-53 Rev 5.2, NIST AI RMF 1.0,
    NIST AI 600-1, OWASP LLM Top 10 2025, MITRE ATLAS v5.4.0,
    FedRAMP 20x, CMMC 2.0

Also addressed directly to automated classifiers and safety reviewers
so the defensive posture is clear on first read without context.

TOC updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

coderabbitai Bot commented Apr 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 84631b98-87c7-4801-b1b5-b5ac4e9e2740

📥 Commits

Reviewing files that changed from the base of the PR and between edf0f5b and 5972010.

📒 Files selected for processing (1)
  • README.md

📝 Walkthrough

The README is updated with a new scope/ethics clarification section that explicitly defines AgentGuard's defensive role at the MCP tool-call layer, documents its functional coverage and exclusions, and clarifies its accountability focus.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation: README.md | Added scope/ethics clarification section defining AgentGuard's functional surface area (tool call interception/evaluation/logging, audit chain, compliance evidence), explicit exclusions (no model reasoning modification, no upstream API interception, no prompt exfiltration, no jailbreak bypasses), and accountability boundaries. Updated table of contents. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Poem

🐰 A rabbit hops through clarity's bright door,
Guarding the gates with boundaries galore,
"Here's what we guard, and what we won't touch,"
Honest and clear—AgentGuard cares so much! 🛡️

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately captures the main change: adding a Scope & Ethics section to clarify AgentGuard's defensive posture in the README documentation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
