
Add Phase 4 documentation: ADRs 0039-0049, how-to guides, Sphinx API …#94

Merged
wrhalpin merged 1 commit into main from claude/add-claude-documentation-k8vvJ
Apr 9, 2026
Conversation

Owner

@wrhalpin wrhalpin commented Apr 9, 2026

…docs, diagrams

  • ADRs 0039-0049 covering all Phase 4 subsystems: ExecutionContext, connector trust model, idempotency/schema evolution, hypothesis engine, negative evidence, reasoning engine, agent governance, HITL gateway, workspace isolation, query budget, and the simulation/replay testing framework
  • Updated ADR README Table of Contents with entries 38-49 (including Phase 4 section)
  • Extended docs/explanation/architecture/diagrams.md with Phase 4 layer table rows and Control Layer Mermaid diagram; added ADR cross-links 0039-0049
  • Added 5 new Mermaid sequence/flowchart diagrams to workflow-diagrams.md covering ExecutionContext propagation, HypothesisEngine lifecycle, ReasoningEngine scoring, Agent Governance & HITL flow, and workspace trust boundary enforcement
  • Added [agent_policy], [connector_limits], [workspace_defaults], [execution_context] sections to docs/reference/configuration.md
  • Three new how-to guides: use-execution-context.md, use-reasoning-engine.md, agent-governance.md; added Phase 4 section to how-to/README.md
  • New Sphinx RST pages: reasoning.rst, agents_governance.rst, api/core.rst, api/reasoning.rst, api/agents_governance.rst; updated index.rst toctrees

https://claude.ai/code/session_01BDoue9HxB83ijLzFARAugq

Copilot AI review requested due to automatic review settings April 9, 2026 00:35
@wrhalpin wrhalpin merged commit df3656e into main Apr 9, 2026
7 of 19 checks passed

Copilot AI left a comment


Pull request overview

Adds Phase 4 (“control/safety”, reasoning, and agent governance) documentation across ADRs, how-to guides, config reference, and Sphinx pages so users can discover and use the new subsystems.

Changes:

  • Added ADRs 0039–0049 describing Phase 4 subsystems (ExecutionContext/trust/budget, reasoning/hypotheses/negative evidence, governance/HITL, workspace isolation, simulation/replay testing).
  • Added/updated user-facing docs: new how-to guides, updated configuration reference, and expanded architecture diagrams/workflows.
  • Extended Sphinx docs: new narrative pages plus new API reference entry points and updated toctrees.

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 29 comments.

| File | Description |
| --- | --- |
| docs/explanation/architecture/adrs/0039-ADR-execution-context.md | ExecutionContext ADR (factored factories, budget, persistence) |
| docs/explanation/architecture/adrs/0040-ADR-connector-trust-model.md | Trust-level model ADR and its integration with ExecutionContext |
| docs/explanation/architecture/adrs/0041-ADR-idempotency-schema-evolution.md | Idempotency/schema evolution ADR |
| docs/explanation/architecture/adrs/0042-ADR-hypothesis-engine.md | Hypothesis engine ADR (lifecycle + evaluation model) |
| docs/explanation/architecture/adrs/0043-ADR-negative-evidence.md | Negative-evidence ADR (TTL suppression model) |
| docs/explanation/architecture/adrs/0044-ADR-reasoning-engine.md | Reasoning engine ADR (scoring + Solr corroboration + notes) |
| docs/explanation/architecture/adrs/0045-ADR-agent-governance.md | Agent governance ADR (permission matrix, rate limits, audit log) |
| docs/explanation/architecture/adrs/0046-ADR-hitl-gateway.md | HITL gateway ADR (review queue bridge + escalation) |
| docs/explanation/architecture/adrs/0047-ADR-workspace-isolation.md | Workspace isolation ADR (trust boundaries + connector allowlists) |
| docs/explanation/architecture/adrs/0048-ADR-query-budget.md | Query budget ADR (budget charging + exhaustion behavior) |
| docs/explanation/architecture/adrs/0049-ADR-testing-framework.md | Simulation/replay/testing ADR (SimulationConnector/ReplayRunner/Harness) |
| docs/explanation/architecture/adrs/README.md | ADR TOC updates for 0039–0049 |
| docs/explanation/architecture/diagrams.md | Expanded architecture diagrams to include Phase 4 layers |
| docs/explanation/architecture/workflow-diagrams.md | Added Phase 4 workflow Mermaid diagrams (context propagation, reasoning, governance, trust boundaries) |
| docs/reference/configuration.md | Documented new Phase 4 configuration sections/keys |
| docs/how-to/use-execution-context.md | How-to guide for creating/propagating ExecutionContext, boundaries, and budgets |
| docs/how-to/use-reasoning-engine.md | How-to guide for ReasoningEngine/HypothesisEngine/NegativeEvidence + Solr integration |
| docs/how-to/agent-governance.md | How-to guide for AgentGovernor/HITL + testing harness usage |
| docs/how-to/README.md | Updated how-to index to include Phase 4 guides |
| docs/sphinx-html/source/index.rst | Added new Sphinx toctree entries for Phase 4 docs + API pages |
| docs/sphinx-html/source/reasoning.rst | New Sphinx narrative page for the reasoning layer |
| docs/sphinx-html/source/agents_governance.rst | New Sphinx narrative page for agent governance/HITL |
| docs/sphinx-html/source/api/core.rst | New Sphinx API reference entry point for gnat.core |
| docs/sphinx-html/source/api/reasoning.rst | New Sphinx API reference entry point for gnat.reasoning |
| docs/sphinx-html/source/api/agents_governance.rst | New Sphinx API reference entry point for governance/HITL/testing |


## 9. Hypothesis Engine Lifecycle (Phase 4C)

The full propose → evaluate → close lifecycle for `STIXHypothesis` objects, showing
how Solr corroboration and trust-weighted evidence feed into confidence updates.

Copilot AI Apr 9, 2026


This workflow text says HypothesisEngine confidence updates use "trust-weighted evidence", but the current HypothesisEngine.evaluate() implementation does not apply trust weights. Consider rewording to just describe Solr corroboration (or update the implementation to weight by trust if that’s a requirement).

Suggested change:
- how Solr corroboration and trust-weighted evidence feed into confidence updates.
+ how Solr corroboration and supporting/refuting evidence feed into confidence updates.

Comment on lines +202 to +203
The `AgentTestHarness` provides a fully deterministic test environment — all HITL
submissions are auto-approved and all rate limits are effectively unlimited:

Copilot AI Apr 9, 2026


AgentTestHarness does not auto-approve high/critical actions: HITLGateway.evaluate() returns (False, review_item) for impact_level high/critical even when using the mock review service. Update this text to clarify that the harness makes review submission deterministic (but still not auto-approved for high/critical) or update the harness/gateway behavior to match the claim.

Suggested change:
- The `AgentTestHarness` provides a fully deterministic test environment — all HITL
- submissions are auto-approved and all rate limits are effectively unlimited:
+ The `AgentTestHarness` provides a fully deterministic test environment — review
+ submissions are handled deterministically by the mock HITL flow, low-impact actions
+ like the example below are auto-approved, and rate limits are effectively unlimited.
+ High- and critical-impact actions are still submitted for review rather than being
+ auto-approved:

print(f"{neg['count']} fresh neg records, penalty={neg['penalty']:.2f}")

corr = explanation["components"]["corroboration"]
print(f"{corr['hits']} Solr hits, bonus={corr['bonus']:.2f}")

Copilot AI Apr 9, 2026


The corroboration component in ReasoningEngine’s explanation dict reports a derived/capped hits count (based on bonus/0.05), not the raw Solr hit count. This example prints it as "Solr hits", which is misleading once the bonus cap is reached. Consider renaming the field in docs (e.g. "effective_hits") or updating the engine to record the actual hit count separately.

Suggested change:
- print(f"{corr['hits']} Solr hits, bonus={corr['bonus']:.2f}")
+ print(f"{corr['hits']} effective hits, bonus={corr['bonus']:.2f}")

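To make the reviewer's point concrete, here is a minimal sketch of why deriving a hit count from a capped bonus is lossy. The 0.05 per-hit bonus is taken from the comment above; the 0.25 cap, the function name, and the dict shape are assumptions for illustration, not the engine's actual constants or API.

```python
# Sketch of the capped-bonus issue flagged above. The 0.25 cap and the
# function/dict names are invented for this illustration.

def corroboration_component(raw_hits: int, per_hit: float = 0.05, cap: float = 0.25) -> dict:
    """Compute a capped corroboration bonus, then derive 'hits' back
    from it (bonus / per_hit), as the explanation dict reportedly does."""
    bonus = min(raw_hits * per_hit, cap)
    derived_hits = round(bonus / per_hit)  # lossy once the cap kicks in
    return {"hits": derived_hits, "bonus": bonus}

# Below the cap, the derived count matches the raw Solr hit count:
assert corroboration_component(3)["hits"] == 3
# Above the cap it saturates, so labelling it "Solr hits" is misleading:
assert corroboration_component(40)["hits"] == 5
```

Recording the raw hit count in a separate field (as the reviewer suggests) would avoid the ambiguity entirely.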
Comment on lines +19 to +22
* :class:`~gnat.agents.hitl.HITLGateway` — bridges ``AgentGovernor`` to the existing
:class:`~gnat.review.service.ReviewService`; low/medium-impact actions are
auto-approved, high-impact actions block until a human reviewer approves, and
critical actions also trigger XSOAR notifications.

Copilot AI Apr 9, 2026


This overview says high-impact actions "block until a human reviewer approves", but HITLGateway.evaluate() currently returns (False, review_item) for high/critical and does not block; callers must poll check_approval_status() and decide when to execute. Please adjust the wording to reflect the non-blocking API (or implement a blocking helper if that’s intended).

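The non-blocking pattern the reviewer describes can be sketched with a stub: `evaluate()` returns `(approved, review_item)` immediately, and the caller polls a status check rather than blocking. The stub class, its method bodies, and the impact-level strings are assumptions for illustration; only the `evaluate()`/`check_approval_status()` shape comes from the comment above.

```python
# Hypothetical stub of the non-blocking HITL flow described above; the
# real HITLGateway/ReviewService APIs may differ.

class StubGateway:
    def __init__(self) -> None:
        self._approved: set[str] = set()

    def evaluate(self, action: str, impact_level: str):
        if impact_level in ("low", "medium"):
            return True, None                      # auto-approved
        return False, {"id": f"review-{action}"}   # queued, caller must poll

    def approve(self, review_id: str) -> None:     # stands in for a human reviewer
        self._approved.add(review_id)

    def check_approval_status(self, review_id: str) -> bool:
        return review_id in self._approved


gw = StubGateway()
approved, item = gw.evaluate("quarantine-host", impact_level="high")
assert not approved                          # high impact: submitted, not blocked on
assert not gw.check_approval_status(item["id"])
gw.approve(item["id"])                       # human decision arrives later
assert gw.check_approval_status(item["id"])  # now the caller may execute
```

A blocking convenience helper (poll-with-timeout around `check_approval_status`) would be the other way to reconcile docs and behavior.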
:undoc-members:

ExecutionContext
---------------

Copilot AI Apr 9, 2026


The underline for the "ExecutionContext" section title is one character shorter than the title text, which can trigger docutils/Sphinx warnings ("Title underline too short"). Extend the underline so it is at least the same length as the title.

Suggested change:
- ---------------
+ ----------------

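The docutils rule flagged above is easy to check mechanically: an underline must be at least as long as its title (and made of one repeated punctuation character). A tiny sketch of such a check, written for illustration rather than taken from any real linter:

```python
# Minimal illustration of the "Title underline too short" rule in
# reStructuredText: len(underline) must be >= len(title).

def underline_ok(title: str, underline: str) -> bool:
    return len(underline) >= len(title) and len(set(underline)) == 1

# "ExecutionContext" is 16 characters, so 15 dashes trigger the warning:
assert not underline_ok("ExecutionContext", "-" * 15)
assert underline_ok("ExecutionContext", "-" * 16)
```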
Comment on lines +203 to +205
The mapping is defined in `gnat/core/trust.py` as `TRUST_WEIGHTS: dict[str, float]`
and shared between `ExecutionContext`, `HypothesisEngine`, and `ReasoningEngine`
to ensure a single source of truth.

Copilot AI Apr 9, 2026


This claims the trust-weight mapping lives in gnat/core/trust.py as TRUST_WEIGHTS, but there is no gnat.core.trust module in the codebase and the trust→weight mapping is currently defined inside gnat.reasoning.engine (_TRUST_WEIGHTS). Update this reference so readers can find the real source of truth.

Suggested change:
- The mapping is defined in `gnat/core/trust.py` as `TRUST_WEIGHTS: dict[str, float]`
- and shared between `ExecutionContext`, `HypothesisEngine`, and `ReasoningEngine`
- to ensure a single source of truth.
+ The current implementation defines this mapping in `gnat.reasoning.engine`
+ as `_TRUST_WEIGHTS: dict[str, float]`. This ADR should reference that module as
+ the current source of truth for the trust→weight mapping.

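For readers unfamiliar with the pattern under discussion, here is a sketch of a module-level trust-to-weight mapping used to discount evidence. The mapping name follows the review comment above, but the trust-level keys, the weight values, and the scoring function are invented for this illustration.

```python
# Hypothetical trust→weight mapping of the kind the ADR describes.
# Keys and values are assumptions, not the project's real constants.

_TRUST_WEIGHTS: dict[str, float] = {
    "verified": 1.0,
    "trusted": 0.8,
    "community": 0.5,
    "untrusted": 0.2,
}

def weighted_evidence_score(evidence: list[dict]) -> float:
    """Sum evidence scores, discounting each item by its source's trust
    level; unknown trust levels contribute nothing."""
    return sum(e["score"] * _TRUST_WEIGHTS.get(e["trust"], 0.0) for e in evidence)

evidence = [
    {"score": 1.0, "trust": "verified"},
    {"score": 1.0, "trust": "community"},
]
assert weighted_evidence_score(evidence) == 1.5
```

Keeping one such mapping in a single module (wherever it lives) is what makes the "single source of truth" claim hold.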
class NullSearchIndex(SearchIndex):
"""No-op search index used when Solr is unavailable."""

def query(self, query: str, fields: list[str] | None = None) -> list[dict]:

Copilot AI Apr 9, 2026


This NullSearchIndex example defines a query() method, but the actual search interface is SearchIndex.search(...) (and NullSearchIndex implements search, not query). Update this ADR snippet to match the real gnat.search.index.SearchIndex API.

Suggested change:
- def query(self, query: str, fields: list[str] | None = None) -> list[dict]:
+ def search(self, query: str, fields: list[str] | None = None) -> list[dict]:

Comment on lines +316 to +320
```python
from gnat.reasoning.engine import ReasoningEngine
from gnat.search import GNATIndexer
from gnat.core.context import ExecutionContext
```

Copilot AI Apr 9, 2026


gnat.search.GNATIndexer is referenced here, but there is no GNATIndexer symbol in gnat.search. The existing Solr integration uses SolrSearchIndex / SearchIndex and the build_search_index() factory. Update this example to use the actual search classes.

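The factory-plus-null-object pattern the reviewer points to (`build_search_index()` falling back to a no-op index when Solr is absent) can be sketched as follows. The class and function names echo the comment above, but every body here is an assumption for illustration, not the project's real implementation.

```python
# Illustrative sketch of a build_search_index() factory with a no-op
# fallback. Names follow the review comment; bodies are invented.

class NullSearchIndex:
    """No-op search index used when no Solr backend is configured."""
    def search(self, query: str, fields=None) -> list[dict]:
        return []

class SolrSearchIndex:
    """Placeholder for the real Solr-backed index."""
    def __init__(self, url: str) -> None:
        self.url = url
    def search(self, query: str, fields=None) -> list[dict]:
        raise NotImplementedError("would query Solr at self.url here")

def build_search_index(config: dict):
    """Return a Solr index when configured, else the null fallback."""
    url = config.get("solr_url")
    return SolrSearchIndex(url) if url else NullSearchIndex()

# Without Solr configured, callers get empty results instead of errors:
assert build_search_index({}).search("indicator:*") == []
```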
Comment on lines +328 to +330
store=workspace_store,
search_index=GNATIndexer.from_config(config),
neg_store=neg_evidence_store,

Copilot AI Apr 9, 2026


This ReasoningEngine construction uses store=..., neg_store=... parameters, but the implemented gnat.reasoning.engine.ReasoningEngine constructor takes manager + workspace_name (and gathers negative evidence from the workspace). Update the ADR usage example to match the current engine signature.

Suggested change:
- store=workspace_store,
- search_index=GNATIndexer.from_config(config),
- neg_store=neg_evidence_store,
+ manager=workspace_manager,
+ workspace_name="acme-corp",
+ search_index=GNATIndexer.from_config(config),

Comment on lines +116 to +120
trust_weights: dict[str, float] | None = None,
) -> None:
self._store = store
self._search = search_index
self._weights = trust_weights or TRUST_WEIGHTS # from gnat.core.trust

Copilot AI Apr 9, 2026


This snippet references TRUST_WEIGHTS "from gnat.core.trust", but there is no gnat.core.trust module in the codebase and the implemented HypothesisEngine constructor does not accept trust_weights. Update the snippet to match the real constructor and where trust weights are actually defined/used.

Suggested change:
- trust_weights: dict[str, float] | None = None,
- ) -> None:
- self._store = store
- self._search = search_index
- self._weights = trust_weights or TRUST_WEIGHTS # from gnat.core.trust
+ ) -> None:
+ self._store = store
+ self._search = search_index
+ # Trust weights are not injected via the constructor; they are
+ # applied by the evaluation/scoring logic where they are defined.
