Implement GNAT Phase 4: Control, Reasoning, Safety by wrhalpin · Pull Request #93 · wrhalpin/GNAT

wrhalpin · 2026-04-09T00:11:29Z

Phase 4A — Execution Context & Domain Boundaries

ExecutionContext dataclass (context_id, initiated_by, domain, trust_level, policy_set, workspace_id, created_at, parent_context_id, is_replay, budget)
QueryBudget: finite connector query budget; raises BudgetExceeded when exhausted
Domain enum + DOMAIN_CALL_RULES + @domain_boundary decorator (thread-local stack)
DomainBoundaryViolation, TrustLevelViolation, @require_trust_level decorator
Migration 0004: execution_log table

Phase P-1 — Connector Trust & Versioning

BaseClient: TRUST_LEVEL, API_VERSION, API_PREFIX, COST_UNIT class vars
BaseClient._request(): deducts COST_UNIT from ExecutionContext budget
BudgetExceeded(GNATClientError) exception
16 connectors updated with explicit trust/version/prefix assignments

Phase 4B — Idempotency & Schema Evolution

Migration 0005: idempotency_key on workspace_objects
WorkspaceObjectModel: idempotency_key column + make_idempotency_key() helper
STIXBase: schema_version = 1
Migration 0006: agent_sessions + agent_actions tables

Phase 4C — Hypothesis Engine, Negative Evidence, Reasoning

STIXHypothesis (x-gnat-hypothesis): statement, confidence, status, supporting/refuting evidence; full lifecycle with close(verdict)
NegativeEvidenceRecord (x-gnat-negative-evidence): TTL-based suppression; is_expired(), seconds_remaining()
HypothesisEngine: propose → evaluate → close; Solr corroboration; trust-weighted confidence; auto-classify at thresholds
ReasoningEngine: prioritize() with composite scoring (trust×0.4 + age×0.3 + corroboration×0.3 − neg_penalty×0.5); structured explanation dicts; STIX note storage
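
The composite score described for prioritize() can be sketched as a plain weighted sum. The weights come from the PR description; the function names, the assumption that every input is normalised to [0, 1], and the layout of the explanation dict are hypothetical.

```python
def composite_score(trust, age, corroboration, neg_penalty):
    """Weighted sum using the weights from the PR description.

    Inputs are assumed (not confirmed by the PR) to lie in [0, 1].
    """
    return trust * 0.4 + age * 0.3 + corroboration * 0.3 - neg_penalty * 0.5


def explain(trust, age, corroboration, neg_penalty):
    # Hypothetical explanation layout: one entry per weighted term.
    return {
        "trust": trust * 0.4,
        "age": age * 0.3,
        "corroboration": corroboration * 0.3,
        "neg_penalty": -neg_penalty * 0.5,
        "score": composite_score(trust, age, corroboration, neg_penalty),
    }
```

Because the three positive weights sum to 1.0 and the penalty weight is 0.5, the score under these assumptions ranges from -0.5 to 1.0.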

Phase 4D — Agent Governance & HITL

AgentActionType enum + agent_can_act() trust-level permission matrix
AgentGovernor: can_act(), require_can_act(), record_action(), rate_limit_check() (sliding window), per-agent policy overrides
AgentAction dataclass with impact_level validation
HITLGateway: bridges AgentGovernor to existing gnat/review/; low/medium auto-approve; high→ReviewItem PENDING; critical→PENDING+XSOAR
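
The impact-level routing described for HITLGateway could look roughly like the sketch below. The `Decision` enum, the `route` function, and the `notify` callback are hypothetical stand-ins; the real gateway creates ReviewItem records and talks to XSOAR, neither of which is modelled here.

```python
from enum import Enum


class Decision(Enum):
    APPROVED = "approved"
    PENDING = "pending"


def route(impact_level, notify=None):
    """Map an impact level to a review decision.

    notify is a hypothetical callback standing in for the XSOAR notification.
    """
    if impact_level in ("low", "medium"):
        return Decision.APPROVED          # auto-approve
    if impact_level == "high":
        return Decision.PENDING           # wait for a human reviewer
    if impact_level == "critical":
        if notify is not None:
            notify(impact_level)          # escalate in addition to queueing
        return Decision.PENDING
    raise ValueError(f"unknown impact level: {impact_level!r}")
```

Rejecting unknown levels with an exception, rather than defaulting to approval, keeps a typo in an impact level from bypassing review.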

Phase 4E — Isolation, Performance, Testing

Migrations 0007 (trust_boundary on workspaces) + 0008 (query_cost_log)
WorkspaceModel: trust_boundary + allowed_connector_refs columns
Workspace: trust_boundary/allowed_connector_refs attrs + check_connector_trust()
SimulationConnector: canned STIX fixtures, no network
ReplayRunner: execution_log replay with assertion support
AgentTestHarness: mock HITL approvals for deterministic tests

Tests: 90 new unit tests covering all Phase 4 components (3955 total, +90)

https://claude.ai/code/session_01BDoue9HxB83ijLzFARAugq


Copilot

Pull request overview

Implements GNAT “Phase 4: Control, Reasoning, Safety” by introducing an execution context + domain boundary enforcement, connector trust metadata and budgeting hooks, Phase 4 reasoning primitives (hypotheses / negative evidence / prioritization), and Phase 4 agent governance (permission matrix + HITL gateway), along with schema migrations and new unit tests.

Changes:

Add ExecutionContext + QueryBudget, and domain/trust enforcement decorators and exceptions.
Add Phase 4C reasoning layer: STIXHypothesis, NegativeEvidenceRecord, HypothesisEngine, ReasoningEngine.
Add Phase 4D governance/HITL: agent action types + permission matrix, AgentGovernor, HITLGateway, plus testing utilities and migrations.

Reviewed changes

Copilot reviewed 46 out of 46 changed files in this pull request and generated 11 comments.


File	Description
tests/unit/test_phase4_reasoning.py	Unit tests for hypotheses/negative evidence/reasoning
tests/unit/test_phase4_governor.py	Unit tests for agent governor + HITL
tests/unit/test_phase4_core.py	Unit tests for context/budget/domains/simulation/replay
gnat/testing/simulation.py	Simulation connector, replay runner, agent test harness
gnat/testing/__init__.py	Re-export testing utilities
gnat/stix/sdos/negative_evidence.py	Negative evidence custom SDO
gnat/stix/sdos/hypothesis.py	Hypothesis custom SDO
gnat/stix/sdos/__init__.py	Export custom SDOs
gnat/reasoning/hypothesis.py	Hypothesis lifecycle engine
gnat/reasoning/engine.py	Observable prioritization engine + note storage
gnat/reasoning/__init__.py	Export reasoning engines
gnat/policy/models.py	Add agent action types + trust permission matrix
gnat/policy/__init__.py	Export new policy symbols
gnat/orm/base.py	Add ORM schema_version knob
gnat/core/domains.py	Domain model + boundary/trust decorators
gnat/core/context.py	Execution context + query budget
gnat/core/__init__.py	Export core phase-4 primitives
gnat/context/workspace.py	Load/enforce workspace trust boundary + allowlist
gnat/context/store.py	Add trust fields + idempotency key + helper + upsert logic
gnat/connectors/xsoar/client.py	Assign trust/version/prefix/cost metadata
gnat/connectors/wazuh/client.py	Assign trust/version/prefix/cost metadata
gnat/connectors/virustotal/client.py	Assign trust/version/prefix/cost metadata
gnat/connectors/threatq/client.py	Assign trust/version/prefix/cost metadata
gnat/connectors/splunk/client.py	Assign trust/version/prefix/cost metadata
gnat/connectors/shadowserver/client.py	Assign trust/version/prefix/cost metadata
gnat/connectors/sentinel/client.py	Assign trust/version/prefix/cost metadata
gnat/connectors/security_onion/client.py	Assign trust/version/prefix/cost metadata
gnat/connectors/recordedfuture/rfv3.py	Assign trust/cost metadata for v3 client
gnat/connectors/recordedfuture/client.py	Assign trust/version/prefix/cost metadata
gnat/connectors/qradar/client.py	Assign trust/version/prefix/cost metadata
gnat/connectors/misp/client.py	Assign trust/version/prefix/cost metadata
gnat/connectors/graylog/client.py	Assign trust/version/prefix/cost metadata
gnat/connectors/feedly/client.py	Assign trust/version/prefix/cost metadata
gnat/connectors/elastic/client.py	Assign trust/version/prefix/cost metadata
gnat/connectors/crowdstrike/client.py	Assign trust/version/prefix/cost metadata
gnat/connectors/CISA/client.py	Assign trust/version/prefix/cost metadata
gnat/connectors/alienvault/client.py	Assign trust/version/prefix/cost metadata
gnat/clients/base.py	Add connector trust/version metadata + budget deduction + BudgetExceeded
gnat/agents/hitl.py	HITL gateway bridging to review service + XSOAR notification
gnat/agents/governor.py	Agent governor: permission checks, logging, rate limiting
CHANGELOG.md	Document Phase 4 feature set
alembic/versions/0008_query_cost_log.py	Migration: query cost logging table
alembic/versions/0007_workspace_trust_boundary.py	Migration: workspace trust boundary + allowlist
alembic/versions/0006_add_agent_tables.py	Migration: agent sessions/actions tables
alembic/versions/0005_add_idempotency.py	Migration: workspace_objects idempotency key
alembic/versions/0004_add_execution_log.py	Migration: execution log table


Copilot · 2026-04-09T00:17:54Z

+            if idempotency_key:
+                dup = (
+                    sess.query(WorkspaceObjectModel)
+                    .filter_by(idempotency_key=idempotency_key)
+                    .first()
+                )
+                if dup is not None:
+                    logger.debug(
+                        "upsert_object: idempotency hit for key %s — skipping write",


The idempotency pre-check queries by idempotency_key without scoping to workspace_id. If two different workspaces generate the same key, this will return/skip the row from the other workspace, effectively preventing the object from being written to the intended workspace and breaking workspace isolation. Scope the lookup to (workspace_id, idempotency_key) and ensure uniqueness is enforced per-workspace (or include workspace_id in the key).

Suggested change

-            if idempotency_key:
-                dup = (
-                    sess.query(WorkspaceObjectModel)
-                    .filter_by(idempotency_key=idempotency_key)
-                    .first()
-                )
-                if dup is not None:
-                    logger.debug(
-                        "upsert_object: idempotency hit for key %s — skipping write",
+            # within the target workspace.
+            if idempotency_key:
+                dup = (
+                    sess.query(WorkspaceObjectModel)
+                    .filter_by(
+                        workspace_id=workspace_id,
+                        idempotency_key=idempotency_key,
+                    )
+                    .first()
+                )
+                if dup is not None:
+                    logger.debug(
+                        "upsert_object: idempotency hit for workspace %s and key %s — skipping write",
+                        workspace_id,
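
The alternative raised in this comment, folding workspace_id into the key itself, might look like the sketch below. The helper name `make_scoped_idempotency_key` and the hashing scheme are assumptions; the PR does not show how make_idempotency_key() actually derives its key.

```python
import hashlib
import json


def make_scoped_idempotency_key(workspace_id, obj):
    """Hypothetical helper: hash the workspace id together with the object.

    The object is serialised canonically (sorted keys) so that dict ordering
    does not change the key.
    """
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256()
    digest.update(workspace_id.encode("utf-8"))
    digest.update(b"\x00")  # separator so the id and payload cannot blur together
    digest.update(canonical.encode("utf-8"))
    return digest.hexdigest()
```

With a key built this way, identical content ingested into two workspaces yields two different keys, so even a globally unique index would not block the second write.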

Copilot · 2026-04-09T00:17:54Z

+    op.create_index(
+        "ix_workspace_objects_idempotency",
+        "workspace_objects",
+        ["idempotency_key"],


This migration creates a unique index on idempotency_key alone, making the key globally unique across all workspace_objects rows. That can block identical content ingests into different workspaces and can interact badly with the current upsert_object() idempotency pre-check (which is also not scoped to workspace_id). Consider enforcing uniqueness on (workspace_id, idempotency_key) instead (or include workspace_id in the key format).

Suggested change

-        ["idempotency_key"],
+        ["workspace_id", "idempotency_key"],
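
To see what the composite index changes in practice, the following sketch uses an in-memory SQLite database instead of the project's Alembic migration. The table is reduced to the two relevant columns and the `insert` helper is hypothetical; only the index name and column names come from the diff above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workspace_objects ("
    "id INTEGER PRIMARY KEY, "
    "workspace_id TEXT NOT NULL, "
    "idempotency_key TEXT)"
)
# Uniqueness is enforced per workspace, not globally.
conn.execute(
    "CREATE UNIQUE INDEX ix_workspace_objects_idempotency "
    "ON workspace_objects (workspace_id, idempotency_key)"
)


def insert(workspace_id, key):
    """Return True if the row was written, False on a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO workspace_objects (workspace_id, idempotency_key) "
                "VALUES (?, ?)",
                (workspace_id, key),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The same key is accepted once per workspace and rejected only when it repeats inside a single workspace.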

Copilot · 2026-04-09T00:17:55Z

+        if self._consumed + units > self.max_units:
+            from gnat.clients.base import BudgetExceeded
+            raise BudgetExceeded(
+                connector=connector,
+                cost=units,


QueryBudget.consume() accepts zero or negative units. Passing 0 is a no-op, and a negative value increases the remaining budget (by decreasing _consumed), which undermines the budget guarantee. Validate that units is a positive integer (and consider guarding against max_units<=0 as well).
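
A minimal sketch of the validation suggested here follows. The `consume(units, connector)` call shape and the `BudgetExceeded(connector=..., cost=...)` keywords mirror the snippets in this review; everything else, including the `remaining` property and the standalone exception base class, is an assumption made to keep the example self-contained.

```python
class BudgetExceeded(Exception):
    # Stand-in for gnat.clients.base.BudgetExceeded, which subclasses GNATClientError.
    def __init__(self, connector, cost, remaining):
        super().__init__(
            f"{connector}: cost {cost} exceeds remaining budget {remaining}"
        )
        self.connector = connector
        self.cost = cost
        self.remaining = remaining


def _is_positive_int(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


class QueryBudget:
    def __init__(self, max_units):
        if not _is_positive_int(max_units):
            raise ValueError("max_units must be a positive integer")
        self.max_units = max_units
        self._consumed = 0

    @property
    def remaining(self):
        return self.max_units - self._consumed

    def consume(self, units, connector):
        if not _is_positive_int(units):
            raise ValueError("units must be a positive integer")
        if self._consumed + units > self.max_units:
            # Raised before consuming, so `remaining` may still be positive.
            raise BudgetExceeded(
                connector=connector, cost=units, remaining=self.remaining
            )
        self._consumed += units
```

Note that a rejected request leaves the budget untouched, so a later, cheaper request can still succeed.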

Copilot · 2026-04-09T00:17:55Z

+
+from gnat.clients.base import BaseClient, GNATClientError
+from gnat.agents.governor import AgentAction, AgentGovernor, RateLimitExceeded
+from gnat.policy.models import AgentActionType


Unused import: RateLimitExceeded is imported but never referenced in this module. This will fail ruff's unused-import checks (F401) if linting is run on gnat/testing/.

Copilot · 2026-04-09T00:17:55Z

+        self._context: Any = None
+
+        self._fixtures: list[dict[str, Any]] = list(fixtures or [])
+        self.TRUST_LEVEL = trust_level  # type: ignore[assignment]
+        self._raise_on_request = raise_on_request


SimulationConnector sets TRUST_LEVEL on the instance (self.TRUST_LEVEL = trust_level), but trust propagation/enforcement elsewhere (e.g., ExecutionContext.from_connector() and Workspace.check_connector_trust()) reads TRUST_LEVEL from the connector class (type(connector).TRUST_LEVEL). As a result, a per-instance trust_level passed here won't be respected by those checks. Consider setting type(self).TRUST_LEVEL, or updating trust consumers to also consult the instance attribute.
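
The mismatch described above comes down to ordinary Python attribute lookup, as this small sketch shows. The classes are simplified stand-ins, and the trust level strings are hypothetical values.

```python
class BaseClient:
    TRUST_LEVEL = "untrusted"  # class-level default (hypothetical value)


class SimulationConnector(BaseClient):
    def __init__(self, trust_level="untrusted"):
        # Creates an instance attribute that shadows the class attribute.
        self.TRUST_LEVEL = trust_level


def trust_from_class(connector):
    # Reads only the class attribute; ignores any per-instance override.
    return type(connector).TRUST_LEVEL


def trust_from_instance(connector):
    # Normal lookup: instance attribute first, then the class.
    return getattr(connector, "TRUST_LEVEL")
```

A consumer that uses `type(connector).TRUST_LEVEL` keeps seeing the class default, while `getattr` on the instance picks up the per-instance value and still falls back to the class when no override exists.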

Copilot · 2026-04-09T00:17:56Z

+        class MockClient(BaseClient):
+            COST_UNIT = 5
+
+            def authenticate(self):
+                self._authenticated = True
+
+            def _request(self, method, path, **kwargs):
+                # Call parent budget deduction then return empty
+                if self._context is not None:
+                    budget = getattr(self._context, "budget", None)
+                    if budget is not None:
+                        budget.consume(self.COST_UNIT, type(self).__name__)
+                return {}


This test is intended to verify BaseClient._request() budget deduction, but MockClient overrides _request() and re-implements the budget logic, then calls client._request() directly. That means the test doesn't exercise the actual BaseClient._request() code path and could miss regressions. Prefer calling BaseClient._request() (e.g., via super()) while stubbing the network layer, or invoking a public method (get/post) with a patched PoolManager request.
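
One way to exercise the inherited `_request()` while stubbing only the network layer is sketched below. The `BaseClient` here is a stand-in whose budget logic is copied from the test snippet above, and `_send` is a hypothetical name for whatever transport call the real class makes.

```python
from unittest import mock


class FakeBudget:
    def __init__(self):
        self.calls = []

    def consume(self, units, connector):
        self.calls.append((units, connector))


class FakeContext:
    def __init__(self, budget):
        self.budget = budget


class BaseClient:
    # Stand-in for gnat.clients.base.BaseClient.
    COST_UNIT = 1

    def __init__(self):
        self._context = None

    def _request(self, method, path, **kwargs):
        if self._context is not None:
            budget = getattr(self._context, "budget", None)
            if budget is not None:
                budget.consume(self.COST_UNIT, type(self).__name__)
        return self._send(method, path, **kwargs)

    def _send(self, method, path, **kwargs):
        # Hypothetical transport hook; real network access is not modelled.
        raise RuntimeError("network disabled in tests")


class MockClient(BaseClient):
    COST_UNIT = 5  # no _request override: the parent code path runs


def test_request_deducts_cost_unit():
    client = MockClient()
    budget = FakeBudget()
    client._context = FakeContext(budget)
    # Patch only the transport, so BaseClient._request itself is exercised.
    with mock.patch.object(MockClient, "_send", return_value={}) as send:
        assert client._request("GET", "/ping") == {}
    send.assert_called_once_with("GET", "/ping")
    assert budget.calls == [(5, "MockClient")]
```

If the budget deduction were later removed from the parent `_request()`, this test would fail, which is the regression the original test could not catch.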

Copilot · 2026-04-09T00:17:56Z

+    cost : int
+        Cost units the connector attempted to consume.
+    remaining : int
+        Budget units remaining when the check failed (always 0 or negative).


The BudgetExceeded docstring says remaining is always 0 or negative, but QueryBudget.consume() raises before consuming and passes the current remaining (which may be >0). Update the docstring (or change semantics) so it matches actual behavior.

Suggested change

-        Budget units remaining when the check failed (always 0 or negative).
+        Budget units remaining when the check failed. This is the current
+        remaining budget before consumption, so it may be positive if
+        ``cost`` exceeds the available budget.

Copilot · 2026-04-09T00:17:56Z

+        Queries the search index with the hypothesis statement.  Matching
+        objects are counted and weighted by their source connector's trust
+        level (from ``source_platform`` metadata).  The confidence score is
+        updated in-place.
+
+        Parameters
+        ----------
+        hypothesis_id : str
+            STIX ID of the hypothesis to evaluate.
+
+        Returns
+        -------
+        STIXHypothesis
+            The updated hypothesis.
+
+        Raises
+        ------
+        KeyError
+            If no hypothesis with *hypothesis_id* is found.
+        """
+        h = self._load(hypothesis_id)
+
+        # Query search index for corroborating evidence
+        statement = h._properties.get("statement", "")
+        corroborating_ids: list[str] = []
+        try:
+            corroborating_ids = self._search_index.search(statement, limit=20)
+        except Exception as exc:  # noqa: BLE001
+            logger.debug("HypothesisEngine: search index unavailable — %s", exc)
+
+        # Compute weighted confidence from evidence counts
+        support_count = len(h._properties.get("supporting_evidence", []))
+        refute_count = len(h._properties.get("refuting_evidence", []))
+        corroboration_boost = min(len(corroborating_ids) * 0.05, 0.3)
+


HypothesisEngine.evaluate() docstring says corroborating objects are "counted and weighted" by source connector trust level, and the module defines _TRUST_WEIGHTS, but the implementation never uses trust weighting (it only uses supporting/refuting counts plus a flat corroboration boost). Either implement the described trust-weighted scoring or adjust the docstring/constants to avoid misleading behavior.
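
If the trust-weighted behaviour in the docstring is the intended one, the flat boost could be replaced by something like the sketch below. The per-hit factor 0.05 and the cap 0.3 come from the diff; the weight values, the `trust_level` field, and the idea that search hits carry trust metadata are all assumptions, since the shown code only receives a list of IDs.

```python
# Hypothetical weights; the real _TRUST_WEIGHTS values are not shown in the PR.
_TRUST_WEIGHTS = {"trusted": 1.0, "semi_trusted": 0.6, "untrusted": 0.2}


def corroboration_boost(hits, per_hit=0.05, cap=0.3):
    """Sum trust weights over hits, scale per hit, and cap the result.

    Each hit is assumed to be a dict with a "trust_level" key; hits with an
    unknown or missing level contribute nothing.
    """
    weighted = sum(_TRUST_WEIGHTS.get(hit.get("trust_level"), 0.0) for hit in hits)
    return min(weighted * per_hit, cap)
```

Under these assumed weights, five untrusted hits contribute the same boost as one trusted hit, whereas the flat version treats every hit equally.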

Copilot · 2026-04-09T00:17:56Z

+    def _persist(self, h: STIXHypothesis) -> None:
+        ws = self._manager.open(self._workspace_name)
+        ws._add_object(h.to_dict(), mark_dirty=True)


_persist() uses mark_dirty=True when storing hypotheses. In Workspace, mark_dirty adds the object to the dirty set, so a later workspace.commit() will attempt to write these internal x-gnat-hypothesis objects back to the default global context/connector (likely failing or causing unintended side effects). If hypotheses are meant to be internal-only, persist them without marking dirty (or introduce a separate persistence path that doesn't enqueue for commit).

Copilot · 2026-04-09T00:17:57Z

+        gov.rate_limit_check("agent-1")
+        gov.rate_limit_check("agent-1")
+        # Wait for window to expire
+        time.sleep(1.1)
+        # Should be allowed again


This test uses time.sleep() to wait for the sliding window to expire, which slows the unit test suite and can be flaky under load. Prefer patching time.monotonic() (or injecting a clock into AgentGovernor) so the window expiry can be tested deterministically without real waiting.
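
A clock-injection version of the idea might look like this sketch. `SlidingWindowLimiter` and `FakeClock` are hypothetical names, and the `RateLimitExceeded` class is a local stand-in; how AgentGovernor actually stores its window is not shown in the PR.

```python
import time
from collections import defaultdict, deque


class RateLimitExceeded(RuntimeError):
    pass


class SlidingWindowLimiter:
    def __init__(self, max_calls, window_seconds, clock=time.monotonic):
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock  # injectable so tests never need to sleep
        self._calls = defaultdict(deque)

    def check(self, agent_id):
        now = self._clock()
        calls = self._calls[agent_id]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self._window:
            calls.popleft()
        if len(calls) >= self._max_calls:
            raise RateLimitExceeded(agent_id)
        calls.append(now)


class FakeClock:
    """Manually advanced clock for deterministic tests."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds
```

In a test, `clock.advance(1.1)` replaces `time.sleep(1.1)`, so window expiry is checked instantly and does not depend on scheduler timing.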

Copilot AI review requested due to automatic review settings April 9, 2026 00:11

wrhalpin merged commit 66dd07e into main Apr 9, 2026
8 of 24 checks passed

Copilot started reviewing on behalf of wrhalpin April 9, 2026 00:12

wrhalpin merged 1 commit into main from claude/add-claude-documentation-k8vvJ
