Implement GNAT Phase 4: Control, Reasoning, Safety #93

Merged
wrhalpin merged 1 commit into main from claude/add-claude-documentation-k8vvJ
Apr 9, 2026

@wrhalpin wrhalpin commented Apr 9, 2026

Phase 4A — Execution Context & Domain Boundaries

  • ExecutionContext dataclass (context_id, initiated_by, domain, trust_level, policy_set, workspace_id, created_at, parent_context_id, is_replay, budget)
  • QueryBudget: finite connector query budget; raises BudgetExceeded when exhausted
  • Domain enum + DOMAIN_CALL_RULES + @domain_boundary decorator (thread-local stack)
  • DomainBoundaryViolation, TrustLevelViolation, @require_trust_level decorator
  • Migration 0004: execution_log table

Phase P-1 — Connector Trust & Versioning

  • BaseClient: TRUST_LEVEL, API_VERSION, API_PREFIX, COST_UNIT class vars
  • BaseClient._request(): deducts COST_UNIT from ExecutionContext budget
  • BudgetExceeded(GNATClientError) exception
  • 16 connectors updated with explicit trust/version/prefix assignments

Phase 4B — Idempotency & Schema Evolution

  • Migration 0005: idempotency_key on workspace_objects
  • WorkspaceObjectModel: idempotency_key column + make_idempotency_key() helper
  • STIXBase: schema_version = 1
  • Migration 0006: agent_sessions + agent_actions tables

Phase 4C — Hypothesis Engine, Negative Evidence, Reasoning

  • STIXHypothesis (x-gnat-hypothesis): statement, confidence, status, supporting/refuting evidence; full lifecycle with close(verdict)
  • NegativeEvidenceRecord (x-gnat-negative-evidence): TTL-based suppression; is_expired(), seconds_remaining()
  • HypothesisEngine: propose → evaluate → close; Solr corroboration; trust-weighted confidence; auto-classify at thresholds
  • ReasoningEngine: prioritize() with composite scoring (trust×0.4 + age×0.3 + corroboration×0.3 − neg_penalty×0.5); structured explanation dicts; STIX note storage
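As a rough illustration of the composite score (trust×0.4 + age×0.3 + corroboration×0.3 − neg_penalty×0.5), the sketch below computes it and returns a per-term breakdown. The function name, the dict keys, and the assumption that every input is already normalized to the range 0 to 1 are hypothetical; the real `prioritize()` may differ.

```python
def explain_score(trust: float, age: float, corroboration: float,
                  neg_penalty: float) -> dict:
    """Return the composite score plus each term's contribution.

    Inputs are assumed (not confirmed by the PR) to lie in [0, 1].
    """
    terms = {
        "trust": trust * 0.4,
        "age": age * 0.3,
        "corroboration": corroboration * 0.3,
        # Negative evidence subtracts from the score.
        "neg_penalty": -(neg_penalty * 0.5),
    }
    return {"score": sum(terms.values()), "terms": terms}


result = explain_score(trust=1.0, age=0.5, corroboration=0.5, neg_penalty=0.0)
penalized = explain_score(trust=1.0, age=0.5, corroboration=0.5, neg_penalty=1.0)
```

With these inputs the first score is about 0.7 and the penalized one about 0.2, so a full negative-evidence penalty outweighs any single positive term.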

Phase 4D — Agent Governance & HITL

  • AgentActionType enum + agent_can_act() trust-level permission matrix
  • AgentGovernor: can_act(), require_can_act(), record_action(), rate_limit_check() (sliding window), per-agent policy overrides
  • AgentAction dataclass with impact_level validation
  • HITLGateway: bridges AgentGovernor to existing gnat/review/; low/medium auto-approve; high→ReviewItem PENDING; critical→PENDING+XSOAR
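The impact-level routing described for HITLGateway (low/medium auto-approve, high goes to a PENDING review item, critical goes to PENDING plus an XSOAR notification) can be sketched as a pure function. `Decision`, `Routing`, and `route()` are illustrative names, not the gateway's actual API.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVED = "approved"
    PENDING = "pending"


@dataclass(frozen=True)
class Routing:
    decision: Decision
    notify_xsoar: bool


_VALID_LEVELS = ("low", "medium", "high", "critical")


def route(impact_level: str) -> Routing:
    # Reject unknown levels instead of silently auto-approving them.
    if impact_level not in _VALID_LEVELS:
        raise ValueError(f"unknown impact level: {impact_level!r}")
    if impact_level in ("low", "medium"):
        return Routing(Decision.APPROVED, notify_xsoar=False)
    # high and critical both wait for a human; only critical pages XSOAR.
    return Routing(Decision.PENDING, notify_xsoar=(impact_level == "critical"))
```

Keeping the decision separate from the side effects (creating the review item, sending the notification) makes the policy easy to unit test.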

Phase 4E — Isolation, Performance, Testing

  • Migrations 0007 (trust_boundary on workspaces) + 0008 (query_cost_log)
  • WorkspaceModel: trust_boundary + allowed_connector_refs columns
  • Workspace: trust_boundary/allowed_connector_refs attrs + check_connector_trust()
  • SimulationConnector: canned STIX fixtures, no network
  • ReplayRunner: execution_log replay with assertion support
  • AgentTestHarness: mock HITL approvals for deterministic tests

Tests: 90 new unit tests covering all Phase 4 components (3955 total, +90)

https://claude.ai/code/session_01BDoue9HxB83ijLzFARAugq

Copilot AI review requested due to automatic review settings April 9, 2026 00:11
@wrhalpin wrhalpin merged commit 66dd07e into main Apr 9, 2026
8 of 24 checks passed

Copilot AI left a comment

Pull request overview

Implements GNAT “Phase 4: Control, Reasoning, Safety” by introducing an execution context + domain boundary enforcement, connector trust metadata and budgeting hooks, Phase 4 reasoning primitives (hypotheses / negative evidence / prioritization), and Phase 4 agent governance (permission matrix + HITL gateway), along with schema migrations and new unit tests.

Changes:

  • Add ExecutionContext + QueryBudget, and domain/trust enforcement decorators and exceptions.
  • Add Phase 4C reasoning layer: STIXHypothesis, NegativeEvidenceRecord, HypothesisEngine, ReasoningEngine.
  • Add Phase 4D governance/HITL: agent action types + permission matrix, AgentGovernor, HITLGateway, plus testing utilities and migrations.

Reviewed changes

Copilot reviewed 46 out of 46 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| `tests/unit/test_phase4_reasoning.py` | Unit tests for hypotheses/negative evidence/reasoning |
| `tests/unit/test_phase4_governor.py` | Unit tests for agent governor + HITL |
| `tests/unit/test_phase4_core.py` | Unit tests for context/budget/domains/simulation/replay |
| `gnat/testing/simulation.py` | Simulation connector, replay runner, agent test harness |
| `gnat/testing/__init__.py` | Re-export testing utilities |
| `gnat/stix/sdos/negative_evidence.py` | Negative evidence custom SDO |
| `gnat/stix/sdos/hypothesis.py` | Hypothesis custom SDO |
| `gnat/stix/sdos/__init__.py` | Export custom SDOs |
| `gnat/reasoning/hypothesis.py` | Hypothesis lifecycle engine |
| `gnat/reasoning/engine.py` | Observable prioritization engine + note storage |
| `gnat/reasoning/__init__.py` | Export reasoning engines |
| `gnat/policy/models.py` | Add agent action types + trust permission matrix |
| `gnat/policy/__init__.py` | Export new policy symbols |
| `gnat/orm/base.py` | Add ORM schema_version knob |
| `gnat/core/domains.py` | Domain model + boundary/trust decorators |
| `gnat/core/context.py` | Execution context + query budget |
| `gnat/core/__init__.py` | Export core phase-4 primitives |
| `gnat/context/workspace.py` | Load/enforce workspace trust boundary + allowlist |
| `gnat/context/store.py` | Add trust fields + idempotency key + helper + upsert logic |
| `gnat/connectors/xsoar/client.py` | Assign trust/version/prefix/cost metadata |
| `gnat/connectors/wazuh/client.py` | Assign trust/version/prefix/cost metadata |
| `gnat/connectors/virustotal/client.py` | Assign trust/version/prefix/cost metadata |
| `gnat/connectors/threatq/client.py` | Assign trust/version/prefix/cost metadata |
| `gnat/connectors/splunk/client.py` | Assign trust/version/prefix/cost metadata |
| `gnat/connectors/shadowserver/client.py` | Assign trust/version/prefix/cost metadata |
| `gnat/connectors/sentinel/client.py` | Assign trust/version/prefix/cost metadata |
| `gnat/connectors/security_onion/client.py` | Assign trust/version/prefix/cost metadata |
| `gnat/connectors/recordedfuture/rfv3.py` | Assign trust/cost metadata for v3 client |
| `gnat/connectors/recordedfuture/client.py` | Assign trust/version/prefix/cost metadata |
| `gnat/connectors/qradar/client.py` | Assign trust/version/prefix/cost metadata |
| `gnat/connectors/misp/client.py` | Assign trust/version/prefix/cost metadata |
| `gnat/connectors/graylog/client.py` | Assign trust/version/prefix/cost metadata |
| `gnat/connectors/feedly/client.py` | Assign trust/version/prefix/cost metadata |
| `gnat/connectors/elastic/client.py` | Assign trust/version/prefix/cost metadata |
| `gnat/connectors/crowdstrike/client.py` | Assign trust/version/prefix/cost metadata |
| `gnat/connectors/CISA/client.py` | Assign trust/version/prefix/cost metadata |
| `gnat/connectors/alienvault/client.py` | Assign trust/version/prefix/cost metadata |
| `gnat/clients/base.py` | Add connector trust/version metadata + budget deduction + BudgetExceeded |
| `gnat/agents/hitl.py` | HITL gateway bridging to review service + XSOAR notification |
| `gnat/agents/governor.py` | Agent governor: permission checks, logging, rate limiting |
| `CHANGELOG.md` | Document Phase 4 feature set |
| `alembic/versions/0008_query_cost_log.py` | Migration: query cost logging table |
| `alembic/versions/0007_workspace_trust_boundary.py` | Migration: workspace trust boundary + allowlist |
| `alembic/versions/0006_add_agent_tables.py` | Migration: agent sessions/actions tables |
| `alembic/versions/0005_add_idempotency.py` | Migration: workspace_objects idempotency key |
| `alembic/versions/0004_add_execution_log.py` | Migration: execution log table |

Comment thread gnat/context/store.py
Comment on lines +447 to +455
        if idempotency_key:
            dup = (
                sess.query(WorkspaceObjectModel)
                .filter_by(idempotency_key=idempotency_key)
                .first()
            )
            if dup is not None:
                logger.debug(
                    "upsert_object: idempotency hit for key %s — skipping write",

Copilot AI Apr 9, 2026

The idempotency pre-check queries by idempotency_key without scoping to workspace_id. If two different workspaces generate the same key, this will return/skip the row from the other workspace, effectively preventing the object from being written to the intended workspace and breaking workspace isolation. Scope the lookup to (workspace_id, idempotency_key) and ensure uniqueness is enforced per-workspace (or include workspace_id in the key).

Suggested change
-        if idempotency_key:
-            dup = (
-                sess.query(WorkspaceObjectModel)
-                .filter_by(idempotency_key=idempotency_key)
-                .first()
-            )
-            if dup is not None:
-                logger.debug(
-                    "upsert_object: idempotency hit for key %s — skipping write",
+        # within the target workspace.
+        if idempotency_key:
+            dup = (
+                sess.query(WorkspaceObjectModel)
+                .filter_by(
+                    workspace_id=workspace_id,
+                    idempotency_key=idempotency_key,
+                )
+                .first()
+            )
+            if dup is not None:
+                logger.debug(
+                    "upsert_object: idempotency hit for workspace %s and key %s — skipping write",
+                    workspace_id,

    op.create_index(
        "ix_workspace_objects_idempotency",
        "workspace_objects",
        ["idempotency_key"],

Copilot AI Apr 9, 2026

This migration creates a unique index on idempotency_key alone, making the key globally unique across all workspace_objects rows. That can block identical content ingests into different workspaces and can interact badly with the current upsert_object() idempotency pre-check (which is also not scoped to workspace_id). Consider enforcing uniqueness on (workspace_id, idempotency_key) instead (or include workspace_id in the key format).

Suggested change
-        ["idempotency_key"],
+        ["workspace_id", "idempotency_key"],
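To see what the composite index buys, here is a small self-contained sketch using the standard-library `sqlite3` module rather than the project's Alembic/SQLAlchemy stack. The table is reduced to the two relevant columns and `insert_once()` is a hypothetical helper, so this only illustrates the uniqueness semantics, not the real `upsert_object()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workspace_objects ("
    " id INTEGER PRIMARY KEY,"
    " workspace_id TEXT NOT NULL,"
    " idempotency_key TEXT)"
)
# Uniqueness is scoped per workspace, as the review suggests.
conn.execute(
    "CREATE UNIQUE INDEX ix_workspace_objects_idempotency"
    " ON workspace_objects (workspace_id, idempotency_key)"
)


def insert_once(workspace_id: str, key: str) -> bool:
    """Insert a row; return False if the (workspace, key) pair already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO workspace_objects (workspace_id, idempotency_key)"
                " VALUES (?, ?)",
                (workspace_id, key),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = insert_once("ws-a", "k1")       # new pair
other_ws = insert_once("ws-b", "k1")    # same key, different workspace
duplicate = insert_once("ws-a", "k1")   # repeated pair is rejected
```

With an index on `idempotency_key` alone, the second insert would be rejected as well, which is the cross-workspace blocking the comment describes.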

Comment thread gnat/core/context.py
Comment on lines +88 to +92
        if self._consumed + units > self.max_units:
            from gnat.clients.base import BudgetExceeded
            raise BudgetExceeded(
                connector=connector,
                cost=units,

Copilot AI Apr 9, 2026

QueryBudget.consume() allows non-positive/negative units. Passing units<=0 would either do nothing or increase the remaining budget (by decreasing _consumed), which undermines the budget guarantee. Validate that units is a positive integer (and consider guarding against max_units<=0 as well).
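One possible shape for the validation, as a standalone sketch. The `BudgetExceeded` class below is a plain-`Exception` stand-in for the real `BudgetExceeded(GNATClientError)`, and the `remaining` property and exact error messages are assumptions.

```python
class BudgetExceeded(Exception):
    # Stand-in for gnat.clients.base.BudgetExceeded; fields mirror its docstring.
    def __init__(self, connector: str, cost: int, remaining: int):
        super().__init__(
            f"{connector}: cost {cost} exceeds remaining budget {remaining}"
        )
        self.connector = connector
        self.cost = cost
        self.remaining = remaining


def _is_positive_int(value) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


class QueryBudget:
    def __init__(self, max_units: int):
        if not _is_positive_int(max_units):
            raise ValueError("max_units must be a positive integer")
        self.max_units = max_units
        self._consumed = 0

    @property
    def remaining(self) -> int:
        return self.max_units - self._consumed

    def consume(self, units: int, connector: str) -> None:
        # Reject zero/negative units so a caller cannot refund the budget.
        if not _is_positive_int(units):
            raise ValueError("units must be a positive integer")
        if self._consumed + units > self.max_units:
            raise BudgetExceeded(
                connector=connector, cost=units, remaining=self.remaining
            )
        self._consumed += units
```

Because the check runs before anything is deducted, `remaining` on the raised exception can still be positive, for example 5 when a cost of 6 is requested against a budget with 5 units left.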

Comment on lines +29 to +32

from gnat.clients.base import BaseClient, GNATClientError
from gnat.agents.governor import AgentAction, AgentGovernor, RateLimitExceeded
from gnat.policy.models import AgentActionType

Copilot AI Apr 9, 2026

Unused import: RateLimitExceeded is imported but never referenced in this module. This will fail ruff's unused-import checks (F401) if linting is run on gnat/testing/.

Comment on lines +92 to +96
        self._context: Any = None

        self._fixtures: list[dict[str, Any]] = list(fixtures or [])
        self.TRUST_LEVEL = trust_level  # type: ignore[assignment]
        self._raise_on_request = raise_on_request

Copilot AI Apr 9, 2026

SimulationConnector sets TRUST_LEVEL on the instance (self.TRUST_LEVEL = trust_level), but trust propagation/enforcement elsewhere (e.g., ExecutionContext.from_connector() and Workspace.check_connector_trust()) reads TRUST_LEVEL from the connector class (type(connector).TRUST_LEVEL). As a result, a per-instance trust_level passed here won't be respected by those checks. Consider setting type(self).TRUST_LEVEL, or updating trust consumers to also consult the instance attribute.
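The mismatch is plain Python attribute lookup and can be reproduced with toy classes. The trust values and the two reader functions below are hypothetical stand-ins for the real connector and trust consumers.

```python
class BaseClient:
    TRUST_LEVEL = "untrusted"  # class-level default (hypothetical value)


class SimulationConnector(BaseClient):
    def __init__(self, trust_level: str):
        # Creates an instance attribute that shadows the class attribute.
        self.TRUST_LEVEL = trust_level


def trust_from_class(connector) -> str:
    # Mirrors consumers that read type(connector).TRUST_LEVEL.
    return type(connector).TRUST_LEVEL


def trust_from_instance(connector) -> str:
    # Normal lookup: instance attribute first, then the class.
    return connector.TRUST_LEVEL


sim = SimulationConnector("trusted")
seen_by_class_readers = trust_from_class(sim)
seen_by_instance_readers = trust_from_instance(sim)
```

Here class-based readers see `"untrusted"` while instance-based readers see `"trusted"`. Note that assigning to `type(self).TRUST_LEVEL` would change the value for every instance of the class, so having consumers read the instance attribute may be the less surprising fix.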

Comment on lines +142 to +154
    class MockClient(BaseClient):
        COST_UNIT = 5

        def authenticate(self):
            self._authenticated = True

        def _request(self, method, path, **kwargs):
            # Call parent budget deduction then return empty
            if self._context is not None:
                budget = getattr(self._context, "budget", None)
                if budget is not None:
                    budget.consume(self.COST_UNIT, type(self).__name__)
            return {}

Copilot AI Apr 9, 2026

This test is intended to verify BaseClient._request() budget deduction, but MockClient overrides _request() and re-implements the budget logic, then calls client._request() directly. That means the test doesn't exercise the actual BaseClient._request() code path and could miss regressions. Prefer calling BaseClient._request() (e.g., via super()) while stubbing the network layer, or invoking a public method (get/post) with a patched PoolManager request.

Comment thread gnat/clients/base.py
    cost : int
        Cost units the connector attempted to consume.
    remaining : int
        Budget units remaining when the check failed (always 0 or negative).

Copilot AI Apr 9, 2026

The BudgetExceeded docstring says remaining is always 0 or negative, but QueryBudget.consume() raises before consuming and passes the current remaining (which may be >0). Update the docstring (or change semantics) so it matches actual behavior.

Suggested change
-        Budget units remaining when the check failed (always 0 or negative).
+        Budget units remaining when the check failed. This is the current
+        remaining budget before consumption, so it may be positive if
+        ``cost`` exceeds the available budget.

Comment on lines +141 to +175
        Queries the search index with the hypothesis statement. Matching
        objects are counted and weighted by their source connector's trust
        level (from ``source_platform`` metadata). The confidence score is
        updated in-place.

        Parameters
        ----------
        hypothesis_id : str
            STIX ID of the hypothesis to evaluate.

        Returns
        -------
        STIXHypothesis
            The updated hypothesis.

        Raises
        ------
        KeyError
            If no hypothesis with *hypothesis_id* is found.
        """
        h = self._load(hypothesis_id)

        # Query search index for corroborating evidence
        statement = h._properties.get("statement", "")
        corroborating_ids: list[str] = []
        try:
            corroborating_ids = self._search_index.search(statement, limit=20)
        except Exception as exc:  # noqa: BLE001
            logger.debug("HypothesisEngine: search index unavailable — %s", exc)

        # Compute weighted confidence from evidence counts
        support_count = len(h._properties.get("supporting_evidence", []))
        refute_count = len(h._properties.get("refuting_evidence", []))
        corroboration_boost = min(len(corroborating_ids) * 0.05, 0.3)


Copilot AI Apr 9, 2026

HypothesisEngine.evaluate() docstring says corroborating objects are "counted and weighted" by source connector trust level, and the module defines _TRUST_WEIGHTS, but the implementation never uses trust weighting (it only uses supporting/refuting counts plus a flat corroboration boost). Either implement the described trust-weighted scoring or adjust the docstring/constants to avoid misleading behavior.
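If the trust-weighted route is taken, one way to do it is to replace the raw hit count with a sum of per-source weights while keeping the 0.05 per-hit factor and the 0.3 cap from the current code. The weight values, the trust-level labels, and the function signature below are assumptions; the actual contents of `_TRUST_WEIGHTS` are not shown in this PR page.

```python
# Hypothetical weights; the module's real _TRUST_WEIGHTS may differ.
_TRUST_WEIGHTS = {"high": 1.0, "medium": 0.6, "low": 0.3}


def corroboration_boost(trust_levels, per_hit: float = 0.05,
                        cap: float = 0.3) -> float:
    """Boost confidence by trust-weighted corroborating hits, capped at `cap`.

    `trust_levels` holds one trust label per corroborating object;
    unknown labels contribute nothing.
    """
    weighted_hits = sum(_TRUST_WEIGHTS.get(level, 0.0) for level in trust_levels)
    return min(weighted_hits * per_hit, cap)


two_high = corroboration_boost(["high", "high"])
two_low = corroboration_boost(["low", "low"])
many_high = corroboration_boost(["high"] * 10)
```

Under these assumed weights, two high-trust hits contribute more than two low-trust hits, and ten high-trust hits saturate at the cap, the same ceiling the flat count reaches today.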

Comment on lines +246 to +248
    def _persist(self, h: STIXHypothesis) -> None:
        ws = self._manager.open(self._workspace_name)
        ws._add_object(h.to_dict(), mark_dirty=True)

Copilot AI Apr 9, 2026

_persist() uses mark_dirty=True when storing hypotheses. In Workspace, mark_dirty adds the object to the dirty set, so a later workspace.commit() will attempt to write these internal x-gnat-hypothesis objects back to the default global context/connector (likely failing or causing unintended side effects). If hypotheses are meant to be internal-only, persist them without marking dirty (or introduce a separate persistence path that doesn't enqueue for commit).

Comment on lines +165 to +169
        gov.rate_limit_check("agent-1")
        gov.rate_limit_check("agent-1")
        # Wait for window to expire
        time.sleep(1.1)
        # Should be allowed again

Copilot AI Apr 9, 2026

This test uses time.sleep() to wait for the sliding window to expire, which slows the unit test suite and can be flaky under load. Prefer patching time.monotonic() (or injecting a clock into AgentGovernor) so the window expiry can be tested deterministically without real waiting.
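A sketch of the clock-injection approach, using a standalone limiter instead of the real `AgentGovernor`. `SlidingWindowLimiter`, `FakeClock`, and the local `RateLimitExceeded` class are illustrative stand-ins; only `time.monotonic` is a real API here.

```python
import time
from collections import defaultdict, deque


class RateLimitExceeded(RuntimeError):
    pass


class SlidingWindowLimiter:
    def __init__(self, max_calls: int, window_seconds: float,
                 clock=time.monotonic):
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._calls = defaultdict(deque)

    def check(self, agent_id: str) -> None:
        now = self._clock()
        calls = self._calls[agent_id]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self._window:
            calls.popleft()
        if len(calls) >= self._max_calls:
            raise RateLimitExceeded(agent_id)
        calls.append(now)


class FakeClock:
    """Manually advanced clock for deterministic tests."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()
limiter = SlidingWindowLimiter(max_calls=2, window_seconds=1.0, clock=clock)
limiter.check("agent-1")
limiter.check("agent-1")
clock.advance(1.1)        # stands in for time.sleep(1.1), with no real waiting
limiter.check("agent-1")  # allowed again once the window has expired
```

If changing the constructor is not desirable, patching `time.monotonic` with `unittest.mock.patch` in the test gives a similar effect without touching production code.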
