diff --git a/docs/explanation/architecture/adrs/0039-ADR-execution-context.md b/docs/explanation/architecture/adrs/0039-ADR-execution-context.md new file mode 100644 index 00000000..3ab76d6d --- /dev/null +++ b/docs/explanation/architecture/adrs/0039-ADR-execution-context.md @@ -0,0 +1,255 @@ +# ADR-0039 — Unified Execution Context + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +GNAT orchestrates a heterogeneous set of operations: ingestion pipeline runs, +connector enrichment calls, AI agent actions, export jobs, and report +publishing. Each of these operations executes independently and, prior to +this ADR, had no mechanism to: + +1. Establish **who** initiated the operation (a named connector, an agent + identifier, or a human operator via the CLI). +2. Declare **which domain** the operation belongs to (`ingestion`, `analysis`, + `investigation`, `reporting`, `execution`). +3. Carry a **trust level** that flows from the originating data source into + downstream scoring and policy decisions. +4. Enforce **workspace isolation** — preventing an ingestion job from one + tenant from accidentally writing objects into another tenant's workspace. +5. Record a **replay flag** so that a re-run of a crashed pipeline can suppress + side effects (SOAR triggers, webhook emissions, duplicate enrichment calls). +6. Impose a **query budget** to prevent runaway agent loops from exhausting + API quota or compute time. + +Without a unifying carrier object, each component invented its own partial +solution: pipeline runners passed `workspace_id` as a bare string; the +enrichment dispatcher read `TRUST_LEVEL` from the connector class but did not +propagate it; agents tracked their own call counters in local state; replay +detection was entirely absent. + +The result was a system that was difficult to trace, impossible to replay +safely, and unable to enforce trust-aware prioritisation consistently. 

---

## Decision

Introduce `ExecutionContext` — a lightweight, immutable dataclass that every
pipeline entry point creates at startup and passes through the call chain.

### Location

`gnat/core/context.py`

### Fields

| Field | Type | Description |
|-------|------|-------------|
| `context_id` | `UUID` | Unique identifier for this execution; used as correlation ID in logs and the `execution_log` table |
| `initiated_by` | `str` | Connector name, agent ID, or `"manual"` (CLI/TUI) |
| `domain` | `str` | One of `ingestion`, `analysis`, `investigation`, `reporting`, `execution` |
| `trust_level` | `str` | `trusted_internal`, `semi_trusted`, or `untrusted_external` |
| `policy_set` | `str \| None` | Named policy set applied to this context; `None` uses the default |
| `workspace_id` | `str` | Workspace isolation boundary; all writes are scoped to this ID |
| `created_at` | `datetime` | UTC timestamp at construction time |
| `parent_context_id` | `UUID \| None` | ID of the parent context when this is a child span |
| `is_replay` | `bool` | `True` marks a re-run: SOAR triggers are suppressed and idempotent skips are logged as `replay_skip` rather than `idempotent_skip` |
| `budget` | `QueryBudget \| None` | Optional call budget; `None` means unlimited |

`QueryBudget` is a small companion dataclass. Unlike the frozen context that
carries it, `QueryBudget` is deliberately mutable, so charges accumulate across
parent and child contexts that share the same budget object:

```python
from dataclasses import dataclass, field


class BudgetExceededError(RuntimeError):
    """Raised when a context exhausts its connector-call or token budget."""


@dataclass
class QueryBudget:
    max_connector_calls: int = 50
    max_agent_tokens: int = 100_000
    _connector_calls: int = field(default=0, repr=False)
    _agent_tokens: int = field(default=0, repr=False)

    def charge_connector(self, n: int = 1) -> None:
        self._connector_calls += n
        if self._connector_calls > self.max_connector_calls:
            raise BudgetExceededError("connector call budget exhausted")

    def charge_tokens(self, n: int) -> None:
        self._agent_tokens += n
        if self._agent_tokens > self.max_agent_tokens:
            raise BudgetExceededError("agent token budget exhausted")
```

### Factory Methods

**`ExecutionContext.create()`** — default factory for manual / CLI invocations:
+```python +ctx = ExecutionContext.create( + initiated_by="manual", + domain="ingestion", + workspace_id="default", +) +``` + +**`ExecutionContext.from_connector(connector)`** — reads `TRUST_LEVEL` from +the connector class variable and sets `initiated_by` to the connector's module +name: + +```python +ctx = ExecutionContext.from_connector( + connector=crowdstrike_client, + domain="ingestion", + workspace_id=workspace_id, +) +# ctx.trust_level == "semi_trusted" +# ctx.initiated_by == "crowdstrike" +``` + +**`ExecutionContext.child()`** — derives a child context that inherits +`workspace_id`, `trust_level`, and `budget` from the parent but receives a +new `context_id` and `parent_context_id`: + +```python +child_ctx = ctx.child(domain="analysis", initiated_by="reasoning_engine") +assert child_ctx.workspace_id == ctx.workspace_id +assert child_ctx.parent_context_id == ctx.context_id +assert child_ctx.context_id != ctx.context_id +``` + +### Persistence + +Every context is persisted to the `execution_log` table (introduced in Alembic +migration `0004_add_execution_log.py`): + +| Column | Type | Notes | +|--------|------|-------| +| `id` | `UUID` | Primary key; maps to `context_id` | +| `initiated_by` | `VARCHAR(255)` | | +| `domain` | `VARCHAR(64)` | | +| `trust_level` | `VARCHAR(64)` | | +| `workspace_id` | `VARCHAR(255)` | Indexed | +| `parent_context_id` | `UUID` | Nullable; foreign key to same table | +| `is_replay` | `BOOLEAN` | | +| `created_at` | `TIMESTAMP` | UTC | +| `event_type` | `VARCHAR(64)` | `context_start`, `context_end`, `security_event` | +| `metadata` | `TEXT` | JSON-encoded supplementary data | + +Trust escalation attempts (a caller supplying a higher trust level than its +connector class declares) are detected in `from_connector()` and written as +`security_event` rows in `execution_log`. 
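Taken together, the field table and factory methods imply a dataclass of
roughly the following shape. This is a sketch inferred from this ADR, not the
actual `gnat/core/context.py` source; the `budget` field is typed loosely here
because `QueryBudget` is defined separately:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import UUID, uuid4


@dataclass(frozen=True)
class ExecutionContext:
    context_id: UUID
    initiated_by: str
    domain: str
    workspace_id: str
    trust_level: str = "semi_trusted"
    policy_set: str | None = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    parent_context_id: UUID | None = None
    is_replay: bool = False
    budget: object | None = None  # QueryBudget in the real code

    @classmethod
    def create(cls, *, initiated_by: str, domain: str,
               workspace_id: str, **kwargs) -> ExecutionContext:
        # Fresh correlation ID for a new root context.
        return cls(context_id=uuid4(), initiated_by=initiated_by,
                   domain=domain, workspace_id=workspace_id, **kwargs)

    def child(self, *, domain: str, initiated_by: str) -> ExecutionContext:
        # Inherit workspace, trust, policy, replay flag, and budget;
        # mint a new context_id and point parent_context_id at this context.
        return ExecutionContext(
            context_id=uuid4(),
            initiated_by=initiated_by,
            domain=domain,
            workspace_id=self.workspace_id,
            trust_level=self.trust_level,
            policy_set=self.policy_set,
            parent_context_id=self.context_id,
            is_replay=self.is_replay,
            budget=self.budget,
        )


ctx = ExecutionContext.create(initiated_by="manual", domain="ingestion",
                              workspace_id="default")
child = ctx.child(domain="analysis", initiated_by="reasoning_engine")
assert child.workspace_id == ctx.workspace_id
assert child.parent_context_id == ctx.context_id
assert child.context_id != ctx.context_id
```

`frozen=True` enforces the immutability the Decision section calls for; a
child is always a new object rather than a mutation of its parent.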
+ +### Integration Points + +All pipeline entry points create a context at startup: + +```python +# gnat/ingest/pipeline.py +class IngestPipeline: + def run(self, workspace_id: str, connector) -> IngestResult: + ctx = ExecutionContext.from_connector(connector, domain="ingestion", + workspace_id=workspace_id) + self._ctx_store.persist(ctx) + # ... pipeline body passes ctx through ... +``` + +```python +# gnat/export/pipeline.py +class ExportPipeline: + def run(self, workspace_id: str) -> ExportResult: + ctx = ExecutionContext.create(initiated_by="manual", + domain="reporting", + workspace_id=workspace_id) + self._ctx_store.persist(ctx) +``` + +Agent actions use `child()` to preserve the parent trace: + +```python +# gnat/agents/research.py +class ResearchAgent: + def run(self, parent_ctx: ExecutionContext, query: str): + ctx = parent_ctx.child(domain="analysis", initiated_by=self.agent_id) + self._ctx_store.persist(ctx) +``` + +--- + +## Consequences + +### Positive + +- **Full traceability:** every operation, regardless of component, carries a + correlation ID linkable back to a parent chain in `execution_log`. +- **Replay safety:** `is_replay=True` allows pipeline runners to re-run a + crashed job without firing SOAR triggers or creating duplicate enrichment + side effects. +- **Trust propagation:** `trust_level` flows from connector declaration through + the pipeline to `ReasoningEngine` scoring without any caller needing to + re-derive it. +- **Parent-child trace trees:** nested operations (agent spawning a connector + call) produce traceable parent-child trees queryable from `execution_log`. +- **Budget enforcement:** `QueryBudget` prevents agent runaway without + requiring each connector to implement its own call counter. +- **Zero new runtime dependencies:** `ExecutionContext` is a plain Python + dataclass; persistence uses the existing SQLAlchemy `[persist]` extra. 
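The budget-enforcement guarantee above can be exercised directly. This sketch
restates a trimmed `QueryBudget` from the Decision section, with an assumed
`BudgetExceededError` definition, so the snippet is self-contained:

```python
from dataclasses import dataclass, field


class BudgetExceededError(RuntimeError):
    """Raised when a shared budget is exhausted (assumed definition)."""


@dataclass
class QueryBudget:
    max_connector_calls: int = 50
    _connector_calls: int = field(default=0, repr=False)

    def charge_connector(self, n: int = 1) -> None:
        self._connector_calls += n
        if self._connector_calls > self.max_connector_calls:
            raise BudgetExceededError("connector call budget exhausted")


budget = QueryBudget(max_connector_calls=2)
budget.charge_connector()  # first call: within budget
budget.charge_connector()  # second call: within budget
exhausted = False
try:
    budget.charge_connector()  # third call exceeds the budget
except BudgetExceededError:
    exhausted = True
assert exhausted
```

Because a child context inherits its parent's `budget` object, charges made by
spawned agents count against the same shared limit.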

### Negative / Trade-offs

- **Caller discipline required:** every pipeline entry point must remember to
  create and thread through the context; there is no automatic injection.
  Connectors called directly (outside a pipeline) will not have a context
  unless they construct one manually.
- **Database write on every operation:** persisting context to `execution_log`
  adds one `INSERT` per pipeline run. High-frequency enrichment loops may
  produce large log volumes; a retention policy is needed.
- **Replay flag is advisory:** `is_replay=True` suppresses SOAR triggers only
  in GNAT-internal components. External webhooks reached before the context
  was consulted are not automatically suppressed.

### Deferred

- Automatic context injection via a Python `contextvars` carrier (removes the
  caller-discipline requirement for async code paths).
- Streaming context events to an external observability backend (OpenTelemetry
  trace export).
- `execution_log` retention and archival policies.
- Budget accounting UI in the TUI dashboard.

---

## Alternatives Considered

### Thread-local context

Storing the current `ExecutionContext` in a `threading.local()` variable would
remove the need to pass it through every call site. Rejected because GNAT
supports both sync (`urllib3`) and async (`httpx`) code paths.
`threading.local()` is scoped per thread, not per task: every coroutine
multiplexed onto the same event-loop thread sees the same slot, so async
connectors launched in the same event loop but different coroutines would
silently inherit the wrong context or lose it entirely.

### Decorator injection (`@with_context`)

A class decorator that automatically wraps `authenticate()`, `get_object()`,
etc. with context creation was prototyped. Rejected because:

1. It couples the decorator to the connector lifecycle, making it hard to use
   `ExecutionContext` in non-connector code (agents, pipelines).
2. It hides context creation from the caller, making replay control (setting
   `is_replay=True`) harder to express.
3.
It does not support `child()` semantics where a parent context already + exists. + +### OpenTelemetry `Span` as the carrier + +Using `opentelemetry.trace.Span` directly as the execution carrier was +considered. Rejected because it would add a mandatory dependency on the +`opentelemetry-api` package for every GNAT installation, even those that do +not export traces. `ExecutionContext` is a thin, dependency-free dataclass; +OTel integration can be layered on top as a future extra. + +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/0040-ADR-connector-trust-model.md b/docs/explanation/architecture/adrs/0040-ADR-connector-trust-model.md new file mode 100644 index 00000000..b7e69b74 --- /dev/null +++ b/docs/explanation/architecture/adrs/0040-ADR-connector-trust-model.md @@ -0,0 +1,293 @@ +# ADR-0040 — Connector Trust Level Classification + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +GNAT integrates with 99 distinct security and threat intelligence platforms. +These connectors span a wide spectrum of data reliability and authority: + +- An internal SIEM (Splunk, Microsoft Sentinel, IBM QRadar) is operated by the + organisation itself; its indicators are authoritative by definition. +- Commercial threat intelligence feeds (ThreatQ, Recorded Future, CrowdStrike) + are curated by professional analysts and carry strong but not absolute + reliability. +- Community or public feeds (AlienVault OTX, Shadowserver, CISA KEV) are + maintained by volunteers or government bodies; data quality varies widely and + indicators may be stale or incorrectly attributed. + +Prior to this ADR every connector carried **equal implicit trust**. The +enrichment dispatcher treated a hit from AlienVault OTX identically to a hit +from the organisation's own Splunk deployment. 
The `ReasoningEngine` +introduced in Phase 4C (see ADR-0044) needed a **stable, declarative source of +trust authority** to compute trust-weighted scores without requiring each call +site to re-derive trust from the connector's identity. + +Three requirements drove the design: + +1. **Declarative, not runtime-computed:** trust level must be a class-level + constant that static analysis tools and policy agents can inspect without + instantiating a connector. +2. **Propagatable:** trust must flow automatically from connector declaration + into `ExecutionContext` (ADR-0039) and from there into `ReasoningEngine` + scoring (ADR-0044). +3. **Auditable:** attempts to escalate trust above what a connector class + declares must be detected and logged. + +--- + +## Decision + +### Class Variable on `BaseClient` + +Add a single class variable to `gnat/clients/base.py`: + +```python +class BaseClient: + """Base HTTP client for all GNAT connectors.""" + + # Trust level for data produced by this connector. + # Subclasses MUST override this if they are not semi-trusted. + TRUST_LEVEL: str = "semi_trusted" +``` + +Every concrete connector subclass overrides `TRUST_LEVEL` to one of three +enumerated string constants defined in `gnat/core/trust.py`: + +```python +TRUSTED_INTERNAL = "trusted_internal" +SEMI_TRUSTED = "semi_trusted" +UNTRUSTED_EXTERNAL = "untrusted_external" +``` + +### Classification Assignments + +The following table shows the trust assignment for all 99 connectors. +Connectors not listed below carry the default `semi_trusted` level. + +#### `trusted_internal` + +These connectors represent data that is operated, controlled, and +authoritative within the customer's own environment. 
+ +| Connector | Module | Rationale | +|-----------|--------|-----------| +| Splunk | `gnat/connectors/splunk/` | Internal SIEM; customer-operated | +| Microsoft Sentinel | `gnat/connectors/sentinel/` | Internal cloud SIEM | +| IBM QRadar | `gnat/connectors/qradar/` | Internal SIEM | +| Elastic SIEM | `gnat/connectors/elastic/` | Internal SIEM/XDR | +| Graylog | `gnat/connectors/graylog/` | Internal log aggregation | +| Security Onion | `gnat/connectors/security_onion/` | Internal NSM/SIEM | +| Wazuh | `gnat/connectors/wazuh/` | Internal SIEM/XDR | +| Palo Alto XSOAR | `gnat/connectors/xsoar/` | Internal SOAR orchestrator | + +#### `semi_trusted` + +Professional, commercially-operated or well-established open-source platforms +whose data quality is high but not self-certified. + +| Connector | Module | Rationale | +|-----------|--------|-----------| +| ThreatQ | `gnat/connectors/threatq/` | Commercial TIP with curation | +| CrowdStrike Falcon | `gnat/connectors/crowdstrike/` | Commercial EDR/TI | +| Recorded Future | `gnat/connectors/recordedfuture/` | Commercial TI | +| Feedly | `gnat/connectors/feedly/` | Curated commercial feed | +| VirusTotal | `gnat/connectors/virustotal/` | Commercial multi-scanner | +| MISP | `gnat/connectors/misp/` | Open-source TIP, community-vetted | +| Mandiant Advantage | `gnat/connectors/mandiant/` | Commercial TI | +| Flashpoint | `gnat/connectors/flashpoint/` | Commercial dark-web TI | +| Intel 471 | `gnat/connectors/intel471/` | Commercial cybercrime TI | +| Group-IB | `gnat/connectors/group_ib/` | Commercial TI | +| Anomali ThreatStream | `gnat/connectors/threatstream/` | Commercial TIP | +| ThreatConnect | `gnat/connectors/threatconnect/` | Commercial TIP | + +All remaining connectors not listed in the trusted_internal or +untrusted_external sections default to `semi_trusted` at the `BaseClient` +level. 
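The default-and-override behaviour described above is plain Python
class-attribute lookup. A sketch with stub classes (these are illustrative
stand-ins, not the real connector modules):

```python
class BaseClient:
    # Default trust for any connector that does not declare otherwise.
    TRUST_LEVEL: str = "semi_trusted"


class SplunkClient(BaseClient):
    TRUST_LEVEL = "trusted_internal"


class SomeNewVendorClient(BaseClient):
    # Hypothetical unclassified connector: no override, inherits the default.
    pass


# Trust is readable from the class alone: no instantiation, no network call.
assert SomeNewVendorClient.TRUST_LEVEL == "semi_trusted"
assert SplunkClient.TRUST_LEVEL == "trusted_internal"
```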
+ +#### `untrusted_external` + +Community-contributed, public, or government feeds where quality control is +limited or the submission model is open. + +| Connector | Module | Rationale | +|-----------|--------|-----------| +| AlienVault OTX | `gnat/connectors/alienvault/` | Open community submissions | +| Shadowserver Foundation | `gnat/connectors/shadowserver/` | Public; quality varies by dataset | +| CISA KEV | `gnat/connectors/cisa/` | Government advisory; no auth; coverage gaps | +| PulseDive | `gnat/connectors/pulsedive/` | Community-aggregated | +| GreyNoise | `gnat/connectors/greynoise/` | Mass-scanner data; noisy by design | +| Have I Been Pwned | `gnat/connectors/hibp/` | Breach aggregate; no attribution | +| Hudson Rock | `gnat/connectors/hudsonrock/` | Breach intelligence; community-sourced | + +### Example Overrides + +```python +# gnat/connectors/splunk/client.py +class SplunkClient(BaseClient): + TRUST_LEVEL = "trusted_internal" + +# gnat/connectors/alienvault/client.py +class AlienVaultClient(BaseClient): + TRUST_LEVEL = "untrusted_external" + +# gnat/connectors/threatq/client.py +class ThreatQClient(BaseClient): + TRUST_LEVEL = "semi_trusted" # explicit; same as default but self-documenting +``` + +### Integration with `ExecutionContext` + +`ExecutionContext.from_connector()` (ADR-0039) reads `TRUST_LEVEL` via the +class, not the instance, so it is available before authentication: + +```python +@classmethod +def from_connector( + cls, + connector: BaseClient, + domain: str, + workspace_id: str, + policy_set: str | None = None, + budget: QueryBudget | None = None, +) -> "ExecutionContext": + declared_trust = type(connector).TRUST_LEVEL + return cls( + context_id=uuid4(), + initiated_by=type(connector).__module__.split(".")[-2], + domain=domain, + trust_level=declared_trust, + policy_set=policy_set, + workspace_id=workspace_id, + created_at=datetime.utcnow(), + parent_context_id=None, + is_replay=False, + budget=budget, + ) +``` + +### Trust 
Escalation Detection + +If a caller constructs an `ExecutionContext` manually and supplies a +`trust_level` higher than the connector class declares, the mismatch is +detected in `ExecutionContext.from_connector()` and written as a +`security_event` row to `execution_log`: + +```python +if requested_trust != declared_trust: + _log_security_event( + event="trust_escalation_attempt", + connector=type(connector).__name__, + declared=declared_trust, + requested=requested_trust, + workspace_id=workspace_id, + ) + # requested_trust is ignored; declared_trust is used +``` + +### Trust Weight Mapping + +The trust level string maps to a numeric weight used by `ReasoningEngine` +(ADR-0044): + +| Trust Level | Weight | +|-------------|--------| +| `trusted_internal` | 0.9 | +| `semi_trusted` | 0.6 | +| `untrusted_external` | 0.3 | + +The mapping is defined in `gnat/core/trust.py` as `TRUST_WEIGHTS: dict[str, float]` +and shared between `ExecutionContext`, `HypothesisEngine`, and `ReasoningEngine` +to ensure a single source of truth. + +--- + +## Consequences + +### Positive + +- **Declarative and inspectable:** `TRUST_LEVEL` is a class constant that can + be read by policy agents, linters, and documentation generators without + instantiating a connector or making any network call. +- **Zero runtime cost:** reading a class variable adds no overhead compared to + the HTTP call that follows. +- **Automatic propagation:** once set on the class, trust flows into + `ExecutionContext`, `HypothesisEngine`, and `ReasoningEngine` without any + additional caller configuration. +- **Auditable escalation:** any attempt to override the declared trust level is + logged before being silently rejected; the declared level always wins. +- **No breaking changes:** the default (`semi_trusted`) means existing + connectors that have not yet been classified behave identically to the + pre-ADR behaviour. 
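The trust-weight mapping described in the Decision section reduces to a single
module-level dict. A sketch of how `gnat/core/trust.py` could expose it (the
`trust_weight` helper and its unknown-level fallback are illustrative, not
confirmed by this ADR):

```python
# gnat/core/trust.py (sketch)
TRUSTED_INTERNAL = "trusted_internal"
SEMI_TRUSTED = "semi_trusted"
UNTRUSTED_EXTERNAL = "untrusted_external"

# Single source of truth shared by ExecutionContext, HypothesisEngine,
# and ReasoningEngine.
TRUST_WEIGHTS: dict[str, float] = {
    TRUSTED_INTERNAL: 0.9,
    SEMI_TRUSTED: 0.6,
    UNTRUSTED_EXTERNAL: 0.3,
}


def trust_weight(level: str) -> float:
    # Illustrative helper: treat unknown levels as untrusted.
    return TRUST_WEIGHTS.get(level, TRUST_WEIGHTS[UNTRUSTED_EXTERNAL])


assert trust_weight("trusted_internal") == 0.9
assert trust_weight("not-a-level") == 0.3
```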
+ +### Negative / Trade-offs + +- **Static classification:** trust level is a class constant, not a + runtime-configurable value. An operator who has additional context (e.g. + "our OTX subscription is curated by an analyst") cannot elevate a connector's + trust without modifying source code or subclassing. +- **Binary per connector:** trust is assigned at the connector level, not at + the dataset or indicator level. A connector that mixes high- and low-quality + data (e.g. VirusTotal community vs. premium API hits) cannot express that + distinction through `TRUST_LEVEL` alone; per-object tagging (deferred) is + needed for that. +- **Classification maintenance:** as new connectors are added, the platform + team must consciously assign a trust level; the default `semi_trusted` acts + as a safe backstop but may be too conservative or too permissive depending on + context. + +### Deferred + +- **Operator-configurable trust override:** allow operators to raise or lower a + connector's effective trust via the INI config file (e.g. + `[alienvault] trust_override = semi_trusted`) without modifying source code. +- **Per-object trust tags:** complement connector-level trust with + indicator-level confidence tags derived from raw connector metadata (e.g. + VirusTotal detection ratio, MISP event distribution level). +- **Dynamic trust scoring:** a future `TrustCalibrationAgent` could observe + long-term accuracy of indicators per connector and automatically adjust trust + weights; this is deferred pending training data collection. + +--- + +## Alternatives Considered + +### Per-object trust tags at ingest time + +Rather than a connector-level class constant, each mapper could attach a trust +tag to every `STIXBase` object it produces. Rejected because: + +1. Every mapper author would need to decide on trust independently, leading to + inconsistency. +2. Mappers do not always have access to the connector identity at call time. +3. 
The per-object approach does not express *source authority* — the question of + "how much do I trust this platform in general?" is separate from "how + confident is this individual indicator?" and both are needed. + +### Dynamic trust scoring based on historical accuracy + +A scoring model that adjusts trust weights based on observed true-positive rates +per connector was considered. Deferred (not rejected) because it requires +several months of labelled ground-truth data that does not yet exist. The +static classification in this ADR will serve as the training baseline once +collection begins. + +### INI-file trust assignment + +Defining trust levels in `config.ini` rather than as class constants was +considered. Rejected for the initial implementation because: + +1. It would require a running config loader before any connector can be + classified, making static analysis and documentation generation more complex. +2. Class constants are self-documenting in the source tree and version-controlled + alongside the connector code. +3. Operator overrides via INI are deferred work and can be layered on top of the + class-constant baseline without replacing it. + +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/0041-ADR-idempotency-schema-evolution.md b/docs/explanation/architecture/adrs/0041-ADR-idempotency-schema-evolution.md new file mode 100644 index 00000000..7ed91492 --- /dev/null +++ b/docs/explanation/architecture/adrs/0041-ADR-idempotency-schema-evolution.md @@ -0,0 +1,325 @@ +# ADR-0041 — Idempotency and ORM Schema Versioning + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +### The Replay Problem + +GNAT pipelines are long-running processes that may be interrupted mid-flight: +network partitions, database deadlocks, container restarts, and operator +`SIGINT` are all common causes. 
When a pipeline is restarted, it must be safe +to replay from the beginning without producing: + +- **Duplicate STIX objects** in the workspace store (violating the uniqueness + contract of STIX IDs per platform). +- **Double SOAR triggers** (sending the same alert to a SOAR platform twice). +- **Duplicate enrichment calls** (wasting API quota on already-processed + indicators). + +Prior to this ADR, GNAT had no pipeline-level idempotency mechanism. Connector +code performed ad-hoc checks ("does this STIX ID already exist?") but these +were inconsistent and did not cover all write paths. A crashed ingest run that +had completed 800 of 1,000 records before failing would, on restart, attempt to +re-process all 1,000 records and fail on uniqueness constraints in the ORM +layer. + +### The Schema Evolution Problem + +The STIX 2.1 ORM (ADR-0031, ADR-0032) uses a property-bag pattern: +`STIXBase._properties` stores all non-core fields as an untyped dict. When a +breaking change is made to a field (e.g. `threat_score: float` renamed to +`confidence: float`, or a field's semantics change such that old serialised +values are incorrect), there is no mechanism to detect that persisted objects +were produced by an older version of the ORM and need migration. + +Two independent deployment scenarios require schema versioning: + +1. **Rolling upgrades:** a GNAT worker is upgraded to a new version while the + workspace database still contains objects serialised by the previous version. +2. **Test isolation:** fixture factories in `tests/` need to produce objects that + match the current schema without coupling to specific field values. + +--- + +## Decision + +### Part 1: Idempotency Keys + +#### Key Format + +Every write to the workspace store is gated by an idempotency key computed +by `WorkspaceStore.make_idempotency_key()`: + +``` +{connector_id}:{stix_type}:{external_id}:{sha1_content_hash[:12]} +``` + +- **`connector_id`** — the connector's module name (e.g. 
`crowdstrike`, + `alienvault`). Scopes the key to a source; the same external ID from two + different connectors does not collide. +- **`stix_type`** — the STIX object type string (e.g. `indicator`, + `threat-actor`). +- **`external_id`** — the platform-native identifier for the object (e.g. a + ThreatQ indicator ID, a CrowdStrike IOC value). If unavailable, the STIX + `id` field is used. +- **`sha1_content_hash[:12]`** — first 12 hex characters of the SHA-1 digest of + the object's canonical JSON representation (keys sorted, no whitespace). + Detects content changes even when the external ID is stable. + +```python +import hashlib, json + +def make_idempotency_key( + connector_id: str, + stix_obj: STIXBase, + external_id: str | None = None, +) -> str: + ext = external_id or stix_obj.id + payload = json.dumps(stix_obj.to_dict(), sort_keys=True, separators=(",", ":")) + content_hash = hashlib.sha1(payload.encode()).hexdigest()[:12] + return f"{connector_id}:{stix_obj.type}:{ext}:{content_hash}" +``` + +#### Database Storage + +The idempotency key is stored as a `VARCHAR(255)` column on the +`workspace_objects` table, introduced via Alembic migration +`0005_add_idempotency_key.py`: + +```sql +ALTER TABLE workspace_objects + ADD COLUMN idempotency_key VARCHAR(255); + +CREATE UNIQUE INDEX uix_workspace_objects_idempotency + ON workspace_objects (workspace_id, idempotency_key) + WHERE idempotency_key IS NOT NULL; +``` + +The partial unique index (`WHERE idempotency_key IS NOT NULL`) ensures that +objects written by code paths that pre-date this ADR (which will have +`NULL` keys) are not incorrectly flagged as duplicates. 
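The content-hash component's behaviour can be demonstrated with a stub object.
`FakeIndicator`, its placeholder STIX ID, and the `ioc-123` external ID below
are illustrative stand-ins, not real GNAT types or data:

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class FakeIndicator:
    # Minimal stand-in for STIXBase: just enough surface for the key function.
    type: str = "indicator"
    id: str = "indicator--00000000-0000-0000-0000-000000000001"
    props: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        return {"type": self.type, "id": self.id, **self.props}


def make_idempotency_key(connector_id, stix_obj, external_id=None):
    ext = external_id or stix_obj.id
    payload = json.dumps(stix_obj.to_dict(), sort_keys=True, separators=(",", ":"))
    content_hash = hashlib.sha1(payload.encode()).hexdigest()[:12]
    return f"{connector_id}:{stix_obj.type}:{ext}:{content_hash}"


a = FakeIndicator(props={"pattern": "[ipv4-addr:value = '192.0.2.1']"})
b = FakeIndicator(props={"pattern": "[ipv4-addr:value = '192.0.2.1']",
                         "labels": ["c2"]})

k1 = make_idempotency_key("crowdstrike", a, external_id="ioc-123")
k2 = make_idempotency_key("crowdstrike", b, external_id="ioc-123")

# Same source and external ID, but the added label changes the content hash,
# so the updated object is stored rather than skipped as a duplicate.
assert k1 != k2
assert k1.startswith("crowdstrike:indicator:ioc-123:")
```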
+ +#### Write Path + +`WorkspaceStore.upsert()` now follows this sequence: + +```python +def upsert( + self, + stix_obj: STIXBase, + ctx: ExecutionContext, + external_id: str | None = None, +) -> UpsertResult: + key = make_idempotency_key(ctx.initiated_by, stix_obj, external_id) + existing = self._session.query(WorkspaceObjectModel)\ + .filter_by(workspace_id=ctx.workspace_id, idempotency_key=key)\ + .first() + + if existing: + _log_to_execution_log(ctx, event_type="idempotent_skip", key=key) + return UpsertResult(skipped=True, object_id=existing.stix_id) + + # ... proceed with INSERT ... + return UpsertResult(skipped=False, object_id=stix_obj.id) +``` + +`UpsertResult` is a small dataclass with `skipped: bool` and `object_id: str`. +Callers that need to distinguish new writes from idempotent skips (e.g. pipeline +progress reporters) can inspect `result.skipped`. + +#### Replay Integration + +When `ExecutionContext.is_replay` is `True`, idempotent skips are still +performed (preventing duplicate writes) but the skip is recorded with +`event_type="replay_skip"` in `execution_log` rather than +`"idempotent_skip"`. This allows operators to distinguish between "normal +deduplication" and "replay recovery" in audit queries: + +```sql +-- Count objects successfully replayed vs. newly written +SELECT event_type, COUNT(*) FROM execution_log +WHERE context_id = :replay_context_id +GROUP BY event_type; +``` + +SOAR triggers and external webhook emissions are suppressed when +`ctx.is_replay` is `True`, regardless of whether the write was skipped. + +### Part 2: ORM Schema Versioning + +#### `schema_version` Class Variable + +`STIXBase` gains a class variable: + +```python +class STIXBase: + """Base class for all GNAT STIX ORM objects.""" + + schema_version: int = 1 + """ + Monotonically increasing integer. Increment only on breaking field changes. + Additive changes (new optional fields) do not require a bump. 
+ """ +``` + +Subclasses override `schema_version` when they introduce a breaking change: + +```python +class STIXIndicator(STIXBase): + schema_version: int = 2 # bumped when 'threat_score' was renamed 'confidence' +``` + +#### Serialisation + +`STIXBase.to_dict()` includes `schema_version` in its output: + +```python +def to_dict(self) -> dict: + return { + "type": self.type, + "id": self.id, + "schema_version": self.schema_version, + **self._properties, + } +``` + +#### Deserialisation and Migration + +`STIXBase.from_dict()` reads the `schema_version` from the serialised payload +and, if it differs from the current class's `schema_version`, invokes the +registered migration chain: + +```python +@classmethod +def from_dict(cls, data: dict) -> "STIXBase": + stored_version = data.get("schema_version", 1) + current_version = cls.schema_version + if stored_version < current_version: + data = _apply_migrations(cls, data, stored_version, current_version) + obj = cls.__new__(cls) + # ... populate fields from data ... + return obj +``` + +Migration functions are registered per class in +`gnat/orm/migrations.py` using a simple decorator: + +```python +@schema_migration(STIXIndicator, from_version=1, to_version=2) +def _migrate_indicator_v1_to_v2(data: dict) -> dict: + # Rename 'threat_score' to 'confidence' + if "threat_score" in data: + data["confidence"] = data.pop("threat_score") + return data +``` + +#### Version Bump Policy + +| Change type | Version bump? | +|-------------|---------------| +| Add a new optional field | No | +| Add a new required field with a default value | No | +| Remove a field | Yes | +| Rename a field | Yes | +| Change a field's type or semantics | Yes | +| Add a new method (no field impact) | No | + +This policy keeps the version number low and stable for the common additive +case while ensuring that breaking changes are detectable. 
+ +--- + +## Consequences + +### Positive + +- **Pipelines are fully idempotent:** restarting a crashed ingest job from the + beginning is safe; already-written objects are skipped cleanly without + database constraint violations or duplicate STIX IDs. +- **Replay is auditable:** `execution_log` records `replay_skip` events + separately from normal `idempotent_skip` events, enabling operators to measure + recovery completeness. +- **SOAR trigger safety:** `is_replay` suppression prevents double-alerting + even when a replay re-processes objects that were already written in a prior + partial run. +- **Schema evolution is controlled:** `schema_version` makes breaking field + changes detectable and migrateable; additive changes do not require a bump, + keeping the version number stable for routine development. +- **No new storage tables:** idempotency keys are a column on the existing + `workspace_objects` table; schema versions are serialised into the existing + JSON payload. No additional infrastructure is required. + +### Negative / Trade-offs + +- **Key computation cost:** SHA-1 of the canonical JSON is computed on every + write, adding ~0.1 ms per object on a typical developer machine. At 10,000 + objects per ingest run this is ~1 second, acceptable for the safety guarantee. +- **Partial index coverage:** objects written before migration `0005` have + `NULL` idempotency keys and are not protected by idempotency. A backfill job + can populate keys for existing objects but is not automated. +- **Migration chain maintenance:** as `schema_version` grows, the migration + chain from version 1 to the current version must be maintained. A test in + `tests/unit/orm/test_schema_migrations.py` validates every registered + migration in sequence. +- **Content-hash sensitivity:** if two connectors produce the same indicator + with different metadata (e.g. different `labels` lists), the content hash + differs and both are stored as distinct objects. 
This is correct behaviour + but may surprise operators who expect connector-level deduplication. + +### Deferred + +- **Backfill job** for populating idempotency keys on pre-migration objects. +- **Key expiry policy:** idempotency keys for objects deleted from the workspace + should be cleaned up to prevent key exhaustion in very long-running + deployments. +- **Cross-workspace deduplication:** the current scheme deduplicates within a + single `workspace_id`; cross-workspace deduplication (e.g. between a staging + and production workspace) is out of scope. +- **ORM migration CLI command:** `gnat orm migrate --dry-run` to preview + pending migrations before a deployment. + +--- + +## Alternatives Considered + +### Content-addressed storage (STIX ID as primary key) + +STIX IDs are already unique per platform: a STIX indicator with a given ID from +CrowdStrike is always the same logical object. Using the STIX ID as the sole +uniqueness key was considered as an alternative to a separate idempotency key. + +Rejected because: + +1. The same logical indicator can arrive from multiple connectors with different + STIX IDs (each connector may assign its own UUID-based ID) but the same + content. STIX ID uniqueness does not prevent cross-connector duplicates. +2. STIX IDs do not capture content changes: a connector may reassign the same + ID to an updated indicator. The content hash component of the idempotency + key detects this case and allows the update through. + +### Alembic-only schema versioning + +Using Alembic migrations exclusively to manage ORM field changes was considered. +Alembic tracks database schema changes (table columns, indexes) but does not +address ORM-level field renames or semantic changes that are expressed in the +JSON property bag. Alembic is still used for database schema changes +(migration `0005`); `schema_version` complements it by covering the ORM object +layer that Alembic cannot reach. 
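For context, the database-level half of this split, migration `0005` adding the
idempotency-key column, would look roughly like the following Alembic fragment.
This is a sketch only: the revision identifiers, index name, and the
dialect-specific partial-index keyword (shown here for SQLite) are assumptions,
not taken from the actual migration.

```python
"""Illustrative sketch of migration 0005; not GNAT's actual revision file."""
from alembic import op
import sqlalchemy as sa

revision = "0005"       # assumed label
down_revision = "0004"  # assumed


def upgrade() -> None:
    # Nullable on purpose: rows written before this migration keep NULL keys
    # and are simply not covered by idempotency (the partial-index trade-off
    # noted in this ADR's Consequences section).
    op.add_column(
        "workspace_objects",
        sa.Column("idempotency_key", sa.String(40), nullable=True),  # SHA-1 hex
    )
    # Partial unique index: only non-NULL keys participate in deduplication.
    op.create_index(
        "ix_workspace_objects_idempotency_key",
        "workspace_objects",
        ["workspace_id", "idempotency_key"],
        unique=True,
        sqlite_where=sa.text("idempotency_key IS NOT NULL"),
    )


def downgrade() -> None:
    op.drop_index("ix_workspace_objects_idempotency_key", table_name="workspace_objects")
    op.drop_column("workspace_objects", "idempotency_key")
```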
+ +### Event sourcing for idempotency + +An event-sourced store where every write is an event and idempotency is +guaranteed by event log position was considered. Rejected because it would +require a fundamental redesign of the workspace store and all connectors, +displacing the existing `workspace_objects` table and the established connector +contract (ADR-0031). Event sourcing remains a long-term architectural option +if GNAT grows to require it. + +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/0042-ADR-hypothesis-engine.md b/docs/explanation/architecture/adrs/0042-ADR-hypothesis-engine.md new file mode 100644 index 00000000..212e5837 --- /dev/null +++ b/docs/explanation/architecture/adrs/0042-ADR-hypothesis-engine.md @@ -0,0 +1,378 @@ +# ADR-0042 — Hypothesis Testing Engine (Phase 4C) + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +Threat intelligence analysis is fundamentally a hypothesis-driven activity. +An analyst observing a cluster of suspicious indicators might form the hypothesis +"192.0.2.1 is a Lazarus Group command-and-control server" and then accumulate +evidence for or against that claim over days or weeks. + +Prior to this ADR, GNAT had no structured mechanism for tracking hypotheses. +Analysts recorded their assessments as free-text investigation notes, which +meant: + +- **No machine-readable lifecycle:** hypotheses could not transition through + `pending → confirmed → refuted` states in a way that downstream systems + (SOAR, reporting) could act on. +- **No evidence linkage:** supporting or refuting observations were stored as + narrative text rather than as typed STIX relationship references, making it + impossible to audit the evidence chain. +- **No automated corroboration:** the Solr search index (ADR-0028 derivative) + accumulated relevant hits but nothing queried it on a hypothesis's behalf. 
+- **No confidence tracking:** a hypothesis's confidence was not updated as new + evidence arrived; analysts had to manually re-read all notes to reassess. + +The `ReasoningEngine` (ADR-0044) needed a structured hypothesis type to feed +into its scoring pipeline, and the `HypothesisEngine` itself needed a home in +the GNAT architecture that was consistent with the existing STIX custom object +pattern (ADR-0032). + +--- + +## Decision + +### `STIXHypothesis` Custom SDO + +A new custom STIX Domain Object is defined in +`gnat/stix/sdos/hypothesis.py`: + +```python +@dataclass +class STIXHypothesis(STIXBase): + """ + x-gnat-hypothesis — STIX custom SDO for analyst hypotheses. + + Represents a structured claim about a threat actor, campaign, or observable + that can be confirmed, refuted, or left inconclusive by accumulated evidence. + """ + + type: str = "x-gnat-hypothesis" + schema_version: int = 1 + + # Core fields + statement: str = "" # Natural-language hypothesis text + confidence: float = 0.2 # [0.0, 1.0]; updated by evaluate() + status: str = "pending" # pending | confirmed | refuted | inconclusive + + # Evidence arrays — store STIX relationship IDs + supporting_evidence: list[str] = field(default_factory=list) + refuting_evidence: list[str] = field(default_factory=list) + + # Provenance + created_by: str = "" # initiated_by from the creating ExecutionContext + workspace_id: str = "" + created_at: datetime | None = None + last_evaluated_at: datetime | None = None +``` + +`STIXHypothesis` is registered in `gnat/stix/sdos/__init__.py` alongside +other custom SDOs (`x-gnat-report-summary`, `x-gnat-enrichment-log`). + +Evidence is stored as STIX relationship IDs (strings matching the STIX +`relationship--` pattern) rather than direct STIX IDs so that the +evidence relationship itself carries the semantic link (e.g. +`relationship_type: "supports"` or `relationship_type: "refutes"`). 
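Because the evidence arrays hold bare strings, nothing structurally prevents
appending a plain SDO ID instead of a relationship ID. A small guard is cheap;
the helper below is hypothetical (not part of the ADR's API) and assumes
standard STIX 2.1 identifier syntax:

```python
import re
import uuid

# STIX 2.1 identifiers are "<type>--<UUID>"; the evidence arrays must hold
# relationship IDs specifically, so the type prefix is pinned here.
_RELATIONSHIP_ID = re.compile(r"^relationship--[0-9a-fA-F-]{36}$")


def is_relationship_id(candidate: str) -> bool:
    """True when `candidate` is a well-formed STIX relationship ID."""
    if not _RELATIONSHIP_ID.match(candidate):
        return False
    try:
        uuid.UUID(candidate.split("--", 1)[1])  # reject malformed UUIDs
    except ValueError:
        return False
    return True
```

A caller could then guard `supporting_evidence.append(rel_id)` with
`if not is_relationship_id(rel_id): raise ValueError(...)` before persisting.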
+ +#### Status State Machine + +``` + propose() + ───────► pending + │ + evaluate() │ + ┌────────────┤ + │ │ + confidence ≥ 0.75 │ 0.15 < confidence < 0.75 + │ │ + ▼ ▼ confidence ≤ 0.15 + confirmed (unchanged) AND refuting_evidence + ───────────────► refuted + │ + close(verdict) │ + ───────► inconclusive (when verdict == "inconclusive") +``` + +### `HypothesisEngine` + +`gnat/reasoning/hypothesis.py` provides the lifecycle manager: + +```python +class HypothesisEngine: + """ + Manages the full lifecycle of STIXHypothesis objects: + propose → evaluate → close. + """ + + def __init__( + self, + store: WorkspaceStore, + search_index: SearchIndex, # SolrSearchIndex or NullSearchIndex + trust_weights: dict[str, float] | None = None, + ) -> None: + self._store = store + self._search = search_index + self._weights = trust_weights or TRUST_WEIGHTS # from gnat.core.trust +``` + +#### `propose()` + +Creates and persists a new `STIXHypothesis` in the workspace: + +```python +def propose( + self, + statement: str, + initial_evidence: list[str], + ctx: ExecutionContext, + confidence: float = 0.2, +) -> STIXHypothesis: + """ + Parameters + ---------- + statement : str + Natural-language hypothesis text (e.g. "192.0.2.1 is Lazarus C2"). + initial_evidence : list[str] + STIX relationship IDs linking the hypothesis to supporting objects. + ctx : ExecutionContext + Execution context; workspace_id and initiated_by are taken from here. + confidence : float + Initial confidence score in [0.0, 1.0]. Defaults to 0.2 (weak prior). + + Returns + ------- + STIXHypothesis + The persisted hypothesis object. 
+ """ + hyp = STIXHypothesis( + id=f"x-gnat-hypothesis--{uuid4()}", + statement=statement, + confidence=confidence, + status="pending", + supporting_evidence=list(initial_evidence), + refuting_evidence=[], + created_by=ctx.initiated_by, + workspace_id=ctx.workspace_id, + created_at=datetime.utcnow(), + ) + self._store.upsert(hyp, ctx) + return hyp +``` + +#### `evaluate()` + +Queries Solr for corroborating or refuting evidence and updates confidence: + +```python +def evaluate( + self, + hypothesis_id: str, + ctx: ExecutionContext, +) -> STIXHypothesis: + """ + Re-scores a hypothesis by querying the Solr search index for evidence + corroborating or refuting its statement, then updates its confidence + and (if thresholds are crossed) its status. + """ + hyp = self._store.get(hypothesis_id, STIXHypothesis) + + # 1. Solr full-text query on the hypothesis statement + hits = self._search.query(hyp.statement, fields=["name", "pattern", "description"]) + + # 2. Weight each hit by the trust level of its source connector + weighted_sum = 0.0 + for hit in hits: + trust = hit.get("source_trust_level", "semi_trusted") + weighted_sum += self._weights.get(trust, 0.6) + + # 3. Normalise to [0.0, 1.0] + raw_corroboration = min(weighted_sum / max(len(hits), 1), 1.0) + + # 4. Blend with the existing confidence (Bayesian-inspired update) + new_confidence = 0.4 * hyp.confidence + 0.6 * raw_corroboration + new_confidence = round(max(0.0, min(1.0, new_confidence)), 4) + + # 5. 
Auto-classify + new_status = hyp.status + if new_confidence >= 0.75: + new_status = "confirmed" + elif new_confidence <= 0.15 and hyp.refuting_evidence: + new_status = "refuted" + + hyp.confidence = new_confidence + hyp.status = new_status + hyp.last_evaluated_at = datetime.utcnow() + self._store.upsert(hyp, ctx) + return hyp +``` + +**Confidence scoring weights by trust level:** + +| Source Trust Level | Weight Used in Corroboration | +|--------------------|------------------------------| +| `trusted_internal` | 0.9 | +| `semi_trusted` | 0.6 | +| `untrusted_external` | 0.3 | + +**Auto-classification thresholds:** + +| Condition | New Status | +|-----------|-----------| +| `confidence ≥ 0.75` | `confirmed` | +| `confidence ≤ 0.15` AND `refuting_evidence` non-empty | `refuted` | +| Neither threshold met | Unchanged (remains `pending`) | + +#### `close()` + +Locks the hypothesis with a final analyst verdict: + +```python +def close( + self, + hypothesis_id: str, + verdict: str, # "confirmed" | "refuted" | "inconclusive" + ctx: ExecutionContext, +) -> STIXHypothesis: + """ + Closes a hypothesis with a final analyst-provided verdict. + Closed hypotheses are not eligible for further evaluate() calls. 
+ """ + if verdict not in ("confirmed", "refuted", "inconclusive"): + raise ValueError(f"Invalid verdict: {verdict!r}") + hyp = self._store.get(hypothesis_id, STIXHypothesis) + if hyp.status in ("confirmed", "refuted", "inconclusive"): + raise HypothesisAlreadyClosedError(hypothesis_id) + hyp.status = verdict + hyp.last_evaluated_at = datetime.utcnow() + self._store.upsert(hyp, ctx) + return hyp +``` + +### Evidence Linkage via STIX Relationships + +When an analyst (or an automated pipeline) identifies a STIX object that +supports or refutes a hypothesis, a STIX `relationship` is created linking the +two objects and the relationship ID is appended to the appropriate evidence list: + +```python +# Analyst adds supporting evidence +rel = STIXRelationship( + relationship_type="supports", + source_ref=suspicious_ip.id, + target_ref=hyp.id, +) +workspace.upsert(rel, ctx) +hyp.supporting_evidence.append(rel.id) +engine.evaluate(hyp.id, ctx) # re-score with new evidence +``` + +This approach means that every piece of evidence is a first-class STIX object, +queryable, exportable as a STIX bundle, and auditable via the lineage tracker +(ADR-0038). + +### Storage + +`STIXHypothesis` is persisted via the existing `WorkspaceStore.upsert()` +mechanism. No new database tables are required; the object lands in +`workspace_objects` like any other STIX object. The idempotency key +(ADR-0041) ensures that `evaluate()` calls updating the same hypothesis do +not create duplicate rows. + +--- + +## Consequences + +### Positive + +- **Structured hypothesis lifecycle:** hypotheses transition through a defined + state machine (`pending → confirmed/refuted/inconclusive`) rather than + existing only in analyst notes; downstream SOAR and reporting systems can + filter by `status`. +- **Evidence provenance:** every piece of supporting or refuting evidence is a + typed STIX relationship, exportable as a STIX bundle and auditable via the + lineage tracker. 
+- **Automated corroboration:** `evaluate()` queries Solr without analyst
+  intervention, updating confidence as new indicators arrive in the workspace.
+- **Trust-weighted scoring:** evidence from internal SIEMs carries more weight
+  than community feeds. Weights are not configurable per call: unless a
+  `trust_weights` override is supplied at engine construction, the shared
+  `TRUST_WEIGHTS` constant is used, ensuring consistent behaviour across all
+  hypotheses.
+- **No new infrastructure:** `STIXHypothesis` uses the existing workspace store
+  and Solr search index; no new tables, queues, or services are required.
+- **Graceful Solr degradation:** if Solr is unavailable, `NullSearchIndex` is
+  substituted and `evaluate()` returns the hypothesis unchanged (confidence not
+  updated) rather than raising.
+
+### Negative / Trade-offs
+
+- **Solr dependency for corroboration:** `evaluate()` is only useful when the
+  Solr sidecar is running. Deployments without Solr get lifecycle management
+  (`propose`, `close`) but not automated corroboration.
+- **Statement-based Solr query:** Solr is queried with the raw hypothesis
+  statement string. If the statement uses phrasing that does not match indexed
+  field content, corroboration scores will be low even when strong evidence
+  exists. Structured query decomposition (NLP-based entity extraction) is
+  deferred.
+- **No real-time push:** `evaluate()` is called on demand or on a schedule; it
+  does not automatically fire when a new indicator arrives in the workspace.
+  A watcher pattern (deferred) would close this gap.
+- **Confidence blending is heuristic:** the 40/60 blend of existing and new
+  confidence is not derived from a formal Bayesian model; it is a pragmatic
+  approximation that may need tuning.
+
+### Deferred
+
+- **Scheduled re-evaluation:** a `HypothesisWatcher` job that calls `evaluate()`
+  on all `pending` hypotheses when new objects are ingested into the same
+  workspace.
+- **NLP-based entity extraction:** decompose the hypothesis statement into + structured entity queries (IP, domain, actor name) before querying Solr to + improve corroboration recall. +- **STIX 2.1 Opinion SDO integration:** map `STIXHypothesis` closed verdicts + to native STIX 2.1 `opinion` objects for maximum interoperability. +- **Multi-analyst collaboration:** allow multiple analysts to propose competing + verdicts on the same hypothesis and surface disagreements. + +--- + +## Alternatives Considered + +### Free-text analyst notes + +Keeping hypotheses as free-text entries in investigation notes was the simplest +option and required no new code. Rejected because: + +1. Notes are not machine-readable; SOAR and reporting systems cannot filter on + `status == "confirmed"`. +2. Evidence linkage is lost; the note references the evidence by name but not + by STIX ID, breaking the audit chain. +3. Confidence is not tracked; analysts must manually re-assess every note when + new evidence arrives. + +### External hypothesis management tools (e.g. Jupyter notebooks, Jira) + +Using an external tool (Jira tickets, Jupyter analysis notebooks) to track +hypotheses was considered. Rejected because it breaks GNAT's single-data-model +principle: all threat intelligence objects should be representable in STIX and +stored in the workspace. An external tool would require a synchronisation +bridge and would not benefit from Solr corroboration, lineage tracking, or the +`ReasoningEngine` scoring pipeline. + +### Native STIX 2.1 `opinion` SDO + +STIX 2.1 includes an `opinion` SDO that expresses an assessment about the +correctness of STIX content. Using `opinion` directly was considered. Rejected +because `opinion` has a fixed enumerated value set +(`strongly-disagree` to `strongly-agree`) and no fields for a natural-language +statement, a confidence score, or an evidence list. 
`STIXHypothesis` +(`x-gnat-hypothesis`) extends the STIX custom object pattern consistently with +ADR-0032 and can produce an `opinion` on `close()` as a derived output +(deferred). + +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/0043-ADR-negative-evidence.md b/docs/explanation/architecture/adrs/0043-ADR-negative-evidence.md new file mode 100644 index 00000000..6c886273 --- /dev/null +++ b/docs/explanation/architecture/adrs/0043-ADR-negative-evidence.md @@ -0,0 +1,364 @@ +# ADR-0043 — Negative Evidence Tracking (Phase 4C) + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +When GNAT enriches a STIX observable (e.g. an IP address, a domain, a file +hash) it queries one or more connectors to retrieve additional context. If a +connector returns no results for a given observable, that **absence of data is +itself intelligence**: + +- If VirusTotal has never seen a particular file hash, that is meaningful. +- If CrowdStrike Falcon has no record of a domain, that reduces the likelihood + that the domain is a known threat actor infrastructure. +- If Recorded Future has no intelligence on an IP address, that is different + from "we have not checked yet." + +Prior to this ADR, GNAT did not record negative results. Every enrichment +request was treated as if no prior query had been made. This created two +compounding problems: + +### Problem 1: Redundant API calls + +The enrichment dispatcher re-queried every connector for every observable on +every pipeline run, regardless of whether the same lookup had already returned +nothing. In a typical production deployment: + +- 10,000 observables × 5 connectors = 50,000 queries per pipeline run +- If 60% of queries return no results, roughly 30,000 of those calls are wasted + against connectors that already had nothing to say +- Many commercial connectors enforce rate limits (e.g. 
VirusTotal: 500 requests
+  per day on the free tier); wasted calls exhaust quota that could serve
+  novel indicators
+
+### Problem 2: No negative signal in scoring
+
+The `ReasoningEngine` (ADR-0044) scores observables using trust-weighted
+evidence. Without a record of which connectors have been queried and returned
+nothing, the engine had no way to apply a **negative penalty** to observables
+that multiple reputable connectors have explicitly found unremarkable. An
+observable with zero enrichment hits was treated the same as an observable that
+had never been looked up — both received a neutral score rather than the
+negative-evidence-adjusted lower score that a "not seen by three connectors"
+result warrants.
+
+### Requirements
+
+1. Suppress redundant re-queries within a configurable time window (TTL).
+2. Expose the negative result to scoring pipelines as a typed, machine-readable
+   object.
+3. Require no new database tables or services.
+4. Survive process restarts (in-memory caches do not).
+
+---
+
+## Decision
+
+### `NegativeEvidenceRecord` Custom SDO
+
+A new custom STIX Domain Object is defined in
+`gnat/stix/sdos/negative_evidence.py`:
+
+```python
+@dataclass
+class NegativeEvidenceRecord(STIXBase):
+    """
+    x-gnat-negative-evidence — STIX custom SDO representing a confirmed
+    absence of data from a specific connector for a specific observable.
+
+    Stored via the workspace store like any other STIX object; no new
+    tables or services required.
+ """ + + type: str = "x-gnat-negative-evidence" + schema_version: int = 1 + + # The STIX ID of the observable that was queried + target_ref: str = "" + + # The connector that performed the query and found nothing + queried_connector: str = "" + + # Suppression window in seconds (default: 1 hour) + ttl_seconds: int = 3600 + + # UTC timestamp of the query that returned no results + query_timestamp: datetime | None = None +``` + +#### Key Methods + +```python +def is_expired(self) -> bool: + """ + Returns True if the TTL has elapsed since query_timestamp. + An expired record does NOT suppress re-querying; a fresh record does. + """ + if self.query_timestamp is None: + return True + elapsed = (datetime.utcnow() - self.query_timestamp).total_seconds() + return elapsed > self.ttl_seconds + +def seconds_remaining(self) -> float: + """ + Returns the number of seconds before this record expires. + Returns 0.0 if already expired. + """ + if self.query_timestamp is None: + return 0.0 + elapsed = (datetime.utcnow() - self.query_timestamp).total_seconds() + return max(0.0, self.ttl_seconds - elapsed) +``` + +### Write Path: Recording a Negative Result + +When an enrichment call returns an empty result set, the enrichment dispatcher +calls `NegativeEvidenceStore.record_miss()`: + +```python +class NegativeEvidenceStore: + """ + Thin wrapper around WorkspaceStore for NegativeEvidenceRecord objects. 
+ """ + + def record_miss( + self, + target_ref: str, + connector: str, + ctx: ExecutionContext, + ttl_seconds: int = 3600, + ) -> NegativeEvidenceRecord: + record = NegativeEvidenceRecord( + id=f"x-gnat-negative-evidence--{uuid4()}", + target_ref=target_ref, + queried_connector=connector, + ttl_seconds=ttl_seconds, + query_timestamp=datetime.utcnow(), + ) + self._store.upsert(record, ctx) + return record + + def get_fresh( + self, + target_ref: str, + connector: str, + workspace_id: str, + ) -> NegativeEvidenceRecord | None: + """ + Returns an unexpired NegativeEvidenceRecord for the given + (target_ref, connector) pair, or None if no fresh record exists. + """ + records = self._store.query( + type_filter="x-gnat-negative-evidence", + workspace_id=workspace_id, + filters={"target_ref": target_ref, "queried_connector": connector}, + ) + for record in records: + if not record.is_expired(): + return record + return None +``` + +### Read Path: Suppressing Redundant Queries + +The enrichment dispatcher checks for a fresh negative record before calling +each connector: + +```python +# gnat/ingest/enrichment.py +def _enrich_observable( + self, + observable: STIXBase, + connector: BaseClient, + ctx: ExecutionContext, +) -> list[STIXBase]: + fresh_negative = self._neg_store.get_fresh( + target_ref=observable.id, + connector=type(connector).__module__.split(".")[-2], + workspace_id=ctx.workspace_id, + ) + if fresh_negative: + logger.debug( + "Skipping %s for %s — negative evidence fresh for %.0fs", + type(connector).__name__, + observable.id, + fresh_negative.seconds_remaining(), + ) + return [] # suppress API call + + results = connector.enrich(observable, ctx) + + if not results: + self._neg_store.record_miss( + target_ref=observable.id, + connector=type(connector).__module__.split(".")[-2], + ctx=ctx, + ttl_seconds=self._ttl_seconds, + ) + + return results +``` + +### Integration with `ReasoningEngine` + +`ReasoningEngine.prioritize()` (ADR-0044) reads fresh 
`NegativeEvidenceRecord` +objects for each observable and applies a negative penalty to the composite +score: + +```python +# In ReasoningEngine._score_observable() +fresh_negatives = self._neg_store.query_fresh_count( + target_ref=observable.id, + workspace_id=ctx.workspace_id, +) +neg_penalty = min(0.3 * fresh_negatives, 0.6) +``` + +**Negative penalty table:** + +| Fresh Negative Records | Penalty Applied | +|------------------------|-----------------| +| 0 | 0.0 | +| 1 | 0.3 | +| 2 | 0.6 (capped) | +| 3+ | 0.6 (capped) | + +The cap at 0.6 ensures that even an observable with many negative hits retains +a non-zero score in case a trust-weighted positive hit arrives later. + +### TTL Configuration + +TTL defaults to 3600 seconds (1 hour) but is configurable per deployment in +the INI file: + +```ini +[enrichment] +negative_evidence_ttl = 3600 ; seconds; default 1 hour +``` + +Connectors that update more slowly (e.g. threat intelligence databases that +publish weekly) may benefit from a longer TTL (e.g. 86400 seconds) configured +at the connector level: + +```python +class ShadowserverClient(BaseClient): + NEGATIVE_EVIDENCE_TTL: int = 86400 # 24 hours — weekly update cadence +``` + +`NegativeEvidenceStore.record_miss()` reads `NEGATIVE_EVIDENCE_TTL` from the +connector class when present, falling back to the INI-configured default. + +--- + +## Consequences + +### Positive + +- **Quota preservation:** redundant queries are suppressed within the TTL + window, directly reducing API call volume. In a deployment with 10,000 + observables and 60% miss rate, suppression across a 1-hour window reduces + repeat calls from 30,000 to near-zero during replays and subsequent runs. +- **Richer scoring:** the `ReasoningEngine` can now distinguish between + "unknown" and "confirmed not seen by N connectors," producing lower scores for + observables that multiple reputable connectors have explicitly found + unremarkable. 
+- **Persistence across restarts:** `NegativeEvidenceRecord` is stored in the + workspace like any other STIX object; suppression survives process restarts, + unlike an in-memory cache. +- **Zero new infrastructure:** no new tables, queues, message brokers, or + caching services are required. The existing workspace store handles + persistence; the existing query interface handles retrieval. +- **First-class STIX object:** negative evidence is exportable as part of a + STIX bundle, shareable between workspaces, and auditable via the lineage + tracker (ADR-0038). + +### Negative / Trade-offs + +- **Workspace store growth:** every enrichment miss creates a + `NegativeEvidenceRecord` object. A deployment with 10,000 observables + queried against 5 connectors creates up to 50,000 records per TTL window. + A cleanup job (see Deferred) is needed to purge expired records. +- **TTL is a blunt instrument:** a 1-hour TTL is appropriate for live threat + feeds but too short for weekly-updated databases and too long for real-time + feeds that update every minute. The per-connector `NEGATIVE_EVIDENCE_TTL` + class variable partially addresses this, but it requires connector authors to + reason about update cadence. +- **No invalidation on connector update:** if a connector's data is known to + have been refreshed (e.g. the operator manually triggers a full re-sync), the + TTL-based suppression cannot be invalidated without deleting all matching + `NegativeEvidenceRecord` objects. Manual invalidation is not yet tooled. +- **False negative suppression:** if a connector initially returns no results + but adds the indicator to its database within the TTL window, GNAT will not + re-query until the TTL expires, missing the new data. + +### Deferred + +- **Expired record cleanup job:** a scheduled `NegativeEvidencePurgeJob` that + deletes `NegativeEvidenceRecord` objects whose TTL has elapsed, preventing + unbounded workspace store growth. 
+- **Per-observable TTL override:** allow analysts to set a shorter TTL on + high-priority observables that should be re-queried more aggressively. +- **Manual invalidation API:** `gnat enrich invalidate-negative ` CLI + command to force re-querying by deleting all matching negative records. +- **Sharing across workspaces:** allow a negative evidence record in one + workspace to suppress queries in a sibling workspace, reducing redundant calls + in multi-tenant deployments. + +--- + +## Alternatives Considered + +### In-memory LRU cache + +An in-process `functools.lru_cache` or `cachetools.TTLCache` keyed on +`(observable_id, connector_name)` was the simplest implementation. Rejected +because: + +1. **Lost on restart:** a cache flush caused by a container restart or worker + crash would cause all missed queries to be re-issued, negating the quota + savings on the very occasions when pipelines are most likely to be + re-run (crash recovery). +2. **Not shared across workers:** in a multi-worker deployment each worker + maintains an independent cache; a negative result learned by Worker A is not + known to Worker B. +3. **Not auditable:** the `ReasoningEngine` cannot query an in-memory cache for + the negative penalty calculation without tight coupling between the scoring + engine and the enrichment dispatcher's runtime state. + +### Connector-side rate limiting + +Relying on each connector's own rate limiter to prevent redundant calls was +considered. Rejected because: + +1. Rate limiters enforce a maximum call *rate*, not a minimum interval between + identical calls. A rate limiter allows 500 calls/minute but does not prevent + querying the same observable 500 times in a minute. +2. Rate limiters are applied globally per connector, not per observable. They + do not suppress re-querying a specific observable that already returned no + results. +3. Rate limiters do not expose negative signal to the scoring pipeline. 
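Point 1 can be made concrete with a toy simulation: a call-count limiter admits
the same lookup again and again, while a TTL-based miss cache admits it once
per window. The class names below are illustrative, not GNAT code:

```python
import time


class ToyRateLimiter:
    """Caps total calls per window; blind to *which* observable is queried."""

    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self.calls = 0

    def allow(self, observable_id: str) -> bool:
        if self.calls >= self.max_calls:
            return False
        self.calls += 1
        return True


class ToyNegativeCache:
    """Suppresses repeats of a specific (observable, connector) miss until TTL expiry."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._misses: dict[tuple[str, str], float] = {}

    def record_miss(self, observable_id: str, connector: str) -> None:
        self._misses[(observable_id, connector)] = time.monotonic()

    def should_query(self, observable_id: str, connector: str) -> bool:
        stamp = self._misses.get((observable_id, connector))
        return stamp is None or (time.monotonic() - stamp) > self.ttl


limiter = ToyRateLimiter(max_calls=500)
cache = ToyNegativeCache(ttl_seconds=3600)

# The limiter admits the identical lookup 500 times; the cache admits it once.
repeats_allowed_by_limiter = sum(limiter.allow("ipv4-addr--x") for _ in range(500))
queries_issued = 0
for _ in range(500):
    if cache.should_query("ipv4-addr--x", "virustotal"):
        queries_issued += 1
        cache.record_miss("ipv4-addr--x", "virustotal")
```

The limiter permits all 500 identical lookups; the cache issues exactly one
query and suppresses the rest for the remainder of the TTL window.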
+ +### Extending `EnrichmentLogModel` + +The existing `EnrichmentLogModel` (which records enrichment operations) could +have been extended with a `result_count: int` column so that a query returning +0 results is distinguishable from one not yet performed. + +Rejected because: + +1. `EnrichmentLogModel` is an append-only audit log, not a queryable state + store; answering "is there a fresh negative result for (X, connector)?" + would require a `MAX(timestamp)` query with a join, adding complexity. +2. `EnrichmentLogModel` is not a STIX object and is therefore not shareable via + STIX bundles or exportable to partner workspaces. +3. The existing lineage event model (ADR-0038) serves the audit function; + negative evidence requires a separate, queryable state representation. + +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/0044-ADR-reasoning-engine.md b/docs/explanation/architecture/adrs/0044-ADR-reasoning-engine.md new file mode 100644 index 00000000..ed5854c3 --- /dev/null +++ b/docs/explanation/architecture/adrs/0044-ADR-reasoning-engine.md @@ -0,0 +1,464 @@ +# ADR-0044 — Evidence-Weighted Observable Reasoning Engine (Phase 4C) + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +GNAT ingests thousands of STIX observables per pipeline run from dozens of +connectors. Prior to this ADR, analysts had no automated mechanism to answer +the question: **"Given everything GNAT knows right now, which of these +observables should I investigate first?"** + +The existing confidence scoring (ADR-0033) assigned a single confidence value +per object based on connector-reported metadata. This was insufficient for +prioritisation because: + +1. **Single signal:** confidence came from one field on one object, ignoring the + object's age, corroborating hits across other objects in the workspace, and + negative evidence from connectors that had never seen the observable. +2. 
**Trust-agnostic:** a 0.9-confidence hit from AlienVault OTX (open community + submissions) and a 0.9-confidence hit from the organisation's own Splunk + deployment were scored identically, despite the profound difference in source + authority. +3. **Not explainable:** a single float score gave analysts no insight into why + an observable was scored high or low; it could not be audited. +4. **Not persisted:** scores were computed on demand and discarded; there was no + record that prioritisation had occurred, breaking the lineage chain. + +The `HypothesisEngine` (ADR-0042) and `NegativeEvidenceRecord` (ADR-0043) +introduced structured evidence objects that begged for a consumer: a scoring +engine that reads them and produces a ranked, explainable prioritisation list. + +SOC analyst feedback collected during Phase 4B identified three signals as most +valuable for triage prioritisation: + +- **Source authority** (whose data is this?) +- **Recency** (how recently was this observed or updated?) +- **Corroboration** (how many other data points mention this observable?) + +A fourth signal — **absence of data** — was identified as equally important: +an observable not seen by any trusted connector is less urgent than one +confirmed by three. + +--- + +## Decision + +### `ReasoningEngine` + +The scoring engine is defined in `gnat/reasoning/engine.py`: + +```python +class ReasoningEngine: + """ + Prioritises a set of STIX observables using a composite evidence-weighted + score derived from trust level, age, Solr corroboration, and negative + evidence penalties. + + Parameters + ---------- + store : WorkspaceStore + Workspace store used to persist STIX note objects when store_notes=True. + search_index : SearchIndex + Solr search index for corroboration queries. Pass NullSearchIndex when + Solr is unavailable; the engine degrades gracefully. + neg_store : NegativeEvidenceStore + Store for fresh NegativeEvidenceRecord lookups. 
+ trust_weights : dict[str, float] | None + Override for the default TRUST_WEIGHTS mapping. Pass None to use + the shared constant from gnat.core.trust. + """ + + def __init__( + self, + store: WorkspaceStore, + search_index: SearchIndex, + neg_store: NegativeEvidenceStore, + trust_weights: dict[str, float] | None = None, + ) -> None: + self._store = store + self._search = search_index + self._neg = neg_store + self._weights = trust_weights or TRUST_WEIGHTS +``` + +### `prioritize()` + +The primary public method: + +```python +def prioritize( + self, + observable_set: list[STIXBase], + ctx: ExecutionContext, + store_notes: bool = True, +) -> list[tuple[STIXBase, float, dict]]: + """ + Score and rank a list of STIX observables. + + Parameters + ---------- + observable_set : list[STIXBase] + The observables to score. All must belong to ctx.workspace_id. + ctx : ExecutionContext + Execution context; trust_level and workspace_id are read from here. + store_notes : bool + When True, persist a STIX note object for each scored observable + recording the score breakdown. Defaults to True. + + Returns + ------- + list[tuple[STIXBase, float, dict]] + Triples of (observable, score, explanation), sorted by score descending. + score is in [0.0, 1.0]. explanation is a machine-readable dict. + """ + results = [] + for obs in observable_set: + score, explanation = self._score_observable(obs, ctx) + if store_notes: + self._persist_note(obs, score, explanation, ctx) + results.append((obs, score, explanation)) + + results.sort(key=lambda t: t[1], reverse=True) + return results +``` + +### Composite Scoring Formula + +``` +score = trust_weight × 0.4 + + age_factor × 0.3 + + corroboration_bonus × 0.3 + − neg_penalty × 0.5 +``` + +The result is clamped to `[0.0, 1.0]`. 
+

#### Component Definitions

**`trust_weight`** — derived from `ExecutionContext.trust_level`:

| Trust Level | trust_weight |
|-------------|-------------|
| `trusted_internal` | 0.9 |
| `semi_trusted` | 0.6 |
| `untrusted_external` | 0.3 |

The context trust level represents the highest-authority source in the pipeline
that produced or enriched this observable.

**`age_factor`** — time-decay from the observable's `modified` field:

```python
def _age_factor(self, obs: STIXBase) -> float:
    if obs.modified is None:
        return 0.5  # no timestamp: neutral decay
    days_old = (datetime.utcnow() - obs.modified).total_seconds() / 86400.0
    return max(0.0, 1.0 - 0.05 * days_old)
```

| Age (days) | age_factor |
|-----------|-----------|
| 0 (today) | 1.00 |
| 1 | 0.95 |
| 5 | 0.75 |
| 10 | 0.50 |
| 20 | 0.00 (floor) |

**`corroboration_bonus`** — Solr hit count for the observable's identifier fields:

```python
def _corroboration_bonus(self, obs: STIXBase) -> float:
    hits = self._search.query(
        obs.name or obs.id,
        fields=["name", "pattern", "value", "description"],
    )
    return min(len(hits) * 0.05, 0.25)
```

| Solr Hits | corroboration_bonus |
|-----------|-------------------|
| 0 | 0.00 |
| 1 | 0.05 |
| 3 | 0.15 |
| 5+ | 0.25 (cap) |

**`neg_penalty`** — count of unexpired `NegativeEvidenceRecord` objects for
this observable:

```python
def _neg_penalty(self, obs: STIXBase, workspace_id: str) -> float:
    count = self._neg.query_fresh_count(
        target_ref=obs.id,
        workspace_id=workspace_id,
    )
    return min(0.3 * count, 0.6)
```

| Fresh Negative Records | neg_penalty |
|------------------------|------------|
| 0 | 0.00 |
| 1 | 0.30 |
| 2+ | 0.60 (cap) |

The cap at 0.60, combined with the `× 0.5` coefficient in the formula, limits
the total negative-evidence deduction to `0.60 × 0.5 = 0.30` of the composite
score, even for heavily negatively-evidenced observables; negative evidence
demotes an observable but can never subtract more than 0.30.
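As a sanity check on the weights, the composite formula can be evaluated by hand. The sketch below reproduces the arithmetic outside the engine; the helper name `composite_score` is illustrative and not part of the `ReasoningEngine` API:

```python
def composite_score(
    trust_weight: float,
    age_factor: float,
    corroboration_bonus: float,
    neg_penalty: float,
) -> float:
    """Evidence-weighted composite score, clamped to [0.0, 1.0]."""
    raw = (
        trust_weight * 0.4
        + age_factor * 0.3
        + corroboration_bonus * 0.3
        - neg_penalty * 0.5
    )
    return round(max(0.0, min(1.0, raw)), 4)

# trusted_internal pipeline, 5-day-old object, 3 Solr hits, no negative evidence:
# 0.9*0.4 + 0.75*0.3 + 0.15*0.3 - 0.0 = 0.63
print(composite_score(0.9, 0.75, 0.15, 0.0))  # 0.63

# untrusted pipeline, stale object, no corroboration, 2 fresh negative records:
# 0.3*0.4 + 0.0 + 0.0 - 0.6*0.5 = -0.18, clamped to the 0.0 floor
print(composite_score(0.3, 0.0, 0.0, 0.6))  # 0.0
```

The second call shows why the clamp matters: for low-trust, stale observables the raw value can go negative.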
+

### Full Scoring Implementation

```python
def _score_observable(
    self,
    obs: STIXBase,
    ctx: ExecutionContext,
) -> tuple[float, dict]:
    tw = self._weights.get(ctx.trust_level, 0.6)
    af = self._age_factor(obs)
    cb = self._corroboration_bonus(obs)
    np_ = self._neg_penalty(obs, ctx.workspace_id)

    raw = tw * 0.4 + af * 0.3 + cb * 0.3 - np_ * 0.5
    score = round(max(0.0, min(1.0, raw)), 4)

    explanation = {
        "score": score,
        "components": {
            "trust_weight": tw,
            "trust_weight_coeff": 0.4,
            "age_factor": af,
            "age_factor_coeff": 0.3,
            "corroboration_bonus": cb,
            "corroboration_coeff": 0.3,
            "neg_penalty": np_,
            "neg_penalty_coeff": 0.5,
        },
        "trust_level": ctx.trust_level,
        "workspace_id": ctx.workspace_id,
        "evaluated_at": datetime.utcnow().isoformat(),
    }
    return score, explanation
```

### Explanation Dict Structure

The `explanation` dict is machine-readable, not free text, so that downstream
components (report generators, SOAR connectors, TUI) can format it as needed:

```json
{
  "score": 0.6300,
  "components": {
    "trust_weight": 0.9,
    "trust_weight_coeff": 0.4,
    "age_factor": 0.75,
    "age_factor_coeff": 0.3,
    "corroboration_bonus": 0.15,
    "corroboration_coeff": 0.3,
    "neg_penalty": 0.0,
    "neg_penalty_coeff": 0.5
  },
  "trust_level": "trusted_internal",
  "workspace_id": "acme-corp",
  "evaluated_at": "2026-04-09T14:23:01.000Z"
}
```

### STIX Note Persistence

When `store_notes=True`, the engine persists a STIX 2.1 `note` object for each
scored observable:

```python
def _persist_note(
    self,
    obs: STIXBase,
    score: float,
    explanation: dict,
    ctx: ExecutionContext,
) -> None:
    note = STIXNote(
        id=f"note--{uuid4()}",
        abstract=f"ReasoningEngine score: {score:.4f}",
        content=json.dumps(explanation, indent=2),
        object_refs=[obs.id],
        created_by_ref=ctx.initiated_by,
    )
    self._store.upsert(note, ctx)
```

STIX `note` objects link to their target via `object_refs`, making the
score
and explanation auditable via the standard STIX relationship graph +and exportable in STIX bundles. + +### Solr Degradation + +When Solr is unavailable, `NullSearchIndex` is substituted: + +```python +class NullSearchIndex(SearchIndex): + """No-op search index used when Solr is unavailable.""" + + def query(self, query: str, fields: list[str] | None = None) -> list[dict]: + return [] +``` + +With `NullSearchIndex`, `corroboration_bonus` is always 0.0. The engine +continues to score using `trust_weight`, `age_factor`, and `neg_penalty`, +producing a degraded but still useful ranking. + +### Usage Example + +```python +from gnat.reasoning.engine import ReasoningEngine +from gnat.search import GNATIndexer +from gnat.core.context import ExecutionContext + +ctx = ExecutionContext.from_connector( + connector=splunk_client, + domain="analysis", + workspace_id="acme-corp", +) + +engine = ReasoningEngine( + store=workspace_store, + search_index=GNATIndexer.from_config(config), + neg_store=neg_evidence_store, +) + +ranked = engine.prioritize( + observable_set=all_indicators, + ctx=ctx, + store_notes=True, +) + +for obs, score, explanation in ranked[:10]: + print(f"{score:.4f} {obs.name or obs.id}") + # > 0.7800 192.0.2.1 + # > 0.6550 evil-domain.example.com + # > 0.4200 suspicious-hash-abc123 +``` + +--- + +## Consequences + +### Positive + +- **Deterministic and reproducible:** given the same inputs (trust level, object + timestamps, Solr hit counts, negative records), the formula always produces + the same score. This makes it testable with fixed fixtures and auditable + after the fact. +- **Explainable:** the structured `explanation` dict exposes every scoring + component; analysts can see exactly why an observable ranked high or low + without reading source code. +- **Fully auditable:** STIX `note` objects link scores to observables in the + standard STIX graph; the entire prioritisation history is queryable and + exportable. 
+- **Solr-optional:** `NullSearchIndex` allows the engine to operate in minimal + deployments (developer workstations, CI) without a Solr sidecar, with only + the corroboration component degraded. +- **Composable:** the scoring formula uses components already computed by + `NegativeEvidenceStore` and `ExecutionContext`; no new data collection is + needed beyond what Phase 4C already produces. +- **No new dependencies:** all components are pure Python dataclass operations + plus existing Solr and SQLAlchemy infrastructure; no new packages are required. + +### Negative / Trade-offs + +- **Context trust level is pipeline-level:** `trust_weight` is read from the + `ExecutionContext`, which represents the trust of the pipeline that ingested + the observable, not the trust of each individual source that contributed to + the enrichment. An observable enriched by both Splunk (trusted_internal) and + AlienVault (untrusted_external) in different pipeline runs will be scored + differently depending on which pipeline context `prioritize()` is called with. + Per-observable trust aggregation is deferred. +- **Age factor assumes `modified` is reliable:** not all connectors reliably + populate the STIX `modified` field; objects with no `modified` receive the + neutral 0.5 factor, which may over- or under-rank them depending on their + actual age. +- **Corroboration bonus is hit-count-based:** the Solr query returns a count of + matching documents, not a measure of the quality or relevance of those + matches. A high Solr hit count on a generic observable (e.g. a popular CDN + IP) may inflate the bonus. +- **Score storage growth:** with `store_notes=True`, every call to `prioritize()` + on N observables creates N STIX note objects. Regular re-prioritisation + (e.g. on a daily schedule) accumulates many notes per observable. A retention + policy is needed. 
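The growth in the last trade-off is straightforward to quantify. A back-of-the-envelope sketch (the helper name and figures are illustrative, not drawn from the implementation):

```python
def note_objects(observables: int, runs_per_day: int, days: int) -> int:
    """STIX note objects accumulated by prioritize() with store_notes=True."""
    return observables * runs_per_day * days

# 10,000 observables re-prioritised once a day for a 90-day quarter:
print(note_objects(10_000, 1, 90))  # 900000
```

At this rate each observable carries 90 score notes by the end of the quarter, which is why a retention policy (the `ScoreNotePurgeJob` deferred below) matters.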
+ +### Deferred + +- **Per-observable trust aggregation:** compute the effective trust weight from + all connectors that have enriched the observable (max, weighted average, or + union) rather than from the pipeline-level `ExecutionContext`. +- **ML-based weight calibration:** collect analyst feedback on scored results + (accepted/rejected triage decisions) and use them to calibrate the formula + coefficients (`0.4`, `0.3`, `0.3`, `0.5`) via a regression model. +- **Score note retention policy:** a `ScoreNotePurgeJob` that deletes note + objects older than a configurable threshold, retaining only the most recent + score per observable. +- **TUI prioritisation dashboard:** display the ranked observable list with + expandable `explanation` views in the Textual TUI. +- **Streaming prioritisation:** emit score updates as new evidence arrives via + the HookBus rather than requiring explicit `prioritize()` calls. + +--- + +## Alternatives Considered + +### ML-based ranking (deferred, not rejected) + +A supervised ranking model trained on analyst triage decisions was the +originally proposed approach. It was deferred (not rejected) because: + +1. GNAT does not yet have labelled training data (analyst accept/reject + decisions on scored observables); the formula-based engine will collect this + data in production. +2. An ML model is harder to explain and audit; the formula produces an + `explanation` dict that every component of the system can parse. +3. ML models require a training pipeline, model versioning, and serving + infrastructure that are out of scope for Phase 4C. + +The formula-based engine is explicitly designed to be replaceable: the scoring +logic is isolated in `_score_observable()`, and the coefficients are named +constants that a future calibration layer can tune without changing the public +API. 
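One way that replaceability seam might look, sketched under the assumption that the coefficients are gathered into a single mapping (the constant and function names here are illustrative, not the actual module contents):

```python
# Illustrative: default coefficients mirroring the ADR formula, kept in one
# place so a calibration layer can substitute a tuned mapping without
# changing prioritize()'s public signature.
DEFAULT_COEFFS = {
    "trust": 0.4,
    "age": 0.3,
    "corroboration": 0.3,
    "neg_penalty": 0.5,
}

def raw_score(tw, af, cb, np_, coeffs=DEFAULT_COEFFS):
    """Composite score before clamping; coeffs is the calibration hook."""
    return (
        tw * coeffs["trust"]
        + af * coeffs["age"]
        + cb * coeffs["corroboration"]
        - np_ * coeffs["neg_penalty"]
    )

# A future calibration layer overrides only the mapping:
tuned = {**DEFAULT_COEFFS, "age": 0.4, "corroboration": 0.2}
```

The calling code never changes; only the mapping passed to the scorer does.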
+ +### Flat confidence score only + +Retaining the Phase 3 single-field confidence score and not introducing a +multi-component formula was the minimal alternative. Rejected because: + +1. It ignores trust authority (source reliability) — the single most important + factor identified in analyst feedback. +2. It ignores recency — a 1-year-old hit is less actionable than a hit from + today. +3. It has no mechanism to penalise observables that multiple connectors have + already examined and found unremarkable. +4. It is not explainable — analysts cannot determine why an observable ranked + above another. + +### Graph-centrality ranking + +Using the STIX relationship graph to compute centrality scores (e.g. PageRank +over the STIX `relationship` graph) as the primary ranking signal was +considered. Rejected because: + +1. GNAT workspaces in early deployments may have sparse relationship graphs; + centrality degrades to random ranking for isolated observables. +2. Graph traversal over potentially 100,000+ STIX objects requires significant + compute and is not suitable for on-demand scoring within a pipeline run. +3. Centrality does not incorporate trust authority, recency, or negative + evidence without substantial additional engineering. + +Graph-based ranking remains a viable long-term complement to the formula and +may be reintroduced as an optional corroboration signal once workspaces have +sufficient relationship density. 
+ +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/0045-ADR-agent-governance.md b/docs/explanation/architecture/adrs/0045-ADR-agent-governance.md new file mode 100644 index 00000000..e4d367cd --- /dev/null +++ b/docs/explanation/architecture/adrs/0045-ADR-agent-governance.md @@ -0,0 +1,277 @@ +# ADR-0045 — Agent Governance Layer (Phase 4D) + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +GNAT's AI agent layer (`gnat/agents/`) had grown substantially through Phases 3 and 4 to include +`ResearchAgent`, `ParsingAgent`, `CopilotReader`, and a family of workflow and quality agents. +Each of these agents can invoke connector actions — fetching threat intelligence, enriching +indicators, exporting STIX bundles, and triggering SOAR playbooks. + +As agents gained write access, two serious gaps emerged: + +1. **No permission system.** Any agent could call any connector action regardless of its origin or + the sensitivity of the target workspace. A `ParsingAgent` used in an untrusted enrichment + pipeline had the same effective privileges as an internally authored `ResearchAgent`. + +2. **No audit trail.** Agent-originated writes were indistinguishable in the enrichment log from + direct analyst operations. When an indicator was modified by an agent, there was no record of + which agent did it, under what context, or whether any human had authorised the change. + +The absence of a governance layer made agent deployments unsuitable for production environments +with compliance requirements (SOC 2, ISO 27001, MSSPs serving regulated verticals). Operators +had no mechanism to restrict, monitor, or rate-limit agent activity. + +--- + +## Decision + +Introduce an **`AgentGovernor`** as the authoritative policy enforcement point for all agent +actions in GNAT. Every agent action must pass through the governor before it may execute. 
+ +### `AgentActionType` Enum + +Ten action types covering the full range of agent-reachable operations: + +| Action Type | Description | +|---|---| +| `read_stix` | Read STIX objects from a connector or workspace | +| `write_stix` | Create or update STIX objects | +| `delete_stix` | Soft-delete STIX objects | +| `enrich` | Call enrichment dispatcher against existing objects | +| `ingest` | Run an ingest pipeline or reader | +| `export` | Trigger an export (EDL, STIX bundle, Netskope CE) | +| `trigger_playbook` | Invoke an XSOAR or external SOAR playbook | +| `manage_workspace` | Create, rename, or delete a workspace | +| `escalate` | Route a finding to the review queue or analyst channel | +| `hypothesize` | Generate AI hypotheses (read-only, no state mutation) | + +### Trust Levels + +Three trust levels applied to every agent at registration time: + +| Trust Level | Description | +|---|---| +| `trusted_internal` | Internally authored agents, admin-signed, registry-registered | +| `semi_trusted` | Third-party or plugin agents loaded at runtime | +| `untrusted_external` | Externally supplied agents (research pipeline agents, unverified) | + +### Default Permission Matrix + +``` + trusted_internal semi_trusted untrusted_external +read_stix ✓ ✓ ✓ +write_stix ✓ ✓ ✗ +delete_stix ✓ ✗ ✗ +enrich ✓ ✓ ✓ +ingest ✓ ✓ ✗ +export ✓ ✗ ✗ +trigger_playbook ✓ ✗ ✗ +manage_workspace ✓ ✗ ✗ +escalate ✓ ✓ ✓ +hypothesize ✓ ✓ ✓ +``` + +### `AgentAction` Dataclass + +Immutable record created for every checked action, whether approved or denied: + +```python +@dataclass +class AgentAction: + action_id: str # UUID4 + agent_id: str # registered agent identifier + action_type: AgentActionType + target_ref: str # STIX ID or connector name of the target + impact_level: str # "low" | "medium" | "high" | "critical" + session_id: str # owning agent session UUID + context_id: str | None # workspace or execution context name + result_json: str # JSON-encoded outcome or error + approved_by: str | None 
# reviewer ID for HITL-approved actions + submitted_at: datetime + executed_at: datetime | None + status: str # "pending" | "approved" | "denied" | "executed" | "failed" +``` + +### `AgentGovernor` API + +Located at `gnat/agents/governor.py`: + +```python +from gnat.agents.governor import AgentGovernor, AgentActionType + +governor = AgentGovernor() + +# Check permission — returns True/False +governor.can_act( + agent_id="research-agent-v2", + action_type=AgentActionType.write_stix, + trust_level="semi_trusted", +) + +# Assert permission — raises AgentPermissionDenied if denied +governor.require_can_act( + agent_id="research-agent-v2", + action_type=AgentActionType.export, + trust_level="semi_trusted", +) + +# Record a completed action +governor.record_action(action) + +# Sliding-window rate limit — raises RateLimitExceeded on breach +governor.rate_limit_check( + agent_id="research-agent-v2", + window_seconds=3600, # configurable per agent +) + +# Query audit log +log = governor.get_action_log(agent_id="research-agent-v2") +all_actions = governor.get_action_log() # all agents + +# Runtime policy override — persists for the process lifetime +governor.set_policy_override( + agent_id="custom-agent", + action_type=AgentActionType.export, + allowed=True, +) +``` + +### Exceptions + +```python +from gnat.agents.governor import AgentPermissionDenied, RateLimitExceeded + +# AgentPermissionDenied(agent_id, action_type, trust_level, reason) +# RateLimitExceeded(agent_id, window_seconds, call_count, limit) +``` + +Both inherit from `GNATClientError` so they are caught by the standard error handling path. + +### HookBus Integration + +`record_action()` emits a `"agent_action_recorded"` event on the global `HookBus` after +persisting to the in-memory audit log. 
Operators can subscribe to receive real-time action +events for external SIEM forwarding: + +```python +from gnat.agents.governor import AgentGovernor +from gnat.context import HookBus + +bus = HookBus.get_default() +bus.subscribe("agent_action_recorded", lambda evt: siem_client.send(evt)) +``` + +### Database Schema + +Two new tables added via Alembic migration `0006_add_agent_governance.py`: + +**`agent_sessions`** + +| Column | Type | Notes | +|---|---|---| +| `id` | `VARCHAR(36)` | UUID4 primary key | +| `agent_id` | `VARCHAR(200)` | registered agent identifier | +| `trust_level` | `VARCHAR(50)` | one of the three trust levels | +| `context_id` | `VARCHAR(200)` | workspace or execution context | +| `started_at` | `DATETIME` | UTC | +| `ended_at` | `DATETIME` | nullable | +| `action_count` | `INTEGER` | incremented on each `record_action()` | +| `policy_overrides_json` | `TEXT` | JSON map of per-agent overrides active at session start | + +**`agent_actions`** + +| Column | Type | Notes | +|---|---|---| +| `id` | `VARCHAR(36)` | UUID4 primary key | +| `session_id` | `VARCHAR(36)` | FK → `agent_sessions.id` | +| `agent_id` | `VARCHAR(200)` | denormalised for query convenience | +| `action_type` | `VARCHAR(50)` | enum value | +| `target_ref` | `VARCHAR(500)` | STIX ID or connector name | +| `impact_level` | `VARCHAR(20)` | `low` / `medium` / `high` / `critical` | +| `status` | `VARCHAR(20)` | lifecycle status | +| `approved_by` | `VARCHAR(200)` | nullable | +| `result_json` | `TEXT` | outcome payload | +| `submitted_at` | `DATETIME` | UTC | +| `executed_at` | `DATETIME` | nullable | + +Composite index on `(agent_id, submitted_at)` for time-range queries on a single agent. + +--- + +## Consequences + +### Positive + +- **Least-privilege enforcement:** agents that do not need write access cannot obtain it + regardless of the code paths they call; the permission matrix is the single source of truth. 
+- **Immutable audit trail:** every agent action — approved or denied — is recorded with full + context, making compliance evidence generation straightforward. +- **Rate limiting prevents runaway agents:** a misconfigured `ResearchAgent` with + `max_calls_per_run=9999` will be stopped by the sliding-window counter before it exhausts + API quota on a connected platform. +- **Per-deployment customisation:** `set_policy_override()` lets operators grant or restrict + individual agents at runtime without a code change — important for MSP deployments where + customer-specific agents need tailored permissions. +- **HookBus integration enables SIEM forwarding** at zero additional cost to the caller. + +### Negative / Trade-offs + +- **Slight performance overhead:** every agent action incurs a permission check and an audit + log write. For high-frequency ingest agents this adds a small but measurable latency. +- **In-memory rate limit counter:** the sliding-window counter resets on process restart. + Distributed deployments where multiple GNAT workers serve the same agent pool should + configure an external Redis counter (deferred, see below). +- **Policy matrix is static at import time:** the default permission matrix is a module-level + dict; runtime overrides apply only to the running process. Multi-process deployments must + configure overrides identically on each worker or use the shared DB override table. + +### Deferred + +- Distributed rate limiting via Redis sidecar +- Per-action approval workflow (short-circuited in Phase 4D by `HITLGateway` — see ADR-0046) +- Agent registry with cryptographic signing of agent identity +- Capability-based security tokens as an alternative to trust-level categories + +--- + +## Alternatives Considered + +### Capability-Based Security Tokens + +Each agent would hold a signed token listing specific capabilities (analogous to OAuth2 scopes). +Token validation would replace the trust-level lookup. 
This model is more granular and suitable +for multi-organisation federation, but is significantly more complex to implement and operate — +particularly for the embedded agents that run inside the same process as the pipeline. It was +deferred as a future evolution once agent federation becomes a firm requirement. + +### OAuth2 Scopes Per Agent + +Define a fixed set of OAuth2 scopes (`gnat:read`, `gnat:write`, `gnat:export`, etc.) and issue +per-agent tokens from a lightweight authorization server. Rejected because it introduces an +external service dependency for what is currently a single-process feature. The scope model will +be revisited if GNAT ever exposes its agent layer over a network boundary. + +### Audit Logging Only (No Permission Enforcement) + +Log all agent actions but do not block anything. Rejected because post-hoc detection of +unauthorised agent writes is insufficient for regulated environments — damage may occur before +the audit log is reviewed. The prevention-first model of `require_can_act()` is the correct +default; audit logging is the secondary safeguard. + +### Connector-Level Guards Only + +Apply permission checks at the connector's `upsert_object()` / `delete_object()` entry points +rather than in a centralised governor. Rejected because it requires every connector +implementation to carry governance logic, creates inconsistent enforcement across the 99 +connectors, and cannot easily support cross-cutting policies such as rate limiting and HookBus +emission. 
+ +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/0046-ADR-hitl-gateway.md b/docs/explanation/architecture/adrs/0046-ADR-hitl-gateway.md new file mode 100644 index 00000000..7ba5d07a --- /dev/null +++ b/docs/explanation/architecture/adrs/0046-ADR-hitl-gateway.md @@ -0,0 +1,298 @@ +# ADR-0046 — Human-in-the-Loop Gateway (Phase 4D) + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +GNAT's AI agents can now be granted write, export, playbook-trigger, and +workspace-management permissions via `AgentGovernor` (ADR-0045). For most +trust levels and action types the governor's permission matrix is sufficient: +either the action is allowed and it executes immediately, or it is denied +outright. + +However, a subset of agent actions are high-impact enough that neither +automatic approval nor outright denial is the correct policy: + +- Triggering an XSOAR playbook against a live environment carries irreversible + side effects (firewall rule changes, endpoint isolation, ticket creation). +- Workspace deletions or bulk STIX deletions are difficult to roll back. +- Escalation decisions that route findings to an incident team should carry an + auditable human sign-off. + +Prior to this ADR there was no mechanism to **pause** an agent action and hold +it in a review queue until a human operator approved or rejected it. The +existing `gnat/review/` module contained a fully implemented `ReviewService` +and `ReviewQueueStore`, but they were reachable only from the report lifecycle +(ADR-0034); agents had no bridge to that infrastructure. + +The result was an all-or-nothing choice: either grant agents unrestricted +write access, or block the action class entirely. Neither option is suitable +for production deployments where agents need occasional high-impact capability +under controlled conditions. 
+ +--- + +## Decision + +Introduce **`HITLGateway`** (`gnat/agents/hitl.py`) as a thin policy bridge +between `AgentGovernor` and `gnat/review/service.py`. Every agent action +evaluated by `AgentGovernor.require_can_act()` is additionally evaluated by +`HITLGateway.evaluate()` before it may execute. + +### Impact Tier Classification + +Impact level is a field on `AgentAction` (see ADR-0045) set by the agent at +action creation time. `HITLGateway` routes on that field: + +| Impact Level | Routing Policy | Review Queue Entry | +|---|---|---| +| `low` | Auto-approved, execution proceeds immediately | None (logged only) | +| `medium` | Auto-approved, execution proceeds immediately | None (logged only) | +| `high` | Blocked pending human approval via `ReviewService` | `PENDING` `ReviewItem` created | +| `critical` | Blocked pending human approval; XSOAR playbook notification sent | `PENDING` `ReviewItem` created + XSOAR alert | + +### `HITLGateway` API + +Located at `gnat/agents/hitl.py`: + +```python +from gnat.agents.hitl import HITLGateway +from gnat.agents.governor import AgentAction, AgentActionType + +gateway = HITLGateway() + +# Primary entry point — called by AgentGovernor after permission check passes +approved, review_item = gateway.evaluate(action) +if not approved: + # action is PENDING; agent should poll or await human decision + print(f"Action {action.action_id} awaiting review: {review_item.id}") + +# Submit a specific action to the review queue explicitly +review_item = gateway.submit_for_approval(action) + +# Poll queue for a decision +from gnat.review.service import ReviewStatus +status = gateway.check_approval_status(review_item.id) +# status is one of ReviewStatus.PENDING, APPROVED, REJECTED + +# Auto-approve (used in test harnesses and auto-escalation policies) +gateway.auto_approve_pending(review_item.id, reviewer="auto-policy") +``` + +### `evaluate()` Logic + +```python +def evaluate( + self, action: AgentAction +) -> tuple[bool, ReviewItem | 
None]: + if action.impact_level in ("low", "medium"): + self._log_auto_approved(action) + return True, None + + review_item = self.submit_for_approval(action) + + if action.impact_level == "critical": + self._notify_xsoar(action, review_item) + + return False, review_item +``` + +The action is **blocked** (returns `False`) for `high` and `critical` levels +regardless of the trust level of the agent. Even a `trusted_internal` agent +must pause for a human reviewer if its action carries `impact_level="critical"`. + +### `submit_for_approval()` — ReviewService Bridge + +`submit_for_approval()` converts the `AgentAction` dataclass into a +STIX-compatible metadata dict and delegates to `ReviewService.submit()`: + +```python +def submit_for_approval(self, action: AgentAction) -> ReviewItem: + payload = { + "type": "agent-action-review", + "action_id": action.action_id, + "agent_id": action.agent_id, + "action_type": action.action_type.value, + "target_ref": action.target_ref, + "impact_level": action.impact_level, + "context_id": action.context_id, + "submitted_at": action.submitted_at.isoformat(), + } + return self._review_service.submit( + item_type="agent_action", + payload=payload, + submitter=action.agent_id, + priority="high" if action.impact_level == "critical" else "normal", + ) +``` + +No new storage is introduced — `ReviewItem` and `ReviewQueueStore` from +`gnat/review/` are used as-is. 
+ +### Approval Timeout + +`check_approval_status()` enforces a configurable timeout: + +```python +def check_approval_status(self, review_id: str) -> ReviewStatus: + item = self._review_service.get(review_id) + elapsed = (datetime.utcnow() - item.submitted_at).total_seconds() + if ( + item.status == ReviewStatus.PENDING + and elapsed > self._approval_timeout_seconds + ): + self._review_service.reject( + review_id, + reason="auto-rejected: approval timeout exceeded", + reviewer="hitl-gateway", + ) + return ReviewStatus.REJECTED + return item.status +``` + +Default `approval_timeout_seconds` is `3600` (one hour). Configurable via the +`[agents]` INI section: + +```ini +[agents] +hitl_approval_timeout_seconds = 3600 +hitl_xsoar_playbook_id = P-GNAT-AGENT-ALERT +``` + +### XSOAR Notification for Critical Actions + +For `critical` impact actions, `HITLGateway` calls the XSOAR connector's +`upsert_object()` with a pre-formed STIX `incident` custom object: + +```python +def _notify_xsoar( + self, action: AgentAction, review_item: ReviewItem +) -> None: + incident = { + "type": "x-gnat-incident", + "name": f"HITL Review Required: {action.action_type.value}", + "severity": "high", + "agent_id": action.agent_id, + "action_id": action.action_id, + "review_id": review_item.id, + "target_ref": action.target_ref, + } + try: + self._xsoar_client.upsert_object(incident) + except Exception as exc: + # Notification failure must never block the review queue entry + logger.warning("XSOAR notification failed: %s", exc) +``` + +The XSOAR client is a `trusted_internal` connector instance constructed from +the INI `[xsoar]` section. If XSOAR is not configured, the notification is +skipped and a warning is logged; the `ReviewItem` is still created. 
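Putting the pieces together, a blocked agent needs a poll loop around `check_approval_status()`. A minimal sketch, written generically so it is not tied to the real `ReviewStatus` enum (the helper name and parameters are illustrative, not part of the gateway API):

```python
import time

def await_decision(check_status, review_id, pending, poll_interval=5.0):
    """Poll check_status(review_id) until it returns something other than
    `pending`. The gateway's timeout auto-reject guarantees termination."""
    while True:
        status = check_status(review_id)
        if status != pending:
            return status
        time.sleep(poll_interval)
```

In GNAT terms, `check_status` would be `gateway.check_approval_status` and `pending` would be `ReviewStatus.PENDING`; the one-hour auto-reject bounds the worst-case wait.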
+ +### Sequence Diagram + +``` +Agent AgentGovernor HITLGateway ReviewService + | | | | + |── require_can_act() ──► | | | + | |── evaluate(action) ► | | + | | |── submit() ────────►| + | | |◄── ReviewItem ──────| + | | | | + | | [critical only] | | + | | |── _notify_xsoar() | + | | | (XSOARClient) | + | |◄── (False, item) ────| | + |◄── AgentActionPending ──| | | + | | | | + | [human approves] | | | + |── check_approval() ─────────────────────────► |── get(review_id) ──►| + |◄── APPROVED ─────────────────────────────────── |◄── ReviewStatus ───| +``` + +--- + +## Consequences + +### Positive + +- **No runaway high-impact actions:** agents cannot execute playbook triggers, + workspace deletions, or bulk STIX writes without a human in the loop, + regardless of their trust level. +- **Zero new storage infrastructure:** the review queue already existed in + `gnat/review/`. `HITLGateway` is a pure orchestration layer with no new + tables or persistence concerns. +- **XSOAR users receive actionable alerts:** operators who rely on XSOAR as + their SOAR console see critical agent actions appear as incidents immediately, + without requiring a separate notification integration. +- **Timeout prevents indefinite blocking:** auto-rejection after one hour + ensures that a missed review does not permanently block an agent session. +- **Testable in isolation:** `HITLGateway` accepts a `review_service` and + `xsoar_client` in its constructor, enabling full injection of test doubles. + +### Negative / Trade-offs + +- **Agents must poll or wait for approval:** there is no push-based callback + mechanism. Agents that need a fast response for `high`-impact actions must + implement a polling loop or be designed to suspend and resume. +- **Timeout is process-local:** the timeout check runs inside + `check_approval_status()`, which the agent must call. 
If the agent process + restarts, in-flight pending reviews are not automatically expired; a + background sweep task is needed for production deployments (deferred). +- **Single XSOAR integration point:** critical notifications only reach XSOAR + in this implementation. Other SOAR platforms (Splunk SOAR, Palo Alto XSIAM) + require additional notification adapters (deferred). + +### Deferred + +- Background sweep task to expire timed-out `PENDING` reviews independently of + agent polling +- Multi-SOAR notification adapters (Splunk SOAR, Tines, Torq) +- Webhook-based push approval for non-XSOAR environments (e.g. Slack approval + buttons via the Discord/Slack connectors) +- Role-based approval routing: routing `critical` actions to a named reviewer + group rather than the global queue + +--- + +## Alternatives Considered + +### Rebuild a Dedicated HITL Queue + +A purpose-built queue store separate from `gnat/review/` was considered to +avoid coupling agent governance to the report review subsystem. Rejected +because `ReviewService` and `ReviewQueueStore` already implement exactly the +required semantics (item submission, status polling, approval/rejection, +timeout), and duplication would create two review mechanisms that diverge over +time. The bridge pattern costs fewer than 120 lines of code. + +### Email-Only Notification + +Sending an email to a configured address for `high` and `critical` actions was +prototyped. Rejected because email provides no structured approval path: the +reviewer has no UI from which to approve or reject the action back into the +system. Notifications via XSOAR (and future adapters) provide a structured +approval workflow. + +### Synchronous Approval via Long-Poll + +Blocking the agent's calling thread in a long-poll loop until the review is +resolved was considered. Rejected because it ties up a thread for the full +approval window (up to one hour by default) and makes the system unresponsive +to cancellation. 
The asynchronous poll-or-suspend model is more appropriate +for an embedded agent runtime. + +### Trust-Level Exemption for `trusted_internal` + +A proposal to exempt `trusted_internal` agents from HITL checks for `high` +impact actions was considered. Rejected on security grounds: trust level +reflects the provenance of the agent code, not the risk of the target action. +Even a fully trusted agent should not autonomously trigger a production SOAR +playbook without a human sign-off. + +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/0047-ADR-workspace-isolation.md b/docs/explanation/architecture/adrs/0047-ADR-workspace-isolation.md new file mode 100644 index 00000000..59450819 --- /dev/null +++ b/docs/explanation/architecture/adrs/0047-ADR-workspace-isolation.md @@ -0,0 +1,320 @@ +# ADR-0047 — Workspace Trust Boundary Enforcement (Phase 4E) + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +GNAT workspaces are the primary isolation unit for multi-tenant and +multi-classification deployments. Each workspace holds a set of STIX objects, +an enrichment log, and configuration for the connectors that may interact with +it. + +Prior to this ADR, workspace isolation was **logical only**: the workspace ID +scoped database queries, but there was no enforcement mechanism preventing a +connector from writing into a workspace that it was not supposed to touch. The +following scenarios had no protection: + +1. An `untrusted_external` connector loading community threat feeds writes + enriched indicators into a `trusted_internal` workspace that holds + classified government-sourced intelligence. The commingling contaminates + the provenance chain. + +2. An MSSP deployment with multiple customer tenants assigns each tenant their + own workspace. A connector instance shared across tenants (e.g. 
a VirusTotal
   client configured with the MSSP's API key) reads objects from workspace A and
   enriches them into workspace B.

3. A `semi_trusted` plugin agent (ADR-0045) is granted `write_stix` permission
   but should only write to a sandbox workspace, not to the production
   workspace. There is no way to express this constraint.

Connector trust levels are declared as class-level attributes (`TRUST_LEVEL`)
since ADR-0039, and agent trust levels are registered with `AgentGovernor`
since ADR-0045. The missing piece was a mechanism to declare, on the
**workspace** side, which trust levels and connector identities are permitted
to interact with it.

---

## Decision

Extend the `workspaces` database table and `Workspace` ORM class with two new
fields that declare the workspace's trust boundary, then enforce that boundary
at connector access time.

### Database Schema Extension

Alembic migration `0007_add_workspace_trust_boundary.py` adds two columns to
the existing `workspaces` table:

| Column | Type | Default | Notes |
|---|---|---|---|
| `trust_boundary` | `VARCHAR(50)` | `'semi_trusted'` | Minimum trust level required to access this workspace |
| `allowed_connector_refs` | `TEXT` | `'[]'` | JSON array of permitted connector class names; empty list means all connectors at or above `trust_boundary` are permitted |

Both columns are declared `NOT NULL` with database-level defaults, so existing
rows are backfilled with `'semi_trusted'` and `'[]'` at migration time and
application code never observes `NULL` in these fields.

```sql
-- Migration 0007 (excerpt)
ALTER TABLE workspaces
    ADD COLUMN trust_boundary VARCHAR(50) NOT NULL DEFAULT 'semi_trusted';

ALTER TABLE workspaces
    ADD COLUMN allowed_connector_refs TEXT NOT NULL DEFAULT '[]';

CREATE INDEX ix_workspaces_trust_boundary ON workspaces (trust_boundary);
```

### `Workspace` ORM Changes

`WorkspaceModel` (SQLAlchemy) gains the two mapped columns.
The `Workspace` +domain class gains corresponding attributes and one new method: + +```python +@dataclass +class Workspace: + # ... existing fields ... + trust_boundary: str = "semi_trusted" + allowed_connector_refs: list[str] = field(default_factory=list) + + def check_connector_trust(self, connector: object) -> None: + """ + Raise PermissionError if `connector` is not permitted to access + this workspace. + + Checks two conditions in order: + 1. The connector's TRUST_LEVEL rank must be >= trust_boundary rank. + 2. If allowed_connector_refs is non-empty, the connector's class name + must appear in the list. + + Parameters + ---------- + connector : object + Any connector instance that has a TRUST_LEVEL class variable. + + Raises + ------ + PermissionError + If the connector does not satisfy the workspace trust boundary. + """ + connector_trust = getattr(type(connector), "TRUST_LEVEL", "untrusted_external") + if _trust_rank(connector_trust) < _trust_rank(self.trust_boundary): + self._log_violation(connector, "trust_level_insufficient") + raise PermissionError( + f"Connector '{type(connector).__name__}' has trust level " + f"'{connector_trust}', but workspace '{self.workspace_id}' " + f"requires '{self.trust_boundary}' or higher." + ) + if self.allowed_connector_refs: + connector_name = type(connector).__name__ + if connector_name not in self.allowed_connector_refs: + self._log_violation(connector, "connector_not_in_allowlist") + raise PermissionError( + f"Connector '{connector_name}' is not in the allowlist " + f"for workspace '{self.workspace_id}'." + ) +``` + +### Trust Rank Ordering + +```python +_TRUST_RANK: dict[str, int] = { + "untrusted_external": 0, + "semi_trusted": 1, + "trusted_internal": 2, +} + +def _trust_rank(level: str) -> int: + return _TRUST_RANK.get(level, 0) +``` + +The ordering is: `trusted_internal` > `semi_trusted` > `untrusted_external`. 
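
The comparison this ranking implies can be checked in isolation. In the sketch
below, `_TRUST_RANK` and `_trust_rank` reproduce the helpers shown above, while
`is_permitted` is an illustrative helper (not part of the codebase) that
reduces the boundary check to a single comparison:

```python
_TRUST_RANK: dict[str, int] = {
    "untrusted_external": 0,
    "semi_trusted": 1,
    "trusted_internal": 2,
}


def _trust_rank(level: str) -> int:
    # Unknown or misspelled levels rank lowest, so lookups fail closed.
    return _TRUST_RANK.get(level, 0)


def is_permitted(connector_trust: str, boundary: str) -> bool:
    # A connector may access a workspace when its rank meets or exceeds
    # the workspace's boundary rank.
    return _trust_rank(connector_trust) >= _trust_rank(boundary)
```

Note the fail-closed behaviour of the `.get(level, 0)` default: a misspelled
trust level such as `"trusted"` ranks as `0` and is rejected by any boundary
above `untrusted_external`.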
+A workspace with `trust_boundary = "trusted_internal"` rejects connectors at +`semi_trusted` or `untrusted_external` even if those connectors are otherwise +granted `write_stix` by `AgentGovernor`. + +### Enforcement Points + +`check_connector_trust()` is called in two locations: + +1. **`Workspace._init_store()`** — at workspace initialisation, when a + connector is bound to the workspace for the first time. +2. **`IngestPipeline.run()`** — immediately before the first `upsert_object()` + call, after `ExecutionContext` has been established. + +Both call sites catch `PermissionError`, log the violation to `execution_log` +as a `security_event` row (see ADR-0039), and re-raise. + +### Configuring Workspace Trust Boundaries + +Workspace trust boundaries are set at workspace creation time via the `Workspace` +API or the CLI: + +```python +from gnat.context.workspace import Workspace + +# Create a high-trust workspace that only accepts VirusTotal and CrowdStrike +ws = Workspace.create( + name="classified-intel", + trust_boundary="trusted_internal", + allowed_connector_refs=["VirusTotalClient", "CrowdStrikeClient"], +) + +# Update an existing workspace's trust boundary +ws = Workspace.load("production") +ws.trust_boundary = "semi_trusted" +ws.allowed_connector_refs = [] # any semi_trusted or higher connector is fine +ws.save() +``` + +CLI equivalent: + +```bash +gnat workspace create classified-intel \ + --trust-boundary trusted_internal \ + --allow-connector VirusTotalClient \ + --allow-connector CrowdStrikeClient + +gnat workspace set-trust production --trust-boundary semi_trusted +``` + +### Violation Logging + +Every `PermissionError` raised by `check_connector_trust()` is written to the +`execution_log` table as a `security_event`: + +```python +def _log_violation(self, connector: object, reason: str) -> None: + self._ctx_store.append_event( + context_id=self._active_context_id, + event_type="security_event", + metadata={ + "violation": "workspace_trust_boundary", 
+ "workspace_id": self.workspace_id, + "trust_boundary": self.trust_boundary, + "connector": type(connector).__name__, + "connector_trust": getattr(type(connector), "TRUST_LEVEL", "unknown"), + "allowed_connector_refs": self.allowed_connector_refs, + "reason": reason, + }, + ) +``` + +These rows are queryable alongside all other execution context events, making +boundary violations visible in the same audit trail as agent permission denials +(ADR-0045) and data lineage events (ADR-0038). + +### Default Behaviour (Backward Compatibility) + +Existing workspaces that do not have `trust_boundary` set receive +`'semi_trusted'` from the migration default. This means all `semi_trusted` +and `trusted_internal` connectors continue to work without any configuration +change. `untrusted_external` connectors (community feed readers, OSINT +scrapers) are blocked from existing workspaces unless the boundary is +explicitly lowered to `'untrusted_external'`. + +This is a deliberate, slightly breaking default: if any existing deployment +uses an `untrusted_external` connector to write into a workspace, it will begin +receiving `PermissionError` after the migration. The operator must explicitly +set `trust_boundary = "untrusted_external"` for those workspaces to restore +prior behaviour. This is the correct security posture: the old behaviour was +unintentionally permissive. + +--- + +## Consequences + +### Positive + +- **Trust-aware workspace isolation:** the workspace itself declares what it + trusts, rather than relying solely on the permission matrix in + `AgentGovernor`. This enables a defence-in-depth model where both the action + policy and the target resource enforce trust constraints independently. +- **Zero-trust workspaces are possible:** a workspace with + `trust_boundary = "trusted_internal"` and a non-empty `allowed_connector_refs` + list will reject every connector that is not explicitly named — suitable for + classified or high-value intelligence stores. 
+- **MSSP tenancy is enforceable:** each customer workspace can be given an + allowlist of their specific connector instances, preventing cross-tenant + write-through. +- **Violations are auditable:** every blocked access is logged as a + `security_event` in `execution_log`, giving operators a clear record of + attempted boundary crossings. +- **Backward-compatible default:** the `'semi_trusted'` default preserves + existing behaviour for the vast majority of deployments. + +### Negative / Trade-offs + +- **Slightly breaking for `untrusted_external` connectors:** deployments that + rely on community feed connectors writing directly to default workspaces will + require a one-time configuration update after the migration. +- **`allowed_connector_refs` is a class name string:** it compares against + `type(connector).__name__`, which means it is case-sensitive and does not + survive connector class renames. A more robust connector identity mechanism + (e.g. a `CONNECTOR_ID` class constant) is deferred. +- **Enforcement is at the GNAT application layer:** database-level row-security + policies (e.g. PostgreSQL RLS) are not implemented. A connector that + bypasses the GNAT application layer and writes directly to the database is + not constrained. + +### Deferred + +- `CONNECTOR_ID` class constant on `BaseClient` to decouple allowlist entries + from class names +- Database-level row security (PostgreSQL RLS) for multi-process deployments + where multiple GNAT workers share a database +- TUI workspace inspector showing trust boundary configuration and recent + violation events +- Per-workspace read boundary (currently `check_connector_trust()` is called + on write paths only; read-path enforcement is deferred) + +--- + +## Alternatives Considered + +### Separate Database Schema Per Tenant + +Each tenant workspace would live in a separate database schema or database +instance, providing hard isolation at the storage layer. 
Rejected because it +requires database-level provisioning for each workspace, complicates migrations, +and makes cross-workspace queries (e.g. correlation across tenants for MSSP +analytics) impossible without a federation layer. The application-level trust +boundary model achieves the required isolation for the current threat model at +far lower operational cost. + +### TLP-Only Filtering + +Restrict connector write access based on the TLP marking of the STIX objects +rather than the trust level of the connector. Rejected because TLP controls +*dissemination* of intelligence (who may see it), not *provenance* (who may +write it). A `semi_trusted` connector should not be allowed to inject objects +into a workspace designated for `trusted_internal` sources even if the objects +carry TLP:WHITE markings. + +### Policy Engine Allowlist (ADR-0037) + +The existing policy engine (ADR-0037) could be extended to express workspace +trust boundaries as policy rules rather than workspace attributes. Rejected +for this phase because workspace trust is a stable property of the workspace +itself, not a dynamic rule that should be evaluated against arbitrary +conditions. The policy engine is a better home for complex, contextual +decisions (e.g. "allow if the object's confidence score exceeds 80"); workspace +boundary enforcement is simpler and benefits from being collocated with the +workspace model. + +### Connector-Level Workspace Declarations + +Each connector class could carry a list of workspace IDs it is permitted to +access (inverting the relationship — connector declares targets instead of +workspace declaring sources). Rejected because workspace configuration is +the correct authority for workspace-scoped policy. Distributing access +control across 99 connector class definitions would be operationally unwieldy. 
+ +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/0048-ADR-query-budget.md b/docs/explanation/architecture/adrs/0048-ADR-query-budget.md new file mode 100644 index 00000000..9211bb57 --- /dev/null +++ b/docs/explanation/architecture/adrs/0048-ADR-query-budget.md @@ -0,0 +1,360 @@ +# ADR-0048 — Query Budget and Cost Tracking (Phase 4E) + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +GNAT coordinates calls to up to 99 external connector platforms. Each +connector call may count against a paid API quota, consume compute time, or +contribute to rate-limit thresholds imposed by the upstream provider. + +Prior to this ADR, two mechanisms provided partial protection: + +1. **`AgentGovernor` rate limiting** (ADR-0045) — a sliding-window counter + per agent per time window, expressed in *number of governor-checked agent + actions*. It does not account for the number of HTTP calls each action + generates, which may be many (e.g. a `list_objects()` that pages through + 5 000 results). + +2. **`QueryBudget` on `ExecutionContext`** (ADR-0039) — a `max_connector_calls` + field on the context dataclass. It was designed as a placeholder but had + no enforcement mechanism: `BaseClient._request()` did not check it, and + there was no `BudgetExceeded` exception class. + +The consequence was that an agent or pipeline with unrestricted connector +access could: + +- Page through an entire VirusTotal result set in a single `list_objects()` + call, exhausting the day's API quota for the entire deployment. +- Create a thundering-herd problem where multiple parallel enrichment + pipelines all call the same rate-limited platform simultaneously. +- Provide no cost attribution: there was no record of which connector, agent, + or pipeline consumed the most API calls over a given period. 
+ +These gaps made GNAT unsuitable for deployments with strict API cost controls +or quota-sharing across teams. + +--- + +## Decision + +Extend `QueryBudget` (introduced as a stub in ADR-0039) into a fully +functional cost-tracking and enforcement mechanism, and wire it into the hot +path of `BaseClient._request()`. + +### `QueryBudget` Dataclass (Extended) + +Located in `gnat/core/context.py`, replacing the stub from ADR-0039: + +```python +@dataclass +class QueryBudget: + """Per-execution resource budget for connector API calls. + + Parameters + ---------- + max_units : int + Maximum total cost units for this execution. Each connector call + deducts ``COST_UNIT`` units from the budget. Raise + ``BudgetExceeded`` when the budget is exhausted. + """ + + max_units: int + _consumed: int = field(default=0, repr=False, init=False) + + @property + def remaining(self) -> int: + """Remaining cost units.""" + return self.max_units - self._consumed + + @property + def is_exhausted(self) -> bool: + """True when no budget remains.""" + return self._consumed >= self.max_units + + def consume(self, units: int, connector: str) -> None: + """Deduct *units* from the budget on behalf of *connector*. + + Parameters + ---------- + units : int + Cost units to deduct. Use ``BaseClient.COST_UNIT`` (default 1) + for single-item requests; use larger values for bulk/search ops. + connector : str + Connector class name, used for cost attribution logging. + + Raises + ------ + BudgetExceeded + If deducting *units* would exceed ``max_units``. + """ + if self._consumed + units > self.max_units: + raise BudgetExceeded( + connector=connector, + cost=units, + remaining=self.remaining, + ) + self._consumed += units +``` + +### `BudgetExceeded` Exception + +```python +class BudgetExceeded(GNATClientError): + """Raised when a connector call would exceed the active QueryBudget. + + Attributes + ---------- + connector : str + Name of the connector that attempted the call. 
+ cost : int + Cost units the call would have consumed. + remaining : int + Budget units remaining at the time of the attempt. + """ + + def __init__(self, connector: str, cost: int, remaining: int) -> None: + self.connector = connector + self.cost = cost + self.remaining = remaining + super().__init__( + f"Budget exhausted: connector='{connector}' attempted " + f"cost={cost} but only {remaining} units remain." + ) +``` + +`BudgetExceeded` inherits from `GNATClientError` (from `gnat.clients.base`) +so it is caught by the standard error handling path and propagates through +pipelines identically to any other HTTP-layer failure. + +### `COST_UNIT` Class Variable on `BaseClient` + +```python +class BaseClient: + COST_UNIT: int = 1 # default: 1 unit per HTTP request + TRUST_LEVEL: str = "semi_trusted" + + def _request(self, method: str, path: str, **kwargs) -> urllib3.HTTPResponse: + if self._context and self._context.budget: + self._context.budget.consume( + self.COST_UNIT, + connector=type(self).__name__, + ) + # ... existing HTTP dispatch ... 
```

Connectors that make bulk or search calls override `COST_UNIT` to reflect
their relative expense:

| Connector Category | `COST_UNIT` | Rationale |
|---|---|---|
| Standard single-object GET / POST | `1` | Default; one API call, one result |
| Bulk list / paginated results | `10` | One call may return hundreds of objects |
| Full-text search queries | `5` | Search indexes are expensive to query at scale |
| AI inference calls (LLM connectors) | `20` | Token cost is orders of magnitude above REST calls |

Example for the VirusTotal connector, which supports paginated list endpoints:

```python
import json


class VirusTotalClient(BaseClient):
    COST_UNIT = 1  # single-lookup default

    def list_objects(self, query: str, limit: int = 100) -> list[dict]:
        # Bulk paging — charge 10 per page
        results = []
        cursor = None
        while True:
            if self._context and self._context.budget:
                self._context.budget.consume(10, connector="VirusTotalClient")
            # The first page omits the cursor parameter; later pages pass it.
            path = f"/intelligence/search?query={query}"
            if cursor:
                path += f"&cursor={cursor}"
            resp = self._request("GET", path)
            page = json.loads(resp.data)  # _request returns a raw HTTP response
            # ... parse and accumulate ...
+ if not page.get("meta", {}).get("cursor"): + break + cursor = page["meta"]["cursor"] + return results +``` + +### `ExecutionContext.create()` with Budget + +The `max_budget_units` parameter on `ExecutionContext.create()` is now wired: + +```python +ctx = ExecutionContext.create( + initiated_by="enrichment-pipeline", + domain="analysis", + workspace_id="production", + max_budget_units=500, +) +# ctx.budget is a QueryBudget(max_units=500) + +# With no budget limit: +ctx = ExecutionContext.create( + initiated_by="manual", + domain="ingestion", + workspace_id="sandbox", + # max_budget_units omitted → ctx.budget is None → unlimited +) +``` + +### Cost Logging — `query_cost_log` Table + +Every call to `QueryBudget.consume()` appends a row to the `query_cost_log` +table (Alembic migration `0008_add_query_cost_log.py`): + +| Column | Type | Notes | +|---|---|---| +| `id` | `INTEGER` | Auto-increment primary key | +| `context_id` | `VARCHAR(36)` | FK → `execution_log.id` | +| `connector` | `VARCHAR(200)` | Connector class name | +| `cost_units` | `INTEGER` | Units deducted by this call | +| `cumulative_consumed` | `INTEGER` | Budget state after deduction | +| `budget_max` | `INTEGER` | `max_units` of the owning `QueryBudget` | +| `recorded_at` | `DATETIME` | UTC timestamp | + +```sql +-- Migration 0008 (excerpt) +CREATE TABLE query_cost_log ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + context_id VARCHAR(36) NOT NULL, + connector VARCHAR(200) NOT NULL, + cost_units INTEGER NOT NULL, + cumulative_consumed INTEGER NOT NULL, + budget_max INTEGER NOT NULL, + recorded_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (context_id) REFERENCES execution_log(id) +); + +CREATE INDEX ix_query_cost_log_context ON query_cost_log (context_id); +CREATE INDEX ix_query_cost_log_connector ON query_cost_log (connector, recorded_at); +``` + +Logging is best-effort: a failure to write to `query_cost_log` is caught and +logged at `WARNING` level but does not propagate. 
The budget deduction itself +always occurs before the log write, so enforcement is never skipped. + +### Querying Cost Attribution + +```python +from gnat.core.context import CostAttributionQuery + +report = CostAttributionQuery(db_session).by_connector( + connector="VirusTotalClient", + since=datetime(2026, 4, 1), +) +# Returns list of (date, connector, total_units, call_count) + +report = CostAttributionQuery(db_session).by_context(context_id="...") +# Returns per-connector breakdown for a single execution +``` + +### Configuration + +```ini +[context] +default_budget_units = 0 ; 0 = unlimited (default for manual runs) +pipeline_budget_units = 1000 ; budget applied to scheduled pipeline runs +agent_budget_units = 200 ; budget applied to each agent session +``` + +When `pipeline_budget_units` is set, `FeedScheduler` automatically creates +an `ExecutionContext` with `max_budget_units=pipeline_budget_units` for every +scheduled feed run. + +--- + +## Consequences + +### Positive + +- **Hard resource limit for pipelines and agents:** a misconfigured + `ResearchAgent` looping over VirusTotal will hit `BudgetExceeded` after + `max_budget_units / COST_UNIT` calls rather than running indefinitely. +- **First-class error with actionable context:** `BudgetExceeded` carries + `connector`, `cost`, and `remaining` — the operator can immediately see + which connector triggered the limit and by how much. +- **Per-connector cost attribution:** `query_cost_log` provides a persistent, + queryable record of which connectors consumed what share of the budget over + any time window. This enables quota planning and chargeback reporting for + MSSP deployments. +- **Zero overhead when no budget is set:** if `ctx.budget` is `None`, the + `if` guard in `_request()` is a single attribute lookup that short-circuits + immediately. Deployments that do not need budget enforcement pay no cost. 
+- **Bulk and search overrides enable accurate cost modelling:** connectors + that page through large result sets can declare realistic `COST_UNIT` + multipliers rather than counting every paginated request as 1 unit. + +### Negative / Trade-offs + +- **`COST_UNIT` is a class constant, not a per-call value:** a connector + cannot dynamically adjust the cost of a call based on the response size + (e.g. charging more for a response with 10 000 results than one with 10). + Per-call dynamic costing is deferred. +- **Cost logging adds one `INSERT` per connector call when a budget is + active:** high-frequency pipelines may produce large volumes of cost log + rows. A retention or aggregation policy is needed for long-running + deployments. +- **Budget is per-execution-context, not global:** two concurrent pipelines + each with a budget of 1 000 units can together consume 2 000 units from a + platform with a 1 500-unit daily quota. Cross-context global quota + enforcement requires a shared counter (deferred). + +### Deferred + +- Global quota pool shared across concurrent `ExecutionContext` instances + (requires a Redis or database-backed counter) +- Dynamic per-call cost calculation based on response size or token count +- `query_cost_log` retention policy and aggregation rollups +- Cost attribution dashboard in the TUI +- Per-connector quota configuration in `config.ini` (e.g. `[virustotal] + daily_quota = 500`) + +--- + +## Alternatives Considered + +### Connector-Level Rate Limits Only + +Apply rate limits at the connector level rather than introducing a budget +concept on `ExecutionContext`. For example, each connector would track its +own call count and sleep or raise when a per-hour limit is reached. Rejected +because: + +1. Connector-level limits do not aggregate across connectors. A pipeline + that calls five connectors 200 times each has made 1 000 total calls, but + no connector-level limit would fire. +2. 
Rate limits and budgets serve different purposes: rate limits protect + against *throughput* spikes; budgets protect against *total cost* within + an execution. Both are needed; budget enforcement complements rather than + replaces rate limiting. + +### OS-Level Resource Limits (cgroups / `resource.setrlimit`) + +Applying OS-level CPU or memory limits to pipeline processes was considered +as a coarser alternative. Rejected because it does not provide per-connector +cost attribution, does not integrate with the GNAT audit trail, and does not +map naturally to API quota units (which are a business concept, not an +OS resource). + +### OpenAI / Anthropic Cost Estimators as the Model + +Using the token-count-based cost estimation models from LLM providers as the +primary budget unit was considered. Rejected because GNAT's connectors are +predominantly REST API clients, not LLM callers. A unified unit (abstract +cost units with connector-specific `COST_UNIT` multipliers) is more flexible +and does not require token counting infrastructure for non-LLM connectors. + +### Queue-Based Throttling (Celery / RQ) + +Routing all connector calls through a task queue and configuring per-connector +concurrency limits was prototyped. Rejected because it introduces a mandatory +message broker dependency for a feature that should be available in single- +process deployments. Queue-based throttling remains an option for scale-out +deployments but should not be required for the core use case. 
+ +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/0049-ADR-testing-framework.md b/docs/explanation/architecture/adrs/0049-ADR-testing-framework.md new file mode 100644 index 00000000..9928c74d --- /dev/null +++ b/docs/explanation/architecture/adrs/0049-ADR-testing-framework.md @@ -0,0 +1,427 @@ +# ADR-0049 — Simulation-Based Testing Framework (Phase 4E) + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +GNAT's unit test suite (`tests/unit/`) exercises connector logic through the +`mock_http_response` and `mock_pool_manager` fixtures defined in +`tests/conftest.py`. These fixtures mock at the HTTP layer (`urllib3.PoolManager`) +and are effective for testing single connector methods in isolation. + +As GNAT's Phase 4 features were added, three gaps in the testing infrastructure +became significant: + +### Gap 1 — No Full-Pipeline Connector Fixture + +The `mock_pool_manager` fixture returns raw HTTP bytes. Tests that need to +exercise a complete pipeline (ingest → enrich → export) must either: + +- Construct a chain of `mock_http_response` objects for every API call the + pipeline makes, which is brittle and tied to internal implementation order, or +- Use a live connector, which requires network access and real credentials. + +There is no fixture connector that implements the full `ConnectorMixin` interface +with predictable, in-memory STIX data — making pipeline-level unit tests +impractical. + +### Gap 2 — No Replay Testing + +`ExecutionContext.is_replay` (ADR-0039) is set by pipeline runners to suppress +side effects during re-runs. But there was no test framework support for +verifying that a pipeline produces idempotent output: given the same +`execution_log` entries from a previous run, a re-run should produce the same +STIX IDs without duplicate write calls. 
+ +### Gap 3 — Agent Tests Require Live Governor and Review Queue + +Tests for `AgentGovernor` (ADR-0045) and `HITLGateway` (ADR-0046) need a +complete governance stack, including a `ReviewService` that auto-approves +actions so the test can proceed without human input. Assembling this stack +from individual fixtures in each test file is repetitive and error-prone. + +--- + +## Decision + +Introduce a **`gnat/testing/`** package with three components that together +make full-pipeline, replay, and agent governance tests practical without +network access or live credentials. + +All three components live in `gnat/testing/simulation.py` and are exported +from `gnat/testing/__init__.py`. + +### Component 1 — `SimulationConnector` + +A `ConnectorMixin`-compatible connector backed entirely by an in-memory list +of STIX fixture objects. No HTTP calls are made. + +```python +from gnat.testing import SimulationConnector +from gnat.orm.indicator import Indicator + +connector = SimulationConnector(trust_level="semi_trusted") + +# Preload fixtures +ioc = Indicator(name="evil.example.com", pattern="[domain-name:value = 'evil.example.com']") +connector.add_fixture(ioc.to_dict()) + +# Standard ConnectorMixin interface works as expected +objects = connector.list_objects() # returns [ioc.to_dict()] +obj = connector.get_object(ioc.id) # returns ioc.to_dict() +connector.upsert_object({"type": "indicator", ...}) # appended to fixture list +connector.delete_object(ioc.id) # removes from fixture list + +# Iterate all fixtures (useful for pipeline testing) +for stix_obj in connector.iter_fixtures(): + print(stix_obj["type"], stix_obj["id"]) +``` + +#### Error-Path Testing + +```python +# Simulate connector failures for error-path tests +connector = SimulationConnector(raise_on_request=True) +# All list_objects() / get_object() calls raise GNATClientError +``` + +#### Budget Integration + +`SimulationConnector` deducts from the active `QueryBudget` on every call, +just as a real connector 
would. This lets tests verify that a pipeline's +budget arithmetic is correct without making real HTTP calls: + +```python +ctx = ExecutionContext.create( + initiated_by="test", + domain="ingestion", + workspace_id="test-ws", + max_budget_units=5, +) +connector = SimulationConnector() +connector._context = ctx + +# Budget is charged on each call +connector.list_objects() # consumes COST_UNIT (1) +connector.list_objects() # consumes 1 more +# After 5 calls, BudgetExceeded is raised +``` + +#### Full `ConnectorMixin` Interface + +| Method | Behaviour | +|---|---| +| `authenticate()` | No-op; always succeeds | +| `health_check()` | Returns `{"status": "ok"}` | +| `list_objects()` | Returns copy of the fixture list | +| `get_object(stix_id)` | Finds by `id` field; raises `KeyError` if not found | +| `upsert_object(obj)` | Appends if new `id`; replaces if existing `id` | +| `delete_object(stix_id)` | Removes by `id`; no-op if not found | +| `to_stix(obj)` | Identity transform (returns `obj`) | +| `from_stix(stix_obj)` | Identity transform (returns `stix_obj`) | +| `add_fixture(obj)` | Test helper: pre-loads a STIX object | +| `iter_fixtures()` | Test helper: yields all current fixture objects | + +### Component 2 — `ReplayRunner` + +A test helper that verifies pipeline idempotency using the `execution_log`. 
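
Idempotent replay presupposes deterministic object IDs. As background — using only the standard library, not the GNAT API — STIX 2.1 defines UUIDv5-based deterministic identifiers over an object's ID-contributing properties, which is what makes comparing first-run and replay IDs meaningful:

```python
import uuid

# Namespace UUID defined by the STIX 2.1 specification for
# deterministic (UUIDv5) identifiers.
STIX_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")

def deterministic_indicator_id(pattern: str) -> str:
    # Same contributing value -> same ID, on every run.
    return "indicator--" + str(uuid.uuid5(STIX_NAMESPACE, pattern))

first = deterministic_indicator_id("[domain-name:value = 'evil.example.com']")
replayed = deterministic_indicator_id("[domain-name:value = 'evil.example.com']")
assert first == replayed  # replay-safe: identical input, identical STIX ID
```

A pipeline that instead derived IDs from `uuid4()` or a wall-clock timestamp would fail this comparison by construction.
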
+
+```python
+from gnat.testing import ReplayRunner
+
+def my_pipeline(ctx: ExecutionContext) -> list[dict]:
+    connector = SimulationConnector()
+    connector.add_fixture(indicator_dict)
+    return connector.list_objects()
+
+runner = ReplayRunner(pipeline_fn=my_pipeline)
+
+# First run: executes the pipeline and records execution_log entries
+first_run_ids = runner.run_first(workspace_id="test-ws")
+
+# Replay: re-runs the pipeline with is_replay=True and asserts that
+# all expected STIX IDs appear in the replay output
+runner.replay(
+    execution_log=runner.last_execution_log,
+    expected_stix_ids=first_run_ids,
+)
+# Raises AssertionError if any expected ID is missing from the replay output
+```
+
+#### `ReplayRunner` Internals
+
+```python
+from typing import Callable
+
+from gnat.core.context import ExecutionContext
+
+
+class ReplayRunner:
+    def __init__(self, pipeline_fn: Callable[[ExecutionContext], list[dict]]):
+        self._pipeline_fn = pipeline_fn
+        self.last_execution_log: list[dict] = []
+
+    def run_first(self, workspace_id: str = "default") -> list[str]:
+        ctx = ExecutionContext.create(
+            initiated_by="test-replay-runner",
+            domain="ingestion",
+            workspace_id=workspace_id,
+        )
+        results = self._pipeline_fn(ctx)
+        self.last_execution_log = ctx._store.query(ctx.context_id)
+        return [obj["id"] for obj in results if "id" in obj]
+
+    def replay(
+        self,
+        execution_log: list[dict],
+        expected_stix_ids: list[str],
+    ) -> None:
+        # The recorded execution_log is accepted for inspection; the replay
+        # itself re-invokes the pipeline function under a fresh context
+        # flagged with is_replay=True.
+        replay_ctx = ExecutionContext.create(
+            initiated_by="test-replay-runner",
+            domain="ingestion",
+            workspace_id="default",
+            is_replay=True,
+        )
+        results = self._pipeline_fn(replay_ctx)
+        result_ids = {obj["id"] for obj in results if "id" in obj}
+        missing = set(expected_stix_ids) - result_ids
+        if missing:
+            raise AssertionError(
+                f"Replay produced different STIX IDs. Missing: {missing}"
+            )
+```
+
+### Component 3 — `AgentTestHarness`
+
+A convenience wrapper around `AgentGovernor` and `HITLGateway` that uses a
+`_MockReviewService` which auto-approves all submitted review items.
+ +```python +from gnat.testing import AgentTestHarness +from gnat.agents.governor import AgentActionType + +harness = AgentTestHarness(trust_level="semi_trusted") + +# Run an action through the full governance stack +result = harness.run_action( + agent_id="test-agent", + action_type=AgentActionType.write_stix, + target_ref="indicator--abc123", + impact_level="high", # normally blocked — auto-approved by MockReviewService +) + +assert result["status"] == "approved" +assert result["approved_by"] == "mock-reviewer" + +# Inspect all actions recorded during the test +for action in harness.recorded_actions: + print(action.agent_id, action.action_type, action.status) + +# Assert specific governance outcomes +harness.assert_action_recorded( + action_type=AgentActionType.write_stix, + status="approved", +) +harness.assert_no_permission_denied() +harness.assert_rate_limit_not_exceeded() +``` + +#### `_MockReviewService` + +The mock review service used internally by `AgentTestHarness`: + +```python +class _MockReviewService: + """Auto-approves all submitted review items for use in tests.""" + + def submit(self, item_type, payload, submitter, priority="normal"): + item_id = str(uuid4()) + return ReviewItem( + id=item_id, + item_type=item_type, + payload=payload, + submitter=submitter, + status=ReviewStatus.APPROVED, + submitted_at=datetime.utcnow(), + reviewed_by="mock-reviewer", + reviewed_at=datetime.utcnow(), + ) + + def get(self, review_id: str) -> ReviewItem: + return ReviewItem(status=ReviewStatus.APPROVED, ...) 
+ + def reject(self, review_id: str, reason: str, reviewer: str) -> None: + pass # no-op in mock +``` + +#### Policy Override Support + +`AgentTestHarness` exposes `set_policy_override()` for testing custom +permission configurations: + +```python +harness = AgentTestHarness(trust_level="untrusted_external") + +# Grant a normally-blocked action for this test +harness.set_policy_override( + agent_id="test-agent", + action_type=AgentActionType.export, + allowed=True, +) + +result = harness.run_action( + agent_id="test-agent", + action_type=AgentActionType.export, + target_ref="bundle--xyz", + impact_level="medium", +) +assert result["status"] == "approved" +``` + +### Package Layout + +``` +gnat/testing/ +├── __init__.py # Exports: SimulationConnector, ReplayRunner, AgentTestHarness +└── simulation.py # All three components in one module +``` + +The `gnat/testing/` package is part of the `[dev]` extras group and is not +included in the core install: + +```toml +[project.optional-dependencies] +dev = [ + # ... existing dev deps ... + "gnat[testing]", +] +testing = [] # gnat/testing/ is pure Python; no extra deps required +``` + +### Integration with Existing Fixtures + +`SimulationConnector` is compatible with the existing `mock_pool_manager` +fixture. 
Tests that need both HTTP-level mocking (for a real connector) and +a simulation connector (for a parallel pipeline branch) can use both in the +same test: + +```python +def test_enrichment_pipeline(mock_pool_manager, minimal_config): + real_connector = VirusTotalClient.from_config(minimal_config) + sim_connector = SimulationConnector(trust_level="trusted_internal") + sim_connector.add_fixture(indicator_dict) + + pipeline = EnrichPipeline( + source=sim_connector, + enricher=real_connector, # HTTP calls intercepted by mock_pool_manager + ) + result = pipeline.run(workspace_id="test") + assert len(result.enriched) == 1 +``` + +--- + +## Consequences + +### Positive + +- **Full pipeline tests without network or credentials:** `SimulationConnector` + implements the complete `ConnectorMixin` interface, so any pipeline that + accepts a connector can be tested end-to-end in a unit test with no network + dependency. +- **Idempotency assertions are built-in:** `ReplayRunner` provides a standard, + reusable way to verify that a pipeline produces the same STIX IDs on first + run and replay — a previously unverifiable property. +- **Agent tests are fully deterministic:** `AgentTestHarness` with + `_MockReviewService` removes the non-determinism introduced by human review + queue state, making governance tests runnable in CI without any external + state. +- **Budget testing at no extra cost:** `SimulationConnector` participates in + `QueryBudget` accounting, so budget arithmetic can be tested without real + HTTP calls. +- **No new runtime dependencies:** `gnat/testing/` is pure Python and + introduces no additional packages. It reuses existing GNAT infrastructure + (`ExecutionContext`, `AgentGovernor`, `HITLGateway`, `ReviewItem`). + +### Negative / Trade-offs + +- **`SimulationConnector` does not validate STIX schema:** objects loaded via + `add_fixture()` are stored and returned as plain dicts without STIX 2.1 + schema validation. 
Tests that depend on strict STIX conformance must add
+  their own validation or use the `stix-validate` extra.
+- **`ReplayRunner` assumes pure-function pipelines:** pipelines that produce
+  different STIX IDs for the same input (e.g. because they embed
+  `datetime.utcnow()` in generated object IDs) will fail the idempotency
+  assertion. These pipelines must be refactored to accept a deterministic
+  clock before they can be replay-tested.
+- **`_MockReviewService` always approves:** tests that need to verify
+  rejection-path behaviour must subclass `AgentTestHarness` and supply a
+  custom review service.
+
+### Deferred
+
+- `SimulationConnector` STIX schema validation mode (using `stix2-patterns`)
+- `ReplayRunner` diff output: when IDs differ between runs, show which IDs
+  were added and which were removed rather than a bare set difference
+- `AgentTestHarness` rejection-path helper: `set_auto_reject(action_type)`
+  to configure the mock service to reject specific action types
+- Pytest plugin (`conftest.py` auto-injection) to make `SimulationConnector`
+  and `AgentTestHarness` available as fixtures without explicit import
+
+---
+
+## Alternatives Considered
+
+### VCR Cassette Recording
+
+The `vcrpy` library records real HTTP interactions to YAML cassette files and
+replays them in subsequent test runs. This was evaluated as an alternative to
+`SimulationConnector` for full-pipeline tests. Rejected because:
+
+1. Connector responses vary considerably across platforms: pagination cursors,
+   timestamps, and session tokens change between runs, requiring heavy cassette
+   filtering that is difficult to maintain.
+2. Cassettes capture the *HTTP layer*, not the *connector interface*. A change
+   to a connector's internal request structure (e.g. adding a query parameter)
+   invalidates the cassette even if the connector's public API is unchanged.
+3. Cassettes for 99 connectors would add a substantial volume of recorded
+   response fixtures to the repository.
+
+`SimulationConnector` operates at the connector interface level, above HTTP,
+and requires no cassette maintenance.
+
+### Docker-Based Integration Tests Only
+
+Accepting that full-pipeline tests require Docker (as the existing `--run-docker`
+integration suite does) was evaluated. Rejected for this use case because:
+
+1. Docker integration tests are slow (30–120 seconds each) and cannot serve as
+   unit tests that run on every pull request.
+2. They require a running Docker daemon, which is not available in all CI
+   environments.
+3. They test against real connector implementations (Splunk, MISP containers),
+   not against the GNAT pipeline logic itself.
+
+Docker integration tests remain the correct tool for verifying connector
+authentication and platform compatibility. `gnat/testing/` is the correct
+tool for pipeline logic verification.
+
+### Pytest Fixtures for Each Governance Component
+
+Rather than `AgentTestHarness`, individual pytest fixtures could be registered
+in `tests/conftest.py` for `AgentGovernor`, `HITLGateway`, and
+`_MockReviewService`. Rejected because:
+
+1. Fixtures are usable only within a pytest run; the harness is reusable
+   outside the test suite (e.g. in a REPL or notebook for interactive
+   development).
+2. Assembling three fixtures in a consistent configuration is error-prone;
+   `AgentTestHarness` encapsulates the wiring and ensures consistent defaults.
+3. Per-component fixtures still require each test to know the correct wiring
+   order; `AgentTestHarness.run_action()` expresses intent more clearly.
+
+The existing `conftest.py` fixtures (`mock_http_response`, `mock_pool_manager`,
+`minimal_config`, `sak_client`) remain unchanged and continue to cover
+HTTP-level mocking.
+ +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/README.md b/docs/explanation/architecture/adrs/README.md index d302ad11..4c3af8c4 100644 --- a/docs/explanation/architecture/adrs/README.md +++ b/docs/explanation/architecture/adrs/README.md @@ -44,6 +44,21 @@ subsystems. 35. [ADR-0035: Quality Agents](0035-ADR-quality-agents.md) 36. [ADR-0036: Security Agents Phase B](0036-ADR-security-agents-phaseb.md) 37. [ADR-0037: Adopt Responsible Disclosure, DCO, and Apache 2.0 Compliance](0037-ADR-adopt-responsible-disclosure-dco-and-apache-2.0-compliance.md) +38. [ADR-0038: Data Lineage Tracking](0038-data-lineage.md) + +### Phase 4 — Control, Reasoning, Safety + +39. [ADR-0039: Unified Execution Context](0039-ADR-execution-context.md) +40. [ADR-0040: Connector Trust Model](0040-ADR-connector-trust-model.md) +41. [ADR-0041: Idempotency and Schema Evolution](0041-ADR-idempotency-schema-evolution.md) +42. [ADR-0042: Hypothesis Engine](0042-ADR-hypothesis-engine.md) +43. [ADR-0043: Negative Evidence Tracking](0043-ADR-negative-evidence.md) +44. [ADR-0044: Reasoning Engine](0044-ADR-reasoning-engine.md) +45. [ADR-0045: Agent Governance](0045-ADR-agent-governance.md) +46. [ADR-0046: HITL Gateway](0046-ADR-hitl-gateway.md) +47. [ADR-0047: Workspace Isolation and Trust Boundaries](0047-ADR-workspace-isolation.md) +48. [ADR-0048: Query Budget and Cost Model](0048-ADR-query-budget.md) +49. 
[ADR-0049: Testing Framework — Simulation and Replay](0049-ADR-testing-framework.md) --- diff --git a/docs/explanation/architecture/diagrams.md b/docs/explanation/architecture/diagrams.md index fb78694b..dbdc73df 100644 --- a/docs/explanation/architecture/diagrams.md +++ b/docs/explanation/architecture/diagrams.md @@ -25,17 +25,58 @@ GNAT is structured as a layered architecture: |-------|---------|---------------| | User Interfaces | `gnat/cli/`, `gnat/tui/`, `gnat/serve/` | CLI subcommands, Textual TUI, FastAPI REST + TAXII | | GNATClient Façade | `gnat/client.py` | Single entry point for all operations | +| **Control & Safety (Phase 4)** | **`gnat/core/`** | **ExecutionContext, Domain boundaries, QueryBudget, trust enforcement** | | Core Pipelines | `gnat/ingest/`, `gnat/analysis/`, `gnat/agents/`, `gnat/research/` | Ingestion, analysis, AI, and research | +| **Reasoning Layer (Phase 4C)** | **`gnat/reasoning/`** | **HypothesisEngine, ReasoningEngine, evidence scoring** | +| **Agent Governance (Phase 4D)** | **`gnat/agents/governor.py`, `gnat/agents/hitl.py`** | **AgentGovernor, HITLGateway, XSOAR escalation** | | Intelligence Products | `gnat/reporting/`, `gnat/dissemination/` | Report lifecycle, export, webhooks | | Data Layer | `gnat/orm/`, `gnat/context/`, `gnat/search/` | STIX ORM, workspace persistence, Solr search | +| **Custom SDOs (Phase 4C)** | **`gnat/stix/sdos/`** | **STIXHypothesis, NegativeEvidenceRecord** | | Platform Connectors | `gnat/connectors/` (99 platforms) | Bidirectional integration with external platforms | -| HTTP Client Layer | `gnat/clients/`, `gnat/async_client/` | urllib3 (sync) + httpx (async) | +| HTTP Client Layer | `gnat/clients/`, `gnat/async_client/` | urllib3 (sync) + httpx (async) + budget tracking | | Scheduling | `gnat/schedule/` | Cron-based feed scheduling | +| **Testing Framework (Phase 4E)** | **`gnat/testing/`** | **SimulationConnector, ReplayRunner, AgentTestHarness** | → Full narrative: 
[`docs/architecture.md`](../../architecture.md) --- +## Phase 4 Control Layer + +Phase 4 adds a **control and safety** layer that sits above all pipelines and connectors. +Every GNAT operation is now tagged with an `ExecutionContext` that carries its identity, +trust level, domain, and resource budget. + +```mermaid +flowchart LR + subgraph Control ["gnat/core/ — Control Layer"] + CTX[ExecutionContext\ncontext_id, trust_level\ndomain, workspace_id] + BDG[QueryBudget\nmax_units, consumed] + DOM[Domain Boundary\n@domain_boundary decorator] + end + + subgraph Reasoning ["gnat/reasoning/ — Reasoning Layer"] + HE[HypothesisEngine\npropose → evaluate → close] + RE[ReasoningEngine\nprioritize observables] + end + + subgraph Gov ["gnat/agents/ — Governance"] + AG[AgentGovernor\ncan_act, rate_limit, audit] + HG[HITLGateway\nevaluate impact tier] + end + + CTX --> BDG + CTX --> DOM + CTX --> HE + CTX --> RE + CTX --> AG + AG --> HG +``` + +→ ADRs: [0039](adrs/0039-ADR-execution-context.md) · [0040](adrs/0040-ADR-connector-trust-model.md) · [0041](adrs/0041-ADR-idempotency-schema-evolution.md) · [0042](adrs/0042-ADR-hypothesis-engine.md) · [0043](adrs/0043-ADR-negative-evidence.md) · [0044](adrs/0044-ADR-reasoning-engine.md) · [0045](adrs/0045-ADR-agent-governance.md) · [0046](adrs/0046-ADR-hitl-gateway.md) · [0047](adrs/0047-ADR-workspace-isolation.md) · [0048](adrs/0048-ADR-query-budget.md) · [0049](adrs/0049-ADR-testing-framework.md) + +--- + ## Connector Architecture The diagram below illustrates how the 99 platform connectors plug into GNAT via the diff --git a/docs/explanation/architecture/workflow-diagrams.md b/docs/explanation/architecture/workflow-diagrams.md index afe6cabd..464f7c48 100644 --- a/docs/explanation/architecture/workflow-diagrams.md +++ b/docs/explanation/architecture/workflow-diagrams.md @@ -235,7 +235,209 @@ flowchart LR --- -## Using These Diagrams +## 8. 
ExecutionContext Propagation (Phase 4A)
+
+This diagram shows how an `ExecutionContext` is created at pipeline entry and propagated
+through all downstream operations, providing end-to-end traceability.
+
+```mermaid
+sequenceDiagram
+    autonumber
+    actor Operator
+    participant Pipeline as IngestPipeline<br/>(gnat/ingest)
+    participant Ctx as ExecutionContext<br/>(gnat/core/context.py)
+    participant Log as execution_log<br/>(Postgres)
+    participant Client as BaseClient<br/>(gnat/clients/base.py)
+    participant Budget as QueryBudget
+
+    Operator->>Pipeline: run(source, workspace_id)
+    Pipeline->>Ctx: ExecutionContext.create(initiated_by, domain, workspace_id)
+    Ctx-->>Pipeline: ctx (context_id=UUID, trust_level, is_replay=False)
+    Pipeline->>Log: INSERT INTO execution_log (ctx.to_dict())
+    Log-->>Pipeline: ack
+
+    Pipeline->>Client: connector._context = ctx
+    loop Per observable
+        Client->>Budget: budget.consume(COST_UNIT, connector_name)
+        alt Budget exhausted
+            Budget-->>Client: raise BudgetExceeded
+        else OK
+            Client-->>Pipeline: HTTP response data
+        end
+    end
+
+    Note over Pipeline,Ctx: Child context for sub-operation
+    Pipeline->>Ctx: ctx.child(initiated_by="enrichment-agent", domain="analysis")
+    Ctx-->>Pipeline: child_ctx (parent_context_id=ctx.context_id)
+    Pipeline->>Log: INSERT INTO execution_log (child_ctx.to_dict())
+```
+
+---
+
+## 9. Hypothesis Engine Lifecycle (Phase 4C)
+
+The full propose → evaluate → close lifecycle for `STIXHypothesis` objects, showing
+how Solr corroboration and trust-weighted evidence feed into confidence updates.
+
+```mermaid
+sequenceDiagram
+    autonumber
+    actor Analyst
+    participant Engine as HypothesisEngine<br/>(gnat/reasoning/hypothesis.py)
+    participant WS as Workspace<br/>(gnat/context/workspace.py)
+    participant Solr as SolrSearchIndex<br/>(gnat/search/index.py)
+    participant H as STIXHypothesis<br/>(x-gnat-hypothesis SDO)
+
+    Analyst->>Engine: propose("APT29 behind Q1 campaign", evidence=["rel--1"], confidence=0.2)
+    Engine->>H: STIXHypothesis(statement, confidence=0.2, status="pending")
+    H->>H: add_supporting_evidence("rel--1")
+    Engine->>WS: _add_object(h.to_dict(), mark_dirty=True)
+    WS-->>Analyst: STIXHypothesis (id, confidence=0.2, status="pending")
+
+    Analyst->>Engine: evaluate(hypothesis_id)
+    Engine->>WS: load hypothesis object
+    Engine->>Solr: search(statement, limit=20)
+    Solr-->>Engine: [corroborating_stix_ids]
+    Engine->>Engine: corroboration_boost = min(len(ids) × 0.05, 0.3)
+    Engine->>Engine: raw = (support_count / total) + corroboration_boost
+    Engine->>H: update_confidence(clamped_raw)
+    alt confidence ≥ 0.75
+        H->>H: status = "confirmed"
+    else confidence ≤ 0.15 and refute_count > 0
+        H->>H: status = "refuted"
+    end
+    Engine->>WS: _add_object(h.to_dict(), mark_dirty=True)
+    Engine-->>Analyst: STIXHypothesis (updated confidence + status)
+
+    Analyst->>Engine: close(hypothesis_id, verdict="confirmed")
+    Engine->>H: close("confirmed")
+    Engine->>WS: _add_object(h.to_dict(), mark_dirty=True)
+    Engine-->>Analyst: STIXHypothesis (status="confirmed")
+```
+
+---
+
+## 10. ReasoningEngine Observable Scoring (Phase 4C)
+
+How `ReasoningEngine.prioritize()` scores a set of observables using four weighted signals.
+
+```mermaid
+flowchart TD
+    A([observable_set, context]) --> B[ReasoningEngine.prioritize]
+
+    B --> C[Gather NegativeEvidenceRecords\nfrom workspace]
+    C --> D[For each observable...]
+ + D --> E1[trust_weight\nfrom ExecutionContext.trust_level] + D --> E2[age_factor\n1.0 − 5%×age_days] + D --> E3[neg_penalty\n0.3 × fresh_neg_count] + D --> E4[corroboration_bonus\nSolr hits × 0.05] + + E1 --> F[Composite Score\nscore = trust×0.4 + age×0.3\n+ corroboration×0.3 − neg×0.5] + E2 --> F + E3 --> F + E4 --> F + + F --> G[Clamp to 0.0–1.0] + G --> H[Build explanation dict\nmachine-readable components] + H --> I{store_notes?} + I -- Yes --> J[Write STIX note object\nlinked to observable] + I -- No --> K + + J --> K[Collect results] + K --> L[Sort by score DESC] + L --> M([return list of tuple: observable, score, explanation]) + + style F fill:#4ea8de,color:#fff + style M fill:#2d7a2d,color:#fff +``` + +--- + +## 11. Agent Governance & HITL Flow (Phase 4D) + +How every agent action passes through `AgentGovernor` and `HITLGateway` before execution. + +```mermaid +flowchart TD + A([Agent requests action]) --> B[AgentGovernor.can_act\nagent_id, action_type, trust_level] + + B --> C{Policy override\nexists?} + C -- Yes --> D{Override allows?} + C -- No --> E{Trust-level matrix\nallows?} + + D -- No --> F([raise AgentPermissionDenied]) + D -- Yes --> G + + E -- No --> F + E -- Yes --> G[Rate limit check\nsliding window] + + G --> H{Within limit?} + H -- No --> I([raise RateLimitExceeded]) + H -- Yes --> J[Create AgentAction\nimpact_level assigned] + + J --> K[HITLGateway.evaluate] + + K --> L{impact_level?} + L -- low/medium --> M[Auto-approve\napproved_by = auto-policy] + L -- high --> N[ReviewService.submit\nstatus = PENDING] + L -- critical --> O[ReviewService.submit\n+ XSOARClient notification] + + N --> P{Human reviews...} + O --> P + P -- Approved --> Q[action.status = approved\nExecute action] + P -- Rejected --> R([Action cancelled]) + P -- Timeout --> S[Auto-reject\nreviewer = system-timeout] + + M --> Q + Q --> T[AgentGovernor.record_action\nAudit log + HookBus emit] + T --> U([Action complete]) + + style F fill:#c0392b,color:#fff + style I 
fill:#c0392b,color:#fff
+    style R fill:#c0392b,color:#fff
+    style U fill:#2d7a2d,color:#fff
+```
+
+---
+
+## 12. Workspace Trust Boundary Enforcement (Phase 4E)
+
+How `check_connector_trust()` enforces isolation boundaries before allowing connector access.
+
+```mermaid
+flowchart TD
+    A([Connector attempts workspace access]) --> B["workspace.check_connector_trust(connector)"]
+
+    B --> C["Read type(connector).TRUST_LEVEL"]
+    C --> D[Read workspace.trust_boundary]
+
+    D --> E{connector_rank ≥\nrequired_rank?}
+    E -- No --> F([raise PermissionError\nConnector trust too low])
+
+    E -- Yes --> G{allowed_connector_refs\nnon-empty?}
+    G -- No --> H[Access granted]
+    G -- Yes --> I{connector class name\nin allowlist?}
+
+    I -- No --> J([raise PermissionError\nConnector not in allowlist])
+    I -- Yes --> H
+
+    H --> K[Proceed with read/write]
+
+    style F fill:#c0392b,color:#fff
+    style J fill:#c0392b,color:#fff
+    style H fill:#2d7a2d,color:#fff
+
+    subgraph Trust Rank Order
+        TR1[trusted_internal = 2]
+        TR2[semi_trusted = 1]
+        TR3[untrusted_external = 0]
+    end
+```
+
+---
+
+## Using These Diagrams
+
 All Mermaid diagrams in this file can be:
diff --git a/docs/how-to/README.md b/docs/how-to/README.md
index facd65c2..15753693 100644
--- a/docs/how-to/README.md
+++ b/docs/how-to/README.md
@@ -20,6 +20,10 @@ Pick the guide for your goal — no need to read them in order.
| [Build Cross-Platform Investigations](build-investigations.md) | Collect and correlate evidence from multiple platforms into a unified evidence graph | | [Create Intelligence Reports](create-intelligence-reports.md) | Author structured intelligence products with a formal lifecycle and STIX 2.1 export | | [Disseminate Intelligence](disseminate-intelligence.md) | Export, webhook notifications, TAXII 2.1 serving, and REST API gateway | +| **Phase 4 — Control, Reasoning, Safety** | | +| [Use the Execution Context](use-execution-context.md) | Create and propagate `ExecutionContext`; enforce domain boundaries and trust levels; track query budgets | +| [Use the Reasoning Engine](use-reasoning-engine.md) | Score and rank observables; propose, evaluate, and close hypotheses; track negative evidence | +| [Agent Governance](agent-governance.md) | Permission checks, rate limiting, HITL review, XSOAR escalation, and agent audit trails | --- diff --git a/docs/how-to/agent-governance.md b/docs/how-to/agent-governance.md new file mode 100644 index 00000000..88163c9b --- /dev/null +++ b/docs/how-to/agent-governance.md @@ -0,0 +1,247 @@ +# How-to: Agent Governance + +GNAT's agent governance layer ensures that every AI agent action is authorised, +rate-limited, audited, and — for high-impact operations — reviewed by a human before +execution. 
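
The order in which those checks are applied can be sketched in a few lines of plain Python. This is illustrative only — the function and return values below are hypothetical, not the `gnat` API:

```python
def governance_outcome(trust_allows: bool, within_rate_limit: bool,
                       impact_level: str) -> str:
    # Hypothetical sketch of the check order, not the gnat implementation.
    if not trust_allows:
        return "denied"            # AgentGovernor: AgentPermissionDenied
    if not within_rate_limit:
        return "rate_limited"      # AgentGovernor: RateLimitExceeded
    if impact_level in ("low", "medium"):
        return "auto_approved"     # HITLGateway: approved immediately
    return "pending_review"        # high/critical: human review (or timeout)

assert governance_outcome(False, True, "low") == "denied"
assert governance_outcome(True, True, "critical") == "pending_review"
```

Each of those stages is covered in detail in the sections below.
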
+ +--- + +## Prerequisites + +- GNAT installed (`pip install gnat`) +- Optionally: `gnat/review/` configured with a `ReviewQueueStore` for HITL flows +- Optionally: XSOAR connector configured for critical action notifications + +--- + +## Check and Enforce Permissions + +```python +from gnat.agents.governor import AgentGovernor, AgentPermissionDenied +from gnat.policy.models import AgentActionType + +governor = AgentGovernor() + +# Check silently +can_enrich = governor.can_act( + agent_id="research-agent-1", + action_type=AgentActionType.ENRICH, + trust_level="semi_trusted", +) +print(can_enrich) # True + +# Raise on denial +try: + governor.require_can_act( + agent_id="otx-reader", + action_type=AgentActionType.TRIGGER_PLAYBOOK, + trust_level="untrusted_external", + ) +except AgentPermissionDenied as e: + print(e) # "otx-reader (trust='untrusted_external') denied trigger_playbook" +``` + +### Default permission matrix + +| Trust Level | Allowed Actions | +|-------------|----------------| +| `trusted_internal` | All actions (read_stix, write_stix, delete_stix, enrich, ingest, export, trigger_playbook, manage_workspace, escalate, hypothesize) | +| `semi_trusted` | read_stix, write_stix, enrich, ingest, hypothesize, escalate | +| `untrusted_external` | read_stix, enrich, hypothesize | + +--- + +## Apply Per-Agent Overrides + +Override the default matrix at runtime or via config: + +```python +# Allow a specific agent to trigger playbooks despite semi_trusted level +governor.set_policy_override( + "high-fidelity-agent", + AgentActionType.TRIGGER_PLAYBOOK, + allowed=True, +) + +# Deny an agent from deleting STIX objects even if trust would allow it +governor.set_policy_override( + "read-only-agent", + AgentActionType.DELETE_STIX, + allowed=False, +) +``` + +Or via INI (loaded by `AgentGovernor.from_config(cfg)`): + +```ini +[agent_policy] +high-fidelity-agent.trigger_playbook = true +read-only-agent.delete_stix = false +``` + +--- + +## Rate Limiting + +```python +from 
gnat.agents.governor import AgentGovernor, RateLimitExceeded + +governor = AgentGovernor(max_calls_per_window=50, window_seconds=60) + +for i in range(55): + try: + governor.rate_limit_check("bulk-agent") + except RateLimitExceeded as e: + print(f"Rate limit hit at call {i}: {e}") + break +``` + +--- + +## Record Actions (Audit Trail) + +```python +from gnat.agents.governor import AgentAction, AgentGovernor +from gnat.policy.models import AgentActionType + +governor = AgentGovernor() + +action = AgentAction( + agent_id="threat-hunter-1", + action_type=AgentActionType.ENRICH, + target_ref="indicator--abc123", + impact_level="low", + context_id=ctx.context_id, # link to ExecutionContext +) + +governor.record_action(action) + +# Query audit log +all_actions = governor.get_action_log() +agent_actions = governor.get_action_log("threat-hunter-1") +``` + +--- + +## HITL (Human-in-the-Loop) Gateway + +For high or critical impact actions, submit them for human review before executing: + +```python +from gnat.agents.hitl import HITLGateway +from gnat.agents.governor import AgentAction +from gnat.policy.models import AgentActionType +from gnat.review.service import ReviewService +from gnat.review.store import ReviewQueueStore + +# Wire to existing review queue +store = ReviewQueueStore(db_url="sqlite:///~/.gnat/gnat.db") +store.create_all() +review_service = ReviewService(store=store) + +gateway = HITLGateway( + review_service=review_service, + approval_timeout_seconds=3600, +) + +action = AgentAction( + agent_id="incident-responder", + action_type=AgentActionType.TRIGGER_PLAYBOOK, + target_ref="indicator--malicious-ip", + impact_level="high", +) + +approved, review_item = gateway.evaluate(action) + +if approved: + # low/medium: auto-approved, execute immediately + print("Action auto-approved, executing...") +else: + # high: blocking — wait for human review + print(f"Awaiting approval. 
Review ID: {review_item.id}") + + # Later, poll for status + from gnat.review.models import ReviewStatus + status = gateway.check_approval_status(review_item.id) + if status == ReviewStatus.APPROVED: + print("Approved by analyst, executing...") + elif status == ReviewStatus.REJECTED: + print("Rejected, action cancelled.") +``` + +### Impact tiers + +| Impact Level | Behaviour | +|-------------|-----------| +| `low` | Auto-approved immediately; logged only | +| `medium` | Auto-approved immediately; logged only | +| `high` | Submitted to ReviewService as PENDING; blocks execution | +| `critical` | PENDING + XSOAR notification fired via `XSOARClient.upsert_object()` | + +--- + +## Add XSOAR Notification for Critical Actions + +```python +from gnat.connectors.xsoar.client import XSOARClient +from gnat.agents.hitl import HITLGateway + +xsoar = XSOARClient(host="https://xsoar.example.com", api_key="...") + +gateway = HITLGateway( + review_service=review_service, + xsoar_client=xsoar, + approval_timeout_seconds=1800, # 30 minutes +) +``` + +--- + +## Use AgentTestHarness in Tests + +The `AgentTestHarness` provides a fully deterministic test environment — all HITL +submissions are auto-approved and all rate limits are effectively unlimited: + +```python +from gnat.testing import AgentTestHarness +from gnat.agents.governor import AgentPermissionDenied +from gnat.policy.models import AgentActionType + +harness = AgentTestHarness() + +# Run an action end-to-end (permission check + rate limit + HITL + audit) +approved, action = harness.run_action( + agent_id="test-agent", + action_type=AgentActionType.ENRICH, + target_ref="indicator--abc", + impact_level="low", + trust_level="semi_trusted", +) + +assert approved is True +assert action.status == "approved" +assert len(harness.recorded_actions) == 1 + +# Test permission denial +try: + harness.run_action( + agent_id="restricted-agent", + action_type=AgentActionType.TRIGGER_PLAYBOOK, + trust_level="untrusted_external", + ) +except 
AgentPermissionDenied: + print("Correctly denied") +``` + +--- + +## See Also + +- [ADR-0045 — Agent Governance Layer](../explanation/architecture/adrs/0045-ADR-agent-governance.md) +- [ADR-0046 — HITL Gateway](../explanation/architecture/adrs/0046-ADR-hitl-gateway.md) +- [ADR-0049 — Testing Framework](../explanation/architecture/adrs/0049-ADR-testing-framework.md) +- [Reference: Configuration](../reference/configuration.md) — `[agent_policy]` section + +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/how-to/use-execution-context.md b/docs/how-to/use-execution-context.md new file mode 100644 index 00000000..fe1ddd72 --- /dev/null +++ b/docs/how-to/use-execution-context.md @@ -0,0 +1,198 @@ +# How-to: Use the Execution Context + +Every GNAT operation — pipeline run, connector call, agent action — is tagged with an +`ExecutionContext` that carries its identity, domain, trust level, workspace boundary, +and optional resource budget. This guide shows how to create, propagate, and query +execution contexts in your code. + +--- + +## Prerequisites + +- GNAT installed (`pip install gnat`) +- `sqlalchemy` installed for DB persistence (`pip install "gnat[persist]"`) +- At least one connector configured in `~/.gnat/config.ini` + +--- + +## Create a Context + +```python +from gnat.core.context import ExecutionContext + +# Minimal context — defaults to semi_trusted, default policy set +ctx = ExecutionContext.create( + initiated_by="manual", + domain="ingestion", + workspace_id="ws-apt28", +) +print(ctx.context_id) # UUID string +print(ctx.trust_level) # "semi_trusted" +print(ctx.is_replay) # False +``` + +### Create from a connector (inherits trust level) + +```python +from gnat.connectors.splunk.client import SplunkClient +from gnat.core.context import ExecutionContext + +splunk = SplunkClient(host="https://splunk.example.com", ...) 
+ +# Reads SplunkClient.TRUST_LEVEL = "trusted_internal" automatically +ctx = ExecutionContext.from_connector( + connector=splunk, + domain="ingestion", + workspace_id="ws-siem", +) +print(ctx.trust_level) # "trusted_internal" +print(ctx.initiated_by) # "SplunkClient" +``` + +### Create with a query budget + +```python +ctx = ExecutionContext.create( + initiated_by="automated-pipeline", + domain="analysis", + workspace_id="ws-enrichment", + max_budget_units=500, # connector calls are counted against this limit +) +print(ctx.budget.remaining) # 500 +``` + +--- + +## Propagate Through a Pipeline + +Attach the context to a connector so budget tracking and logging work automatically: + +```python +from gnat.connectors.virustotal.client import VirusTotalClient +from gnat.clients.base import BudgetExceeded +from gnat.core.context import ExecutionContext + +ctx = ExecutionContext.create( + initiated_by="enrichment-job", + domain="analysis", + workspace_id="ws-threats", + max_budget_units=100, +) + +vt = VirusTotalClient(host="https://www.virustotal.com", api_key="...") +vt._context = ctx # attach context — budget will be deducted per request + +try: + result = vt.get("/api/v3/files/abc123") +except BudgetExceeded as e: + print(f"Budget exhausted: {e.connector} wanted {e.cost} but only {e.remaining} left") +``` + +--- + +## Create Child Contexts + +Sub-operations (e.g. 
an enrichment agent spawned by an ingestion pipeline) should use +child contexts so the parent→child trace is preserved in `execution_log`: + +```python +parent_ctx = ExecutionContext.create( + initiated_by="ingest-pipeline", + domain="ingestion", + workspace_id="ws-1", +) + +# Child inherits workspace_id, trust_level, policy_set +child_ctx = parent_ctx.child( + initiated_by="enrichment-agent", + domain="analysis", +) + +print(child_ctx.parent_context_id == parent_ctx.context_id) # True +``` + +--- + +## Domain Boundaries + +The `@domain_boundary` decorator enforces that a function is only called from permitted +upstream domains. Violations raise `DomainBoundaryViolation`. + +```python +from gnat.core.domains import Domain, domain_boundary, DomainBoundaryViolation + +@domain_boundary(Domain.REPORTING, allowed_callers=[Domain.INVESTIGATION, Domain.REPORTING]) +def generate_report(workspace, context): + ... + +@domain_boundary(Domain.INGESTION) +def run_ingest(): + # Calling generate_report from ingestion raises DomainBoundaryViolation + try: + generate_report(ws, ctx) + except DomainBoundaryViolation as e: + print(e) # "ingestion cannot call into reporting domain" +``` + +--- + +## Trust Level Enforcement + +Decorate functions that require a minimum trust level to execute: + +```python +from gnat.core.domains import require_trust_level, TrustLevelViolation + +@require_trust_level("trusted_internal") +def trigger_soar_playbook(playbook_id, context): + ... 
+ +# Context with semi_trusted will raise +ctx = ExecutionContext.create(initiated_by="ot", domain="execution", workspace_id="ws") +try: + trigger_soar_playbook("PB-001", context=ctx) +except TrustLevelViolation as e: + print(e) # "requires trusted_internal but active trust is semi_trusted" +``` + +--- + +## Replay Mode + +Set `is_replay=True` to suppress SOAR triggers and side-effects during replay runs: + +```python +ctx = ExecutionContext.create( + initiated_by="replay-runner", + domain="ingestion", + workspace_id="ws-replay", + is_replay=True, +) + +# Pipelines check ctx.is_replay before firing SOAR actions +if not ctx.is_replay: + xsoar_client.trigger_playbook(...) +``` + +--- + +## Serialise / Deserialise + +```python +d = ctx.to_dict() +# Store d in DB, pass over API boundary, etc. +ctx2 = ExecutionContext.from_dict(d) +``` + +--- + +## See Also + +- [ADR-0039 — Unified Execution Context](../explanation/architecture/adrs/0039-ADR-execution-context.md) +- [ADR-0040 — Connector Trust Model](../explanation/architecture/adrs/0040-ADR-connector-trust-model.md) +- [ADR-0048 — Query Budget](../explanation/architecture/adrs/0048-ADR-query-budget.md) +- [Reference: Configuration](../reference/configuration.md) + +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/how-to/use-reasoning-engine.md b/docs/how-to/use-reasoning-engine.md new file mode 100644 index 00000000..b1a5525d --- /dev/null +++ b/docs/how-to/use-reasoning-engine.md @@ -0,0 +1,218 @@ +# How-to: Use the Reasoning Engine + +GNAT's reasoning layer lets you score and rank STIX observables by evidence quality, +track analyst hypotheses with structured evidence links, and suppress redundant connector +queries using negative evidence records. 
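
For orientation, the composite score this guide describes can be reproduced with plain arithmetic. The weights and caps below come from this guide's scoring table; the specific inputs, and the linear reading of "5% decay per day", are illustrative assumptions rather than GNAT API behaviour:

```python
# Hand-computed sketch of the composite score (made-up inputs; no GNAT imports).
trust_weight = 0.6                      # semi_trusted connector
age_factor = max(0.0, 1.0 - 0.05 * 3)   # 3 days old at 5%/day linear decay -> 0.85
corroboration = min(3 * 0.05, 0.25)     # 3 Solr hits at 0.05 each, capped at 0.25
neg_penalty = min(0.3 * 1, 0.6)         # 1 fresh negative-evidence record -> 0.3

score = (trust_weight * 0.4
         + age_factor * 0.3
         + corroboration * 0.3
         - neg_penalty * 0.5)
score = min(1.0, max(0.0, score))       # clamp to [0.0, 1.0]
print(round(score, 3))  # 0.39
```

A stale, uncorroborated observable with several negative records can clamp all the way to 0.0, which is the intended outcome: the engine deprioritises it rather than deleting it.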
+ +--- + +## Prerequisites + +- GNAT installed (`pip install gnat`) +- A `WorkspaceManager` configured (see [How-to: Use Workspaces](use-workspaces.md)) +- Optionally: Solr search sidecar running (see `[search]` config section) + +--- + +## Score Observables with ReasoningEngine + +`ReasoningEngine.prioritize()` assigns a composite score in `[0.0, 1.0]` to each +observable based on: + +| Signal | Weight | Description | +|--------|--------|-------------| +| Connector trust weight | 40% | `trusted_internal`→0.9, `semi_trusted`→0.6, `untrusted_external`→0.3 | +| Object age factor | 30% | 1.0 decaying by 5% per day from `modified` timestamp | +| Cross-connector corroboration | 30% | Solr hit count × 0.05, capped at 0.25 | +| Negative evidence penalty | −50% | min(0.3 × fresh NegativeEvidenceRecord count, 0.6) | + +```python +from gnat.reasoning.engine import ReasoningEngine +from gnat.core.context import ExecutionContext +from gnat.context.workspace import WorkspaceManager + +manager = WorkspaceManager.default() + +# Create a context from your connector (sets trust_level automatically) +from gnat.connectors.crowdstrike.client import CrowdStrikeClient +cs = CrowdStrikeClient(host="...", client_id="...", client_secret="...") +ctx = ExecutionContext.from_connector(cs, domain="analysis", workspace_id="my-ws") + +engine = ReasoningEngine(manager=manager, workspace_name="my-ws") + +# Load observables from the workspace +ws = manager.open("my-ws") +observables = list(ws.objects.values()) + +results = engine.prioritize(observables, context=ctx, store_notes=True) + +for observable, score, explanation in results: + print(f"{score:.2f} {observable.id}") + print(f" {explanation['summary']}") +``` + +### Read the structured explanation + +The `explanation` dict is machine-readable: + +```python +_, score, explanation = results[0] + +print(explanation["observable_id"]) # STIX ID +print(explanation["score"]) # 0.0 – 1.0 + +trust_info = explanation["components"]["trust_weight"] 
+print(trust_info["trust_level"]) # "semi_trusted" +print(trust_info["weight"]) # 0.6 + +age = explanation["components"]["age_factor"] +print(f"age factor: {age:.2f}") # 0.85 (3 days old at 5%/day decay) + +neg = explanation["components"]["negative_evidence"] +print(f"{neg['count']} fresh neg records, penalty={neg['penalty']:.2f}") + +corr = explanation["components"]["corroboration"] +print(f"{corr['hits']} Solr hits, bonus={corr['bonus']:.2f}") +``` + +### Stored STIX notes + +When `store_notes=True` (default), the engine writes a STIX `note` object to the +workspace for each scored observable. The note contains the full JSON explanation so +analysts can review it later. + +--- + +## Propose and Evaluate Hypotheses + +```python +from gnat.reasoning.hypothesis import HypothesisEngine +from gnat.context.workspace import WorkspaceManager + +manager = WorkspaceManager.default() +engine = HypothesisEngine(manager=manager, workspace_name="apt29-investigation") + +# 1. Propose a hypothesis +h = engine.propose( + statement="192.0.2.1 is a Lazarus Group C2 server.", + initial_evidence=["relationship--abc123"], # STIX relationship IDs + confidence=0.2, # low initial confidence +) +print(h._properties["status"]) # "pending" +print(h._properties["confidence"]) # 0.2 + +# 2. Evaluate — queries Solr for corroborating evidence +h = engine.evaluate(h.id) +print(h._properties["confidence"]) # updated based on evidence + Solr hits +print(h._properties["status"]) # "pending" | "confirmed" | "refuted" + +# 3. Add more evidence manually +h.add_supporting_evidence("relationship--def456") +h.add_refuting_evidence("relationship--ghi789") + +# 4. 
Close with a verdict +h = engine.close(h.id, verdict="confirmed") +print(h._properties["status"]) # "confirmed" +``` + +### List all hypotheses + +```python +all_hypotheses = engine.list_all() +for h in all_hypotheses: + print(h._properties["statement"][:60], "→", h._properties["status"]) +``` + +--- + +## Track Negative Evidence + +`NegativeEvidenceRecord` suppresses redundant connector re-queries within a configurable TTL. + +```python +from gnat.stix.sdos.negative_evidence import NegativeEvidenceRecord +from gnat.context.workspace import WorkspaceManager + +manager = WorkspaceManager.default() +ws = manager.open("my-ws") + +indicator_id = "indicator--abc123" + +# Check for a fresh negative record before querying +neg_records = [ + obj for obj in ws.objects.values() + if getattr(obj, "stix_type", "") == "x-gnat-negative-evidence" + and obj._properties.get("target_ref") == indicator_id + and not obj.is_expired() # within TTL +] + +if neg_records: + print("Skipping re-query — connector returned no results within TTL") +else: + # Query connector + result = vt_client.get(f"/api/v3/files/{indicator_id}") + + if not result: + # Write negative evidence record + rec = NegativeEvidenceRecord( + target_ref=indicator_id, + queried_connector="VirusTotalClient", + ttl_seconds=3600, # suppress re-queries for 1 hour + ) + ws._add_object(rec.to_dict(), mark_dirty=True) +``` + +Check TTL status: + +```python +rec = NegativeEvidenceRecord(target_ref="indicator--abc", queried_connector="VT", ttl_seconds=3600) +print(rec.is_expired()) # False immediately after creation +print(rec.seconds_remaining()) # ~3600 +``` + +--- + +## Attach Solr for Corroboration + +When Solr is running, the reasoning engine uses it for cross-connector corroboration. 
+Configure via `[search]` in `~/.gnat/config.ini`: + +```ini +[search] +solr_url = http://localhost:8983/solr/gnat +enabled = true +batch_size = 100 +``` + +Then pass it explicitly: + +```python +from gnat.search.index import SolrSearchIndex, SolrSearchConfig +from gnat.reasoning.engine import ReasoningEngine + +config = SolrSearchConfig(solr_url="http://localhost:8983/solr/gnat") +index = SolrSearchIndex(config) + +engine = ReasoningEngine( + manager=manager, + workspace_name="my-ws", + search_index=index, +) +``` + +Without Solr, the engine falls back to `NullSearchIndex` — all scores work but +the corroboration bonus is always 0.0. + +--- + +## See Also + +- [ADR-0042 — Hypothesis Engine](../explanation/architecture/adrs/0042-ADR-hypothesis-engine.md) +- [ADR-0043 — Negative Evidence](../explanation/architecture/adrs/0043-ADR-negative-evidence.md) +- [ADR-0044 — Reasoning Engine](../explanation/architecture/adrs/0044-ADR-reasoning-engine.md) +- [How-to: Use Workspaces](use-workspaces.md) +- [How-to: Build Investigations](build-investigations.md) + +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/reference/configuration.md b/docs/reference/configuration.md index 360c67c3..ccded2a1 100644 --- a/docs/reference/configuration.md +++ b/docs/reference/configuration.md @@ -133,6 +133,97 @@ default_tlp = amber auto_approve = false ``` +### `[agent_policy]` + +Controls the `AgentGovernor` permission matrix and rate limits (Phase 4D). 
+ +| Key | Type | Default | Description | +|-----|------|---------|-------------| +| `max_calls_per_window` | int | `100` | Maximum connector calls an agent may make within `window_seconds` | +| `window_seconds` | int | `60` | Sliding-window size for rate limiting | +| `approval_timeout_seconds` | int | `3600` | Seconds before a pending HITL review is auto-rejected | +| `default_impact_level` | str | `"low"` | Assumed impact level for actions that don't specify one (`low`/`medium`/`high`/`critical`) | + +```ini +[agent_policy] +max_calls_per_window = 100 +window_seconds = 60 +approval_timeout_seconds = 3600 +default_impact_level = low +``` + +Per-agent permission overrides use the pattern `{agent_id}.{action_type}`: + +```ini +[agent_policy] +; Allow research-agent-1 to trigger SOAR playbooks despite semi_trusted level +research-agent-1.trigger_playbook = true + +; Deny threat-hunter-2 from deleting STIX objects even if trust level permits +threat-hunter-2.delete_stix = false +``` + +### `[connector_limits]` + +Per-connector rate limits and cost overrides (Phase 4E). + +| Key | Type | Default | Description | +|-----|------|---------|-------------| +| `{connector}.cost_unit` | int | per-class `COST_UNIT` | Override the cost-per-request for a named connector | +| `{connector}.max_calls_per_minute` | int | unlimited | Hard ceiling on calls per minute for a specific connector | + +```ini +[connector_limits] +; VirusTotal has strict rate limits on the free tier +virustotal.cost_unit = 5 +virustotal.max_calls_per_minute = 4 + +; Splunk bulk exports are expensive +splunk.cost_unit = 10 + +; RecordedFuture lookups count as standard +recordedfuture.cost_unit = 1 +``` + +### `[workspace_defaults]` + +Default isolation settings applied to newly created workspaces (Phase 4E). 
+ +| Key | Type | Default | Description | +|-----|------|---------|-------------| +| `trust_boundary` | str | `"semi_trusted"` | Minimum connector `TRUST_LEVEL` required for workspace access | +| `allowed_connector_refs` | str | `""` (all) | Comma-separated connector class names that may access this workspace; empty = no restriction | + +```ini +[workspace_defaults] +trust_boundary = semi_trusted +; Leave allowed_connector_refs empty to permit all connectors that meet trust_boundary +allowed_connector_refs = +``` + +To lock a workspace to only internal connectors: + +```ini +[workspace_defaults] +trust_boundary = trusted_internal +allowed_connector_refs = SplunkClient, SentinelClient, ElasticClient +``` + +### `[execution_context]` + +Controls default `ExecutionContext` parameters (Phase 4A). + +| Key | Type | Default | Description | +|-----|------|---------|-------------| +| `default_policy_set` | str | `"default"` | Policy set name written to every `execution_log` row | +| `default_budget_units` | int | `0` (unlimited) | Max query budget units per context; 0 = no budget enforced | + +```ini +[execution_context] +default_policy_set = default +default_budget_units = 0 +``` + ### Platform sections Each platform connector reads its own INI section. @@ -144,6 +235,9 @@ See `config/config.ini.example` for the full list of connector keys. 
- [How-to: Connect to Platforms](../how-to/connect-to-platforms.md) - [How-to: Use the Analysis Layer](../how-to/use-analysis-layer.md) +- [How-to: Use Execution Context](../how-to/use-execution-context.md) +- [How-to: Use the Reasoning Engine](../how-to/use-reasoning-engine.md) +- [How-to: Agent Governance](../how-to/agent-governance.md) - [How-to: Create Intelligence Reports](../how-to/create-intelligence-reports.md) - `config/config.ini.example` diff --git a/docs/sphinx-html/source/agents_governance.rst b/docs/sphinx-html/source/agents_governance.rst new file mode 100644 index 00000000..5bad0035 --- /dev/null +++ b/docs/sphinx-html/source/agents_governance.rst @@ -0,0 +1,160 @@ +Agent Governance +================ + +Phase 4D introduces a governance layer that controls, audits, and rate-limits every +AI agent action. High-impact actions require human approval before execution. + +.. contents:: On this page + :local: + :depth: 2 + +Overview +-------- + +The governance layer has two components: + +* :class:`~gnat.agents.governor.AgentGovernor` — checks permissions against a + trust-level matrix, enforces per-agent rate limits, and maintains an audit log of + all agent actions. +* :class:`~gnat.agents.hitl.HITLGateway` — bridges ``AgentGovernor`` to the existing + :class:`~gnat.review.service.ReviewService`; low/medium-impact actions are + auto-approved, high-impact actions block until a human reviewer approves, and + critical actions also trigger XSOAR notifications. + +Quick Start +----------- + +.. 
code-block:: python + + from gnat.agents.governor import AgentGovernor, AgentAction + from gnat.agents.hitl import HITLGateway + from gnat.policy.models import AgentActionType + from gnat.review.service import ReviewService + from gnat.review.store import ReviewQueueStore + + # Set up + governor = AgentGovernor(max_calls_per_window=100, window_seconds=60) + store = ReviewQueueStore(db_url="sqlite:///~/.gnat/gnat.db") + store.create_all() + gateway = HITLGateway(review_service=ReviewService(store=store)) + + # Check permission + if governor.can_act("agent-1", AgentActionType.ENRICH, "semi_trusted"): + governor.rate_limit_check("agent-1") + + action = AgentAction( + agent_id="agent-1", + action_type=AgentActionType.ENRICH, + target_ref="indicator--abc", + impact_level="low", + ) + approved, review_item = gateway.evaluate(action) + governor.record_action(action) + +Permission Matrix +----------------- + +.. list-table:: + :header-rows: 1 + :widths: 30 70 + + * - Trust Level + - Permitted Actions + * - ``trusted_internal`` + - All actions (read_stix, write_stix, delete_stix, enrich, ingest, export, + trigger_playbook, manage_workspace, escalate, hypothesize) + * - ``semi_trusted`` + - read_stix, write_stix, enrich, ingest, hypothesize, escalate + * - ``untrusted_external`` + - read_stix, enrich, hypothesize + +Impact Tiers +------------ + +.. list-table:: + :header-rows: 1 + :widths: 15 85 + + * - Level + - Behaviour + * - ``low`` + - Auto-approved, logged only + * - ``medium`` + - Auto-approved, logged only + * - ``high`` + - Submitted to ``ReviewService`` as PENDING; blocks until approved/rejected/timed-out + * - ``critical`` + - PENDING + XSOAR notification via ``XSOARClient.upsert_object()`` + +API Reference +------------- + +AgentGovernor +~~~~~~~~~~~~~ + +.. autoclass:: gnat.agents.governor.AgentGovernor + :members: + :undoc-members: + :show-inheritance: + +AgentAction +~~~~~~~~~~~ + +.. 
autoclass:: gnat.agents.governor.AgentAction + :members: + :undoc-members: + :show-inheritance: + +HITLGateway +~~~~~~~~~~~ + +.. autoclass:: gnat.agents.hitl.HITLGateway + :members: + :undoc-members: + :show-inheritance: + +AgentActionType +~~~~~~~~~~~~~~~ + +.. autoclass:: gnat.policy.models.AgentActionType + :members: + :undoc-members: + :show-inheritance: + +Exceptions +~~~~~~~~~~ + +.. autoclass:: gnat.agents.governor.AgentPermissionDenied + :show-inheritance: + +.. autoclass:: gnat.agents.governor.RateLimitExceeded + :show-inheritance: + +Testing +------- + +Use :class:`~gnat.testing.simulation.AgentTestHarness` for deterministic agent tests: + +.. code-block:: python + + from gnat.testing import AgentTestHarness + from gnat.policy.models import AgentActionType + + harness = AgentTestHarness() + approved, action = harness.run_action( + agent_id="test-agent", + action_type=AgentActionType.ENRICH, + impact_level="low", + trust_level="semi_trusted", + ) + assert approved is True + assert len(harness.recorded_actions) == 1 + +See Also +-------- + +* :doc:`/api/core` — ExecutionContext +* :doc:`/reasoning` — Hypothesis and reasoning engine +* ADR-0045: Agent Governance +* ADR-0046: HITL Gateway +* ADR-0049: Testing Framework diff --git a/docs/sphinx-html/source/api/agents_governance.rst b/docs/sphinx-html/source/api/agents_governance.rst new file mode 100644 index 00000000..9d2a2516 --- /dev/null +++ b/docs/sphinx-html/source/api/agents_governance.rst @@ -0,0 +1,43 @@ +gnat.agents — Governance & HITL +================================ + +.. automodule:: gnat.agents.governor + :members: + :undoc-members: + +.. automodule:: gnat.agents.hitl + :members: + :undoc-members: + +gnat.policy — Permission Models +-------------------------------- + +.. automodule:: gnat.policy + :members: + :undoc-members: + +.. autoclass:: gnat.policy.models.AgentActionType + :members: + :undoc-members: + +.. 
autofunction:: gnat.policy.models.agent_can_act + +gnat.testing — Simulation Framework +------------------------------------- + +.. automodule:: gnat.testing + :members: + :undoc-members: + +.. autoclass:: gnat.testing.simulation.SimulationConnector + :members: + :undoc-members: + :show-inheritance: + +.. autoclass:: gnat.testing.simulation.ReplayRunner + :members: + :undoc-members: + +.. autoclass:: gnat.testing.simulation.AgentTestHarness + :members: + :undoc-members: diff --git a/docs/sphinx-html/source/api/core.rst b/docs/sphinx-html/source/api/core.rst new file mode 100644 index 00000000..88d258c5 --- /dev/null +++ b/docs/sphinx-html/source/api/core.rst @@ -0,0 +1,46 @@ +gnat.core — Execution Context & Domain Boundaries +=================================================== + +Phase 4A cross-cutting infrastructure: execution tracing, domain boundary +enforcement, connector trust, and query budget management. + +.. automodule:: gnat.core + :members: + :undoc-members: + +ExecutionContext +---------------- + +.. autoclass:: gnat.core.context.ExecutionContext + :members: + :undoc-members: + :show-inheritance: + +QueryBudget +----------- + +.. autoclass:: gnat.core.context.QueryBudget + :members: + :undoc-members: + :show-inheritance: + +Domain Boundary Enforcement +--------------------------- + +.. automodule:: gnat.core.domains + :members: + :undoc-members: + +.. autoclass:: gnat.core.domains.Domain + :members: + :undoc-members: + +.. autoclass:: gnat.core.domains.DomainBoundaryViolation + :show-inheritance: + +.. autoclass:: gnat.core.domains.TrustLevelViolation + :show-inheritance: + +.. autofunction:: gnat.core.domains.domain_boundary + +.. 
autofunction:: gnat.core.domains.require_trust_level diff --git a/docs/sphinx-html/source/api/reasoning.rst b/docs/sphinx-html/source/api/reasoning.rst new file mode 100644 index 00000000..ba1356df --- /dev/null +++ b/docs/sphinx-html/source/api/reasoning.rst @@ -0,0 +1,35 @@ +gnat.reasoning — Hypothesis & Reasoning Engine +=============================================== + +.. automodule:: gnat.reasoning + :members: + :undoc-members: + +ReasoningEngine +--------------- + +.. autoclass:: gnat.reasoning.engine.ReasoningEngine + :members: + :undoc-members: + :show-inheritance: + +HypothesisEngine +---------------- + +.. autoclass:: gnat.reasoning.hypothesis.HypothesisEngine + :members: + :undoc-members: + :show-inheritance: + +Custom STIX SDOs +---------------- + +.. autoclass:: gnat.stix.sdos.hypothesis.STIXHypothesis + :members: + :undoc-members: + :show-inheritance: + +.. autoclass:: gnat.stix.sdos.negative_evidence.NegativeEvidenceRecord + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/sphinx-html/source/index.rst b/docs/sphinx-html/source/index.rst index 55df08fc..208e6abd 100644 --- a/docs/sphinx-html/source/index.rst +++ b/docs/sphinx-html/source/index.rst @@ -23,6 +23,8 @@ and STIX 2.1-compatible ORM for security platforms. cli codegen contexts + reasoning + agents_governance .. toctree:: :maxdepth: 3 @@ -36,6 +38,9 @@ and STIX 2.1-compatible ORM for security platforms. api/connectors api/cli api/codegen + api/core + api/reasoning + api/agents_governance api/utils .. toctree:: diff --git a/docs/sphinx-html/source/reasoning.rst b/docs/sphinx-html/source/reasoning.rst new file mode 100644 index 00000000..1aa27543 --- /dev/null +++ b/docs/sphinx-html/source/reasoning.rst @@ -0,0 +1,117 @@ +Reasoning Layer +=============== + +Phase 4C introduces a structured reasoning layer for observable prioritisation and +hypothesis lifecycle management. + +.. 
contents:: On this page + :local: + :depth: 2 + +Overview +-------- + +The reasoning layer consists of three interconnected components: + +* :class:`~gnat.reasoning.engine.ReasoningEngine` — scores and ranks STIX observables + using a composite of connector trust, object age, Solr corroboration, and negative + evidence signals. +* :class:`~gnat.reasoning.hypothesis.HypothesisEngine` — manages the + ``propose → evaluate → close`` lifecycle for analyst hypotheses stored as custom + STIX SDOs. +* :class:`~gnat.stix.sdos.negative_evidence.NegativeEvidenceRecord` — suppresses + redundant connector re-queries within a configurable TTL window. + +Quick Start +----------- + +.. code-block:: python + + from gnat.reasoning.engine import ReasoningEngine + from gnat.reasoning.hypothesis import HypothesisEngine + from gnat.core.context import ExecutionContext + from gnat.context.workspace import WorkspaceManager + + manager = WorkspaceManager.default() + ctx = ExecutionContext.create( + initiated_by="analyst", + domain="analysis", + workspace_id="my-ws", + ) + + # Score observables + engine = ReasoningEngine(manager=manager, workspace_name="my-ws") + ws = manager.open("my-ws") + results = engine.prioritize(list(ws.objects.values()), context=ctx) + for obs, score, explanation in results: + print(f"{score:.2f} {explanation['summary']}") + + # Propose hypothesis + h_engine = HypothesisEngine(manager=manager, workspace_name="my-ws") + h = h_engine.propose("APT29 behind Q1 campaign", confidence=0.2) + h = h_engine.evaluate(h.id) + h = h_engine.close(h.id, verdict="confirmed") + +API Reference +------------- + +ReasoningEngine +~~~~~~~~~~~~~~~ + +.. autoclass:: gnat.reasoning.engine.ReasoningEngine + :members: + :undoc-members: + :show-inheritance: + +HypothesisEngine +~~~~~~~~~~~~~~~~ + +.. autoclass:: gnat.reasoning.hypothesis.HypothesisEngine + :members: + :undoc-members: + :show-inheritance: + +STIXHypothesis SDO +~~~~~~~~~~~~~~~~~~ + +.. 
autoclass:: gnat.stix.sdos.hypothesis.STIXHypothesis + :members: + :undoc-members: + :show-inheritance: + +NegativeEvidenceRecord SDO +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. autoclass:: gnat.stix.sdos.negative_evidence.NegativeEvidenceRecord + :members: + :undoc-members: + :show-inheritance: + +Scoring Formula +--------------- + +The composite score is computed as: + +.. code-block:: text + + score = trust_weight × 0.4 + + age_factor × 0.3 + + corroboration × 0.3 + - neg_penalty × 0.5 + + clamped to [0.0, 1.0] + +Where: + +* **trust_weight** — ``trusted_internal``→0.9, ``semi_trusted``→0.6, ``untrusted_external``→0.3 +* **age_factor** — 1.0 decaying by 5% per day from ``modified`` timestamp (floor 0.0) +* **corroboration** — Solr hit count × 0.05, capped at 0.25 +* **neg_penalty** — min(0.3 × fresh NegativeEvidenceRecord count, 0.6) + +See Also +-------- + +* :doc:`/api/core` — ExecutionContext and QueryBudget +* ADR-0042: Hypothesis Engine +* ADR-0043: Negative Evidence +* ADR-0044: Reasoning Engine