diff --git a/docs/explanation/architecture/adrs/0039-ADR-execution-context.md b/docs/explanation/architecture/adrs/0039-ADR-execution-context.md new file mode 100644 index 00000000..3ab76d6d --- /dev/null +++ b/docs/explanation/architecture/adrs/0039-ADR-execution-context.md @@ -0,0 +1,255 @@ +# ADR-0039 — Unified Execution Context + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +GNAT orchestrates a heterogeneous set of operations: ingestion pipeline runs, +connector enrichment calls, AI agent actions, export jobs, and report +publishing. Each of these operations executes independently and, prior to +this ADR, had no mechanism to: + +1. Establish **who** initiated the operation (a named connector, an agent + identifier, or a human operator via the CLI). +2. Declare **which domain** the operation belongs to (`ingestion`, `analysis`, + `investigation`, `reporting`, `execution`). +3. Carry a **trust level** that flows from the originating data source into + downstream scoring and policy decisions. +4. Enforce **workspace isolation** — preventing an ingestion job from one + tenant from accidentally writing objects into another tenant's workspace. +5. Record a **replay flag** so that a re-run of a crashed pipeline can suppress + side effects (SOAR triggers, webhook emissions, duplicate enrichment calls). +6. Impose a **query budget** to prevent runaway agent loops from exhausting + API quota or compute time. + +Without a unifying carrier object, each component invented its own partial +solution: pipeline runners passed `workspace_id` as a bare string; the +enrichment dispatcher read `TRUST_LEVEL` from the connector class but did not +propagate it; agents tracked their own call counters in local state; replay +detection was entirely absent. + +The result was a system that was difficult to trace, impossible to replay +safely, and unable to enforce trust-aware prioritisation consistently. 

---

## Decision

Introduce `ExecutionContext` — a lightweight, immutable dataclass that every
pipeline entry point creates at startup and passes through the call chain.

### Location

`gnat/core/context.py`

### Fields

| Field | Type | Description |
|-------|------|-------------|
| `context_id` | `UUID` | Unique identifier for this execution; used as correlation ID in logs and the `execution_log` table |
| `initiated_by` | `str` | Connector name, agent ID, or `"manual"` (CLI/TUI) |
| `domain` | `str` | One of `ingestion`, `analysis`, `investigation`, `reporting`, `execution` |
| `trust_level` | `str` | `trusted_internal`, `semi_trusted`, or `untrusted_external` |
| `policy_set` | `str \| None` | Named policy set applied to this context; `None` uses the default |
| `workspace_id` | `str` | Workspace isolation boundary; all writes are scoped to this ID |
| `created_at` | `datetime` | UTC timestamp at construction time |
| `parent_context_id` | `UUID \| None` | ID of the parent context when this is a child span |
| `is_replay` | `bool` | `True` marks a re-run: SOAR triggers are suppressed and idempotent skips are logged as `replay_skip` rather than `idempotent_skip` |
| `budget` | `QueryBudget \| None` | Optional call budget; `None` means unlimited |

`QueryBudget` is a small companion dataclass. Unlike the frozen context that
carries it, `QueryBudget` is deliberately mutable, so charges accumulate across
parent and child contexts that share the same budget object:

```python
from dataclasses import dataclass, field


class BudgetExceededError(RuntimeError):
    """Raised when a context exhausts its connector-call or token budget."""


@dataclass
class QueryBudget:
    max_connector_calls: int = 50
    max_agent_tokens: int = 100_000
    _connector_calls: int = field(default=0, repr=False)
    _agent_tokens: int = field(default=0, repr=False)

    def charge_connector(self, n: int = 1) -> None:
        self._connector_calls += n
        if self._connector_calls > self.max_connector_calls:
            raise BudgetExceededError("connector call budget exhausted")

    def charge_tokens(self, n: int) -> None:
        self._agent_tokens += n
        if self._agent_tokens > self.max_agent_tokens:
            raise BudgetExceededError("agent token budget exhausted")
```

### Factory Methods

**`ExecutionContext.create()`** — default factory for manual / CLI invocations:
+```python +ctx = ExecutionContext.create( + initiated_by="manual", + domain="ingestion", + workspace_id="default", +) +``` + +**`ExecutionContext.from_connector(connector)`** — reads `TRUST_LEVEL` from +the connector class variable and sets `initiated_by` to the connector's module +name: + +```python +ctx = ExecutionContext.from_connector( + connector=crowdstrike_client, + domain="ingestion", + workspace_id=workspace_id, +) +# ctx.trust_level == "semi_trusted" +# ctx.initiated_by == "crowdstrike" +``` + +**`ExecutionContext.child()`** — derives a child context that inherits +`workspace_id`, `trust_level`, and `budget` from the parent but receives a +new `context_id` and `parent_context_id`: + +```python +child_ctx = ctx.child(domain="analysis", initiated_by="reasoning_engine") +assert child_ctx.workspace_id == ctx.workspace_id +assert child_ctx.parent_context_id == ctx.context_id +assert child_ctx.context_id != ctx.context_id +``` + +### Persistence + +Every context is persisted to the `execution_log` table (introduced in Alembic +migration `0004_add_execution_log.py`): + +| Column | Type | Notes | +|--------|------|-------| +| `id` | `UUID` | Primary key; maps to `context_id` | +| `initiated_by` | `VARCHAR(255)` | | +| `domain` | `VARCHAR(64)` | | +| `trust_level` | `VARCHAR(64)` | | +| `workspace_id` | `VARCHAR(255)` | Indexed | +| `parent_context_id` | `UUID` | Nullable; foreign key to same table | +| `is_replay` | `BOOLEAN` | | +| `created_at` | `TIMESTAMP` | UTC | +| `event_type` | `VARCHAR(64)` | `context_start`, `context_end`, `security_event` | +| `metadata` | `TEXT` | JSON-encoded supplementary data | + +Trust escalation attempts (a caller supplying a higher trust level than its +connector class declares) are detected in `from_connector()` and written as +`security_event` rows in `execution_log`. 
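Taken together, the field table and factory methods imply a dataclass of
roughly the following shape. This is a sketch inferred from this ADR, not the
actual `gnat/core/context.py` source; the `budget` field is typed loosely here
because `QueryBudget` is defined separately:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import UUID, uuid4


@dataclass(frozen=True)
class ExecutionContext:
    context_id: UUID
    initiated_by: str
    domain: str
    workspace_id: str
    trust_level: str = "semi_trusted"
    policy_set: str | None = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    parent_context_id: UUID | None = None
    is_replay: bool = False
    budget: object | None = None  # QueryBudget in the real code

    @classmethod
    def create(cls, *, initiated_by: str, domain: str,
               workspace_id: str, **kwargs) -> ExecutionContext:
        # Fresh correlation ID for a new root context.
        return cls(context_id=uuid4(), initiated_by=initiated_by,
                   domain=domain, workspace_id=workspace_id, **kwargs)

    def child(self, *, domain: str, initiated_by: str) -> ExecutionContext:
        # Inherit workspace, trust, policy, replay flag, and budget;
        # mint a new context_id and point parent_context_id at this context.
        return ExecutionContext(
            context_id=uuid4(),
            initiated_by=initiated_by,
            domain=domain,
            workspace_id=self.workspace_id,
            trust_level=self.trust_level,
            policy_set=self.policy_set,
            parent_context_id=self.context_id,
            is_replay=self.is_replay,
            budget=self.budget,
        )


ctx = ExecutionContext.create(initiated_by="manual", domain="ingestion",
                              workspace_id="default")
child = ctx.child(domain="analysis", initiated_by="reasoning_engine")
assert child.workspace_id == ctx.workspace_id
assert child.parent_context_id == ctx.context_id
assert child.context_id != ctx.context_id
```

`frozen=True` enforces the immutability the Decision section calls for; a
child is always a new object rather than a mutation of its parent.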
+ +### Integration Points + +All pipeline entry points create a context at startup: + +```python +# gnat/ingest/pipeline.py +class IngestPipeline: + def run(self, workspace_id: str, connector) -> IngestResult: + ctx = ExecutionContext.from_connector(connector, domain="ingestion", + workspace_id=workspace_id) + self._ctx_store.persist(ctx) + # ... pipeline body passes ctx through ... +``` + +```python +# gnat/export/pipeline.py +class ExportPipeline: + def run(self, workspace_id: str) -> ExportResult: + ctx = ExecutionContext.create(initiated_by="manual", + domain="reporting", + workspace_id=workspace_id) + self._ctx_store.persist(ctx) +``` + +Agent actions use `child()` to preserve the parent trace: + +```python +# gnat/agents/research.py +class ResearchAgent: + def run(self, parent_ctx: ExecutionContext, query: str): + ctx = parent_ctx.child(domain="analysis", initiated_by=self.agent_id) + self._ctx_store.persist(ctx) +``` + +--- + +## Consequences + +### Positive + +- **Full traceability:** every operation, regardless of component, carries a + correlation ID linkable back to a parent chain in `execution_log`. +- **Replay safety:** `is_replay=True` allows pipeline runners to re-run a + crashed job without firing SOAR triggers or creating duplicate enrichment + side effects. +- **Trust propagation:** `trust_level` flows from connector declaration through + the pipeline to `ReasoningEngine` scoring without any caller needing to + re-derive it. +- **Parent-child trace trees:** nested operations (agent spawning a connector + call) produce traceable parent-child trees queryable from `execution_log`. +- **Budget enforcement:** `QueryBudget` prevents agent runaway without + requiring each connector to implement its own call counter. +- **Zero new runtime dependencies:** `ExecutionContext` is a plain Python + dataclass; persistence uses the existing SQLAlchemy `[persist]` extra. 
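The budget-enforcement guarantee above can be exercised directly. This sketch
restates a trimmed `QueryBudget` from the Decision section, with an assumed
`BudgetExceededError` definition, so the snippet is self-contained:

```python
from dataclasses import dataclass, field


class BudgetExceededError(RuntimeError):
    """Raised when a shared budget is exhausted (assumed definition)."""


@dataclass
class QueryBudget:
    max_connector_calls: int = 50
    _connector_calls: int = field(default=0, repr=False)

    def charge_connector(self, n: int = 1) -> None:
        self._connector_calls += n
        if self._connector_calls > self.max_connector_calls:
            raise BudgetExceededError("connector call budget exhausted")


budget = QueryBudget(max_connector_calls=2)
budget.charge_connector()  # first call: within budget
budget.charge_connector()  # second call: within budget
exhausted = False
try:
    budget.charge_connector()  # third call exceeds the budget
except BudgetExceededError:
    exhausted = True
assert exhausted
```

Because a child context inherits its parent's `budget` object, charges made by
spawned agents count against the same shared limit.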

### Negative / Trade-offs

- **Caller discipline required:** every pipeline entry point must remember to
  create and thread through the context; there is no automatic injection.
  Connectors called directly (outside a pipeline) will not have a context
  unless they construct one manually.
- **Database write on every operation:** persisting context to `execution_log`
  adds one `INSERT` per pipeline run. High-frequency enrichment loops may
  produce large log volumes; a retention policy is needed.
- **Replay flag is advisory:** `is_replay=True` suppresses SOAR triggers only
  in GNAT-internal components. External webhooks reached before the context
  was consulted are not automatically suppressed.

### Deferred

- Automatic context injection via a Python `contextvars` carrier (removes the
  caller-discipline requirement for async code paths).
- Streaming context events to an external observability backend (OpenTelemetry
  trace export).
- `execution_log` retention and archival policies.
- Budget accounting UI in the TUI dashboard.

---

## Alternatives Considered

### Thread-local context

Storing the current `ExecutionContext` in a `threading.local()` variable would
remove the need to pass it through every call site. Rejected because GNAT
supports both sync (`urllib3`) and async (`httpx`) code paths.
`threading.local()` is scoped per thread, not per task: every coroutine
multiplexed onto the same event-loop thread sees the same slot, so async
connectors launched in the same event loop but different coroutines would
silently inherit the wrong context or lose it entirely.

### Decorator injection (`@with_context`)

A class decorator that automatically wraps `authenticate()`, `get_object()`,
etc. with context creation was prototyped. Rejected because:

1. It couples the decorator to the connector lifecycle, making it hard to use
   `ExecutionContext` in non-connector code (agents, pipelines).
2. It hides context creation from the caller, making replay control (setting
   `is_replay=True`) harder to express.
3.
It does not support `child()` semantics where a parent context already + exists. + +### OpenTelemetry `Span` as the carrier + +Using `opentelemetry.trace.Span` directly as the execution carrier was +considered. Rejected because it would add a mandatory dependency on the +`opentelemetry-api` package for every GNAT installation, even those that do +not export traces. `ExecutionContext` is a thin, dependency-free dataclass; +OTel integration can be layered on top as a future extra. + +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/0040-ADR-connector-trust-model.md b/docs/explanation/architecture/adrs/0040-ADR-connector-trust-model.md new file mode 100644 index 00000000..b7e69b74 --- /dev/null +++ b/docs/explanation/architecture/adrs/0040-ADR-connector-trust-model.md @@ -0,0 +1,293 @@ +# ADR-0040 — Connector Trust Level Classification + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +GNAT integrates with 99 distinct security and threat intelligence platforms. +These connectors span a wide spectrum of data reliability and authority: + +- An internal SIEM (Splunk, Microsoft Sentinel, IBM QRadar) is operated by the + organisation itself; its indicators are authoritative by definition. +- Commercial threat intelligence feeds (ThreatQ, Recorded Future, CrowdStrike) + are curated by professional analysts and carry strong but not absolute + reliability. +- Community or public feeds (AlienVault OTX, Shadowserver, CISA KEV) are + maintained by volunteers or government bodies; data quality varies widely and + indicators may be stale or incorrectly attributed. + +Prior to this ADR every connector carried **equal implicit trust**. The +enrichment dispatcher treated a hit from AlienVault OTX identically to a hit +from the organisation's own Splunk deployment. 
The `ReasoningEngine` +introduced in Phase 4C (see ADR-0044) needed a **stable, declarative source of +trust authority** to compute trust-weighted scores without requiring each call +site to re-derive trust from the connector's identity. + +Three requirements drove the design: + +1. **Declarative, not runtime-computed:** trust level must be a class-level + constant that static analysis tools and policy agents can inspect without + instantiating a connector. +2. **Propagatable:** trust must flow automatically from connector declaration + into `ExecutionContext` (ADR-0039) and from there into `ReasoningEngine` + scoring (ADR-0044). +3. **Auditable:** attempts to escalate trust above what a connector class + declares must be detected and logged. + +--- + +## Decision + +### Class Variable on `BaseClient` + +Add a single class variable to `gnat/clients/base.py`: + +```python +class BaseClient: + """Base HTTP client for all GNAT connectors.""" + + # Trust level for data produced by this connector. + # Subclasses MUST override this if they are not semi-trusted. + TRUST_LEVEL: str = "semi_trusted" +``` + +Every concrete connector subclass overrides `TRUST_LEVEL` to one of three +enumerated string constants defined in `gnat/core/trust.py`: + +```python +TRUSTED_INTERNAL = "trusted_internal" +SEMI_TRUSTED = "semi_trusted" +UNTRUSTED_EXTERNAL = "untrusted_external" +``` + +### Classification Assignments + +The following table shows the trust assignment for all 99 connectors. +Connectors not listed below carry the default `semi_trusted` level. + +#### `trusted_internal` + +These connectors represent data that is operated, controlled, and +authoritative within the customer's own environment. 
+ +| Connector | Module | Rationale | +|-----------|--------|-----------| +| Splunk | `gnat/connectors/splunk/` | Internal SIEM; customer-operated | +| Microsoft Sentinel | `gnat/connectors/sentinel/` | Internal cloud SIEM | +| IBM QRadar | `gnat/connectors/qradar/` | Internal SIEM | +| Elastic SIEM | `gnat/connectors/elastic/` | Internal SIEM/XDR | +| Graylog | `gnat/connectors/graylog/` | Internal log aggregation | +| Security Onion | `gnat/connectors/security_onion/` | Internal NSM/SIEM | +| Wazuh | `gnat/connectors/wazuh/` | Internal SIEM/XDR | +| Palo Alto XSOAR | `gnat/connectors/xsoar/` | Internal SOAR orchestrator | + +#### `semi_trusted` + +Professional, commercially-operated or well-established open-source platforms +whose data quality is high but not self-certified. + +| Connector | Module | Rationale | +|-----------|--------|-----------| +| ThreatQ | `gnat/connectors/threatq/` | Commercial TIP with curation | +| CrowdStrike Falcon | `gnat/connectors/crowdstrike/` | Commercial EDR/TI | +| Recorded Future | `gnat/connectors/recordedfuture/` | Commercial TI | +| Feedly | `gnat/connectors/feedly/` | Curated commercial feed | +| VirusTotal | `gnat/connectors/virustotal/` | Commercial multi-scanner | +| MISP | `gnat/connectors/misp/` | Open-source TIP, community-vetted | +| Mandiant Advantage | `gnat/connectors/mandiant/` | Commercial TI | +| Flashpoint | `gnat/connectors/flashpoint/` | Commercial dark-web TI | +| Intel 471 | `gnat/connectors/intel471/` | Commercial cybercrime TI | +| Group-IB | `gnat/connectors/group_ib/` | Commercial TI | +| Anomali ThreatStream | `gnat/connectors/threatstream/` | Commercial TIP | +| ThreatConnect | `gnat/connectors/threatconnect/` | Commercial TIP | + +All remaining connectors not listed in the trusted_internal or +untrusted_external sections default to `semi_trusted` at the `BaseClient` +level. 
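The default-and-override behaviour described above is plain Python
class-attribute lookup. A sketch with stub classes (these are illustrative
stand-ins, not the real connector modules):

```python
class BaseClient:
    # Default trust for any connector that does not declare otherwise.
    TRUST_LEVEL: str = "semi_trusted"


class SplunkClient(BaseClient):
    TRUST_LEVEL = "trusted_internal"


class SomeNewVendorClient(BaseClient):
    # Hypothetical unclassified connector: no override, inherits the default.
    pass


# Trust is readable from the class alone: no instantiation, no network call.
assert SomeNewVendorClient.TRUST_LEVEL == "semi_trusted"
assert SplunkClient.TRUST_LEVEL == "trusted_internal"
```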
+ +#### `untrusted_external` + +Community-contributed, public, or government feeds where quality control is +limited or the submission model is open. + +| Connector | Module | Rationale | +|-----------|--------|-----------| +| AlienVault OTX | `gnat/connectors/alienvault/` | Open community submissions | +| Shadowserver Foundation | `gnat/connectors/shadowserver/` | Public; quality varies by dataset | +| CISA KEV | `gnat/connectors/cisa/` | Government advisory; no auth; coverage gaps | +| PulseDive | `gnat/connectors/pulsedive/` | Community-aggregated | +| GreyNoise | `gnat/connectors/greynoise/` | Mass-scanner data; noisy by design | +| Have I Been Pwned | `gnat/connectors/hibp/` | Breach aggregate; no attribution | +| Hudson Rock | `gnat/connectors/hudsonrock/` | Breach intelligence; community-sourced | + +### Example Overrides + +```python +# gnat/connectors/splunk/client.py +class SplunkClient(BaseClient): + TRUST_LEVEL = "trusted_internal" + +# gnat/connectors/alienvault/client.py +class AlienVaultClient(BaseClient): + TRUST_LEVEL = "untrusted_external" + +# gnat/connectors/threatq/client.py +class ThreatQClient(BaseClient): + TRUST_LEVEL = "semi_trusted" # explicit; same as default but self-documenting +``` + +### Integration with `ExecutionContext` + +`ExecutionContext.from_connector()` (ADR-0039) reads `TRUST_LEVEL` via the +class, not the instance, so it is available before authentication: + +```python +@classmethod +def from_connector( + cls, + connector: BaseClient, + domain: str, + workspace_id: str, + policy_set: str | None = None, + budget: QueryBudget | None = None, +) -> "ExecutionContext": + declared_trust = type(connector).TRUST_LEVEL + return cls( + context_id=uuid4(), + initiated_by=type(connector).__module__.split(".")[-2], + domain=domain, + trust_level=declared_trust, + policy_set=policy_set, + workspace_id=workspace_id, + created_at=datetime.utcnow(), + parent_context_id=None, + is_replay=False, + budget=budget, + ) +``` + +### Trust 
Escalation Detection + +If a caller constructs an `ExecutionContext` manually and supplies a +`trust_level` higher than the connector class declares, the mismatch is +detected in `ExecutionContext.from_connector()` and written as a +`security_event` row to `execution_log`: + +```python +if requested_trust != declared_trust: + _log_security_event( + event="trust_escalation_attempt", + connector=type(connector).__name__, + declared=declared_trust, + requested=requested_trust, + workspace_id=workspace_id, + ) + # requested_trust is ignored; declared_trust is used +``` + +### Trust Weight Mapping + +The trust level string maps to a numeric weight used by `ReasoningEngine` +(ADR-0044): + +| Trust Level | Weight | +|-------------|--------| +| `trusted_internal` | 0.9 | +| `semi_trusted` | 0.6 | +| `untrusted_external` | 0.3 | + +The mapping is defined in `gnat/core/trust.py` as `TRUST_WEIGHTS: dict[str, float]` +and shared between `ExecutionContext`, `HypothesisEngine`, and `ReasoningEngine` +to ensure a single source of truth. + +--- + +## Consequences + +### Positive + +- **Declarative and inspectable:** `TRUST_LEVEL` is a class constant that can + be read by policy agents, linters, and documentation generators without + instantiating a connector or making any network call. +- **Zero runtime cost:** reading a class variable adds no overhead compared to + the HTTP call that follows. +- **Automatic propagation:** once set on the class, trust flows into + `ExecutionContext`, `HypothesisEngine`, and `ReasoningEngine` without any + additional caller configuration. +- **Auditable escalation:** any attempt to override the declared trust level is + logged before being silently rejected; the declared level always wins. +- **No breaking changes:** the default (`semi_trusted`) means existing + connectors that have not yet been classified behave identically to the + pre-ADR behaviour. 
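The trust-weight mapping described in the Decision section reduces to a single
module-level dict. A sketch of how `gnat/core/trust.py` could expose it (the
`trust_weight` helper and its unknown-level fallback are illustrative, not
confirmed by this ADR):

```python
# gnat/core/trust.py (sketch)
TRUSTED_INTERNAL = "trusted_internal"
SEMI_TRUSTED = "semi_trusted"
UNTRUSTED_EXTERNAL = "untrusted_external"

# Single source of truth shared by ExecutionContext, HypothesisEngine,
# and ReasoningEngine.
TRUST_WEIGHTS: dict[str, float] = {
    TRUSTED_INTERNAL: 0.9,
    SEMI_TRUSTED: 0.6,
    UNTRUSTED_EXTERNAL: 0.3,
}


def trust_weight(level: str) -> float:
    # Illustrative helper: treat unknown levels as untrusted.
    return TRUST_WEIGHTS.get(level, TRUST_WEIGHTS[UNTRUSTED_EXTERNAL])


assert trust_weight("trusted_internal") == 0.9
assert trust_weight("not-a-level") == 0.3
```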
+ +### Negative / Trade-offs + +- **Static classification:** trust level is a class constant, not a + runtime-configurable value. An operator who has additional context (e.g. + "our OTX subscription is curated by an analyst") cannot elevate a connector's + trust without modifying source code or subclassing. +- **Binary per connector:** trust is assigned at the connector level, not at + the dataset or indicator level. A connector that mixes high- and low-quality + data (e.g. VirusTotal community vs. premium API hits) cannot express that + distinction through `TRUST_LEVEL` alone; per-object tagging (deferred) is + needed for that. +- **Classification maintenance:** as new connectors are added, the platform + team must consciously assign a trust level; the default `semi_trusted` acts + as a safe backstop but may be too conservative or too permissive depending on + context. + +### Deferred + +- **Operator-configurable trust override:** allow operators to raise or lower a + connector's effective trust via the INI config file (e.g. + `[alienvault] trust_override = semi_trusted`) without modifying source code. +- **Per-object trust tags:** complement connector-level trust with + indicator-level confidence tags derived from raw connector metadata (e.g. + VirusTotal detection ratio, MISP event distribution level). +- **Dynamic trust scoring:** a future `TrustCalibrationAgent` could observe + long-term accuracy of indicators per connector and automatically adjust trust + weights; this is deferred pending training data collection. + +--- + +## Alternatives Considered + +### Per-object trust tags at ingest time + +Rather than a connector-level class constant, each mapper could attach a trust +tag to every `STIXBase` object it produces. Rejected because: + +1. Every mapper author would need to decide on trust independently, leading to + inconsistency. +2. Mappers do not always have access to the connector identity at call time. +3. 
The per-object approach does not express *source authority* — the question of + "how much do I trust this platform in general?" is separate from "how + confident is this individual indicator?" and both are needed. + +### Dynamic trust scoring based on historical accuracy + +A scoring model that adjusts trust weights based on observed true-positive rates +per connector was considered. Deferred (not rejected) because it requires +several months of labelled ground-truth data that does not yet exist. The +static classification in this ADR will serve as the training baseline once +collection begins. + +### INI-file trust assignment + +Defining trust levels in `config.ini` rather than as class constants was +considered. Rejected for the initial implementation because: + +1. It would require a running config loader before any connector can be + classified, making static analysis and documentation generation more complex. +2. Class constants are self-documenting in the source tree and version-controlled + alongside the connector code. +3. Operator overrides via INI are deferred work and can be layered on top of the + class-constant baseline without replacing it. + +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/0041-ADR-idempotency-schema-evolution.md b/docs/explanation/architecture/adrs/0041-ADR-idempotency-schema-evolution.md new file mode 100644 index 00000000..7ed91492 --- /dev/null +++ b/docs/explanation/architecture/adrs/0041-ADR-idempotency-schema-evolution.md @@ -0,0 +1,325 @@ +# ADR-0041 — Idempotency and ORM Schema Versioning + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +### The Replay Problem + +GNAT pipelines are long-running processes that may be interrupted mid-flight: +network partitions, database deadlocks, container restarts, and operator +`SIGINT` are all common causes. 
When a pipeline is restarted, it must be safe +to replay from the beginning without producing: + +- **Duplicate STIX objects** in the workspace store (violating the uniqueness + contract of STIX IDs per platform). +- **Double SOAR triggers** (sending the same alert to a SOAR platform twice). +- **Duplicate enrichment calls** (wasting API quota on already-processed + indicators). + +Prior to this ADR, GNAT had no pipeline-level idempotency mechanism. Connector +code performed ad-hoc checks ("does this STIX ID already exist?") but these +were inconsistent and did not cover all write paths. A crashed ingest run that +had completed 800 of 1,000 records before failing would, on restart, attempt to +re-process all 1,000 records and fail on uniqueness constraints in the ORM +layer. + +### The Schema Evolution Problem + +The STIX 2.1 ORM (ADR-0031, ADR-0032) uses a property-bag pattern: +`STIXBase._properties` stores all non-core fields as an untyped dict. When a +breaking change is made to a field (e.g. `threat_score: float` renamed to +`confidence: float`, or a field's semantics change such that old serialised +values are incorrect), there is no mechanism to detect that persisted objects +were produced by an older version of the ORM and need migration. + +Two independent deployment scenarios require schema versioning: + +1. **Rolling upgrades:** a GNAT worker is upgraded to a new version while the + workspace database still contains objects serialised by the previous version. +2. **Test isolation:** fixture factories in `tests/` need to produce objects that + match the current schema without coupling to specific field values. + +--- + +## Decision + +### Part 1: Idempotency Keys + +#### Key Format + +Every write to the workspace store is gated by an idempotency key computed +by `WorkspaceStore.make_idempotency_key()`: + +``` +{connector_id}:{stix_type}:{external_id}:{sha1_content_hash[:12]} +``` + +- **`connector_id`** — the connector's module name (e.g. 
`crowdstrike`, + `alienvault`). Scopes the key to a source; the same external ID from two + different connectors does not collide. +- **`stix_type`** — the STIX object type string (e.g. `indicator`, + `threat-actor`). +- **`external_id`** — the platform-native identifier for the object (e.g. a + ThreatQ indicator ID, a CrowdStrike IOC value). If unavailable, the STIX + `id` field is used. +- **`sha1_content_hash[:12]`** — first 12 hex characters of the SHA-1 digest of + the object's canonical JSON representation (keys sorted, no whitespace). + Detects content changes even when the external ID is stable. + +```python +import hashlib, json + +def make_idempotency_key( + connector_id: str, + stix_obj: STIXBase, + external_id: str | None = None, +) -> str: + ext = external_id or stix_obj.id + payload = json.dumps(stix_obj.to_dict(), sort_keys=True, separators=(",", ":")) + content_hash = hashlib.sha1(payload.encode()).hexdigest()[:12] + return f"{connector_id}:{stix_obj.type}:{ext}:{content_hash}" +``` + +#### Database Storage + +The idempotency key is stored as a `VARCHAR(255)` column on the +`workspace_objects` table, introduced via Alembic migration +`0005_add_idempotency_key.py`: + +```sql +ALTER TABLE workspace_objects + ADD COLUMN idempotency_key VARCHAR(255); + +CREATE UNIQUE INDEX uix_workspace_objects_idempotency + ON workspace_objects (workspace_id, idempotency_key) + WHERE idempotency_key IS NOT NULL; +``` + +The partial unique index (`WHERE idempotency_key IS NOT NULL`) ensures that +objects written by code paths that pre-date this ADR (which will have +`NULL` keys) are not incorrectly flagged as duplicates. 
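The content-hash component's behaviour can be demonstrated with a stub object.
`FakeIndicator`, its placeholder STIX ID, and the `ioc-123` external ID below
are illustrative stand-ins, not real GNAT types or data:

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class FakeIndicator:
    # Minimal stand-in for STIXBase: just enough surface for the key function.
    type: str = "indicator"
    id: str = "indicator--00000000-0000-0000-0000-000000000001"
    props: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        return {"type": self.type, "id": self.id, **self.props}


def make_idempotency_key(connector_id, stix_obj, external_id=None):
    ext = external_id or stix_obj.id
    payload = json.dumps(stix_obj.to_dict(), sort_keys=True, separators=(",", ":"))
    content_hash = hashlib.sha1(payload.encode()).hexdigest()[:12]
    return f"{connector_id}:{stix_obj.type}:{ext}:{content_hash}"


a = FakeIndicator(props={"pattern": "[ipv4-addr:value = '192.0.2.1']"})
b = FakeIndicator(props={"pattern": "[ipv4-addr:value = '192.0.2.1']",
                         "labels": ["c2"]})

k1 = make_idempotency_key("crowdstrike", a, external_id="ioc-123")
k2 = make_idempotency_key("crowdstrike", b, external_id="ioc-123")

# Same source and external ID, but the added label changes the content hash,
# so the updated object is stored rather than skipped as a duplicate.
assert k1 != k2
assert k1.startswith("crowdstrike:indicator:ioc-123:")
```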
+ +#### Write Path + +`WorkspaceStore.upsert()` now follows this sequence: + +```python +def upsert( + self, + stix_obj: STIXBase, + ctx: ExecutionContext, + external_id: str | None = None, +) -> UpsertResult: + key = make_idempotency_key(ctx.initiated_by, stix_obj, external_id) + existing = self._session.query(WorkspaceObjectModel)\ + .filter_by(workspace_id=ctx.workspace_id, idempotency_key=key)\ + .first() + + if existing: + _log_to_execution_log(ctx, event_type="idempotent_skip", key=key) + return UpsertResult(skipped=True, object_id=existing.stix_id) + + # ... proceed with INSERT ... + return UpsertResult(skipped=False, object_id=stix_obj.id) +``` + +`UpsertResult` is a small dataclass with `skipped: bool` and `object_id: str`. +Callers that need to distinguish new writes from idempotent skips (e.g. pipeline +progress reporters) can inspect `result.skipped`. + +#### Replay Integration + +When `ExecutionContext.is_replay` is `True`, idempotent skips are still +performed (preventing duplicate writes) but the skip is recorded with +`event_type="replay_skip"` in `execution_log` rather than +`"idempotent_skip"`. This allows operators to distinguish between "normal +deduplication" and "replay recovery" in audit queries: + +```sql +-- Count objects successfully replayed vs. newly written +SELECT event_type, COUNT(*) FROM execution_log +WHERE context_id = :replay_context_id +GROUP BY event_type; +``` + +SOAR triggers and external webhook emissions are suppressed when +`ctx.is_replay` is `True`, regardless of whether the write was skipped. + +### Part 2: ORM Schema Versioning + +#### `schema_version` Class Variable + +`STIXBase` gains a class variable: + +```python +class STIXBase: + """Base class for all GNAT STIX ORM objects.""" + + schema_version: int = 1 + """ + Monotonically increasing integer. Increment only on breaking field changes. + Additive changes (new optional fields) do not require a bump. 
+ """ +``` + +Subclasses override `schema_version` when they introduce a breaking change: + +```python +class STIXIndicator(STIXBase): + schema_version: int = 2 # bumped when 'threat_score' was renamed 'confidence' +``` + +#### Serialisation + +`STIXBase.to_dict()` includes `schema_version` in its output: + +```python +def to_dict(self) -> dict: + return { + "type": self.type, + "id": self.id, + "schema_version": self.schema_version, + **self._properties, + } +``` + +#### Deserialisation and Migration + +`STIXBase.from_dict()` reads the `schema_version` from the serialised payload +and, if it differs from the current class's `schema_version`, invokes the +registered migration chain: + +```python +@classmethod +def from_dict(cls, data: dict) -> "STIXBase": + stored_version = data.get("schema_version", 1) + current_version = cls.schema_version + if stored_version < current_version: + data = _apply_migrations(cls, data, stored_version, current_version) + obj = cls.__new__(cls) + # ... populate fields from data ... + return obj +``` + +Migration functions are registered per class in +`gnat/orm/migrations.py` using a simple decorator: + +```python +@schema_migration(STIXIndicator, from_version=1, to_version=2) +def _migrate_indicator_v1_to_v2(data: dict) -> dict: + # Rename 'threat_score' to 'confidence' + if "threat_score" in data: + data["confidence"] = data.pop("threat_score") + return data +``` + +#### Version Bump Policy + +| Change type | Version bump? | +|-------------|---------------| +| Add a new optional field | No | +| Add a new required field with a default value | No | +| Remove a field | Yes | +| Rename a field | Yes | +| Change a field's type or semantics | Yes | +| Add a new method (no field impact) | No | + +This policy keeps the version number low and stable for the common additive +case while ensuring that breaking changes are detectable. 
+ +--- + +## Consequences + +### Positive + +- **Pipelines are fully idempotent:** restarting a crashed ingest job from the + beginning is safe; already-written objects are skipped cleanly without + database constraint violations or duplicate STIX IDs. +- **Replay is auditable:** `execution_log` records `replay_skip` events + separately from normal `idempotent_skip` events, enabling operators to measure + recovery completeness. +- **SOAR trigger safety:** `is_replay` suppression prevents double-alerting + even when a replay re-processes objects that were already written in a prior + partial run. +- **Schema evolution is controlled:** `schema_version` makes breaking field + changes detectable and migrateable; additive changes do not require a bump, + keeping the version number stable for routine development. +- **No new storage tables:** idempotency keys are a column on the existing + `workspace_objects` table; schema versions are serialised into the existing + JSON payload. No additional infrastructure is required. + +### Negative / Trade-offs + +- **Key computation cost:** SHA-1 of the canonical JSON is computed on every + write, adding ~0.1 ms per object on a typical developer machine. At 10,000 + objects per ingest run this is ~1 second, acceptable for the safety guarantee. +- **Partial index coverage:** objects written before migration `0005` have + `NULL` idempotency keys and are not protected by idempotency. A backfill job + can populate keys for existing objects but is not automated. +- **Migration chain maintenance:** as `schema_version` grows, the migration + chain from version 1 to the current version must be maintained. A test in + `tests/unit/orm/test_schema_migrations.py` validates every registered + migration in sequence. +- **Content-hash sensitivity:** if two connectors produce the same indicator + with different metadata (e.g. different `labels` lists), the content hash + differs and both are stored as distinct objects. 
This is correct behaviour + but may surprise operators who expect connector-level deduplication. + +### Deferred + +- **Backfill job** for populating idempotency keys on pre-migration objects. +- **Key expiry policy:** idempotency keys for objects deleted from the workspace + should be cleaned up to prevent key exhaustion in very long-running + deployments. +- **Cross-workspace deduplication:** the current scheme deduplicates within a + single `workspace_id`; cross-workspace deduplication (e.g. between a staging + and production workspace) is out of scope. +- **ORM migration CLI command:** `gnat orm migrate --dry-run` to preview + pending migrations before a deployment. + +--- + +## Alternatives Considered + +### Content-addressed storage (STIX ID as primary key) + +STIX IDs are already unique per platform: a STIX indicator with a given ID from +CrowdStrike is always the same logical object. Using the STIX ID as the sole +uniqueness key was considered as an alternative to a separate idempotency key. + +Rejected because: + +1. The same logical indicator can arrive from multiple connectors with different + STIX IDs (each connector may assign its own UUID-based ID) but the same + content. STIX ID uniqueness does not prevent cross-connector duplicates. +2. STIX IDs do not capture content changes: a connector may reassign the same + ID to an updated indicator. The content hash component of the idempotency + key detects this case and allows the update through. + +### Alembic-only schema versioning + +Using Alembic migrations exclusively to manage ORM field changes was considered. +Alembic tracks database schema changes (table columns, indexes) but does not +address ORM-level field renames or semantic changes that are expressed in the +JSON property bag. Alembic is still used for database schema changes +(migration `0005`); `schema_version` complements it by covering the ORM object +layer that Alembic cannot reach. 
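For context, the database-level half of this split, migration `0005` adding the
idempotency-key column, would look roughly like the following Alembic fragment.
This is a sketch only: the revision identifiers, index name, and the
dialect-specific partial-index keyword (shown here for SQLite) are assumptions,
not taken from the actual migration.

```python
"""Illustrative sketch of migration 0005; not GNAT's actual revision file."""
from alembic import op
import sqlalchemy as sa

revision = "0005"       # assumed label
down_revision = "0004"  # assumed


def upgrade() -> None:
    # Nullable on purpose: rows written before this migration keep NULL keys
    # and are simply not covered by idempotency (the partial-index trade-off
    # noted in this ADR's Consequences section).
    op.add_column(
        "workspace_objects",
        sa.Column("idempotency_key", sa.String(40), nullable=True),  # SHA-1 hex
    )
    # Partial unique index: only non-NULL keys participate in deduplication.
    op.create_index(
        "ix_workspace_objects_idempotency_key",
        "workspace_objects",
        ["workspace_id", "idempotency_key"],
        unique=True,
        sqlite_where=sa.text("idempotency_key IS NOT NULL"),
    )


def downgrade() -> None:
    op.drop_index("ix_workspace_objects_idempotency_key", table_name="workspace_objects")
    op.drop_column("workspace_objects", "idempotency_key")
```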
+ +### Event sourcing for idempotency + +An event-sourced store where every write is an event and idempotency is +guaranteed by event log position was considered. Rejected because it would +require a fundamental redesign of the workspace store and all connectors, +displacing the existing `workspace_objects` table and the established connector +contract (ADR-0031). Event sourcing remains a long-term architectural option +if GNAT grows to require it. + +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/0042-ADR-hypothesis-engine.md b/docs/explanation/architecture/adrs/0042-ADR-hypothesis-engine.md new file mode 100644 index 00000000..212e5837 --- /dev/null +++ b/docs/explanation/architecture/adrs/0042-ADR-hypothesis-engine.md @@ -0,0 +1,378 @@ +# ADR-0042 — Hypothesis Testing Engine (Phase 4C) + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +Threat intelligence analysis is fundamentally a hypothesis-driven activity. +An analyst observing a cluster of suspicious indicators might form the hypothesis +"192.0.2.1 is a Lazarus Group command-and-control server" and then accumulate +evidence for or against that claim over days or weeks. + +Prior to this ADR, GNAT had no structured mechanism for tracking hypotheses. +Analysts recorded their assessments as free-text investigation notes, which +meant: + +- **No machine-readable lifecycle:** hypotheses could not transition through + `pending → confirmed → refuted` states in a way that downstream systems + (SOAR, reporting) could act on. +- **No evidence linkage:** supporting or refuting observations were stored as + narrative text rather than as typed STIX relationship references, making it + impossible to audit the evidence chain. +- **No automated corroboration:** the Solr search index (ADR-0028 derivative) + accumulated relevant hits but nothing queried it on a hypothesis's behalf. 
+- **No confidence tracking:** a hypothesis's confidence was not updated as new + evidence arrived; analysts had to manually re-read all notes to reassess. + +The `ReasoningEngine` (ADR-0044) needed a structured hypothesis type to feed +into its scoring pipeline, and the `HypothesisEngine` itself needed a home in +the GNAT architecture that was consistent with the existing STIX custom object +pattern (ADR-0032). + +--- + +## Decision + +### `STIXHypothesis` Custom SDO + +A new custom STIX Domain Object is defined in +`gnat/stix/sdos/hypothesis.py`: + +```python +@dataclass +class STIXHypothesis(STIXBase): + """ + x-gnat-hypothesis — STIX custom SDO for analyst hypotheses. + + Represents a structured claim about a threat actor, campaign, or observable + that can be confirmed, refuted, or left inconclusive by accumulated evidence. + """ + + type: str = "x-gnat-hypothesis" + schema_version: int = 1 + + # Core fields + statement: str = "" # Natural-language hypothesis text + confidence: float = 0.2 # [0.0, 1.0]; updated by evaluate() + status: str = "pending" # pending | confirmed | refuted | inconclusive + + # Evidence arrays — store STIX relationship IDs + supporting_evidence: list[str] = field(default_factory=list) + refuting_evidence: list[str] = field(default_factory=list) + + # Provenance + created_by: str = "" # initiated_by from the creating ExecutionContext + workspace_id: str = "" + created_at: datetime | None = None + last_evaluated_at: datetime | None = None +``` + +`STIXHypothesis` is registered in `gnat/stix/sdos/__init__.py` alongside +other custom SDOs (`x-gnat-report-summary`, `x-gnat-enrichment-log`). + +Evidence is stored as STIX relationship IDs (strings matching the STIX +`relationship--` pattern) rather than direct STIX IDs so that the +evidence relationship itself carries the semantic link (e.g. +`relationship_type: "supports"` or `relationship_type: "refutes"`). 
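Because the evidence arrays hold bare strings, nothing structurally prevents
appending a plain SDO ID instead of a relationship ID. A small guard is cheap;
the helper below is hypothetical (not part of the ADR's API) and assumes
standard STIX 2.1 identifier syntax:

```python
import re
import uuid

# STIX 2.1 identifiers are "<type>--<UUID>"; the evidence arrays must hold
# relationship IDs specifically, so the type prefix is pinned here.
_RELATIONSHIP_ID = re.compile(r"^relationship--[0-9a-fA-F-]{36}$")


def is_relationship_id(candidate: str) -> bool:
    """True when `candidate` is a well-formed STIX relationship ID."""
    if not _RELATIONSHIP_ID.match(candidate):
        return False
    try:
        uuid.UUID(candidate.split("--", 1)[1])  # reject malformed UUIDs
    except ValueError:
        return False
    return True
```

A caller could then guard `supporting_evidence.append(rel_id)` with
`if not is_relationship_id(rel_id): raise ValueError(...)` before persisting.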
+ +#### Status State Machine + +``` + propose() + ───────► pending + │ + evaluate() │ + ┌────────────┤ + │ │ + confidence ≥ 0.75 │ 0.15 < confidence < 0.75 + │ │ + ▼ ▼ confidence ≤ 0.15 + confirmed (unchanged) AND refuting_evidence + ───────────────► refuted + │ + close(verdict) │ + ───────► inconclusive (when verdict == "inconclusive") +``` + +### `HypothesisEngine` + +`gnat/reasoning/hypothesis.py` provides the lifecycle manager: + +```python +class HypothesisEngine: + """ + Manages the full lifecycle of STIXHypothesis objects: + propose → evaluate → close. + """ + + def __init__( + self, + store: WorkspaceStore, + search_index: SearchIndex, # SolrSearchIndex or NullSearchIndex + trust_weights: dict[str, float] | None = None, + ) -> None: + self._store = store + self._search = search_index + self._weights = trust_weights or TRUST_WEIGHTS # from gnat.core.trust +``` + +#### `propose()` + +Creates and persists a new `STIXHypothesis` in the workspace: + +```python +def propose( + self, + statement: str, + initial_evidence: list[str], + ctx: ExecutionContext, + confidence: float = 0.2, +) -> STIXHypothesis: + """ + Parameters + ---------- + statement : str + Natural-language hypothesis text (e.g. "192.0.2.1 is Lazarus C2"). + initial_evidence : list[str] + STIX relationship IDs linking the hypothesis to supporting objects. + ctx : ExecutionContext + Execution context; workspace_id and initiated_by are taken from here. + confidence : float + Initial confidence score in [0.0, 1.0]. Defaults to 0.2 (weak prior). + + Returns + ------- + STIXHypothesis + The persisted hypothesis object. 
+ """ + hyp = STIXHypothesis( + id=f"x-gnat-hypothesis--{uuid4()}", + statement=statement, + confidence=confidence, + status="pending", + supporting_evidence=list(initial_evidence), + refuting_evidence=[], + created_by=ctx.initiated_by, + workspace_id=ctx.workspace_id, + created_at=datetime.utcnow(), + ) + self._store.upsert(hyp, ctx) + return hyp +``` + +#### `evaluate()` + +Queries Solr for corroborating or refuting evidence and updates confidence: + +```python +def evaluate( + self, + hypothesis_id: str, + ctx: ExecutionContext, +) -> STIXHypothesis: + """ + Re-scores a hypothesis by querying the Solr search index for evidence + corroborating or refuting its statement, then updates its confidence + and (if thresholds are crossed) its status. + """ + hyp = self._store.get(hypothesis_id, STIXHypothesis) + + # 1. Solr full-text query on the hypothesis statement + hits = self._search.query(hyp.statement, fields=["name", "pattern", "description"]) + + # 2. Weight each hit by the trust level of its source connector + weighted_sum = 0.0 + for hit in hits: + trust = hit.get("source_trust_level", "semi_trusted") + weighted_sum += self._weights.get(trust, 0.6) + + # 3. Normalise to [0.0, 1.0] + raw_corroboration = min(weighted_sum / max(len(hits), 1), 1.0) + + # 4. Blend with the existing confidence (Bayesian-inspired update) + new_confidence = 0.4 * hyp.confidence + 0.6 * raw_corroboration + new_confidence = round(max(0.0, min(1.0, new_confidence)), 4) + + # 5. 
Auto-classify + new_status = hyp.status + if new_confidence >= 0.75: + new_status = "confirmed" + elif new_confidence <= 0.15 and hyp.refuting_evidence: + new_status = "refuted" + + hyp.confidence = new_confidence + hyp.status = new_status + hyp.last_evaluated_at = datetime.utcnow() + self._store.upsert(hyp, ctx) + return hyp +``` + +**Confidence scoring weights by trust level:** + +| Source Trust Level | Weight Used in Corroboration | +|--------------------|------------------------------| +| `trusted_internal` | 0.9 | +| `semi_trusted` | 0.6 | +| `untrusted_external` | 0.3 | + +**Auto-classification thresholds:** + +| Condition | New Status | +|-----------|-----------| +| `confidence ≥ 0.75` | `confirmed` | +| `confidence ≤ 0.15` AND `refuting_evidence` non-empty | `refuted` | +| Neither threshold met | Unchanged (remains `pending`) | + +#### `close()` + +Locks the hypothesis with a final analyst verdict: + +```python +def close( + self, + hypothesis_id: str, + verdict: str, # "confirmed" | "refuted" | "inconclusive" + ctx: ExecutionContext, +) -> STIXHypothesis: + """ + Closes a hypothesis with a final analyst-provided verdict. + Closed hypotheses are not eligible for further evaluate() calls. 
+ """ + if verdict not in ("confirmed", "refuted", "inconclusive"): + raise ValueError(f"Invalid verdict: {verdict!r}") + hyp = self._store.get(hypothesis_id, STIXHypothesis) + if hyp.status in ("confirmed", "refuted", "inconclusive"): + raise HypothesisAlreadyClosedError(hypothesis_id) + hyp.status = verdict + hyp.last_evaluated_at = datetime.utcnow() + self._store.upsert(hyp, ctx) + return hyp +``` + +### Evidence Linkage via STIX Relationships + +When an analyst (or an automated pipeline) identifies a STIX object that +supports or refutes a hypothesis, a STIX `relationship` is created linking the +two objects and the relationship ID is appended to the appropriate evidence list: + +```python +# Analyst adds supporting evidence +rel = STIXRelationship( + relationship_type="supports", + source_ref=suspicious_ip.id, + target_ref=hyp.id, +) +workspace.upsert(rel, ctx) +hyp.supporting_evidence.append(rel.id) +engine.evaluate(hyp.id, ctx) # re-score with new evidence +``` + +This approach means that every piece of evidence is a first-class STIX object, +queryable, exportable as a STIX bundle, and auditable via the lineage tracker +(ADR-0038). + +### Storage + +`STIXHypothesis` is persisted via the existing `WorkspaceStore.upsert()` +mechanism. No new database tables are required; the object lands in +`workspace_objects` like any other STIX object. The idempotency key +(ADR-0041) ensures that `evaluate()` calls updating the same hypothesis do +not create duplicate rows. + +--- + +## Consequences + +### Positive + +- **Structured hypothesis lifecycle:** hypotheses transition through a defined + state machine (`pending → confirmed/refuted/inconclusive`) rather than + existing only in analyst notes; downstream SOAR and reporting systems can + filter by `status`. +- **Evidence provenance:** every piece of supporting or refuting evidence is a + typed STIX relationship, exportable as a STIX bundle and auditable via the + lineage tracker. 
+- **Automated corroboration:** `evaluate()` queries Solr without analyst
+  intervention, updating confidence as new indicators arrive in the workspace.
+- **Trust-weighted scoring:** evidence from internal SIEMs carries more weight
+  than community feeds. Weights are not configurable per call: unless a
+  `trust_weights` override is supplied at engine construction, the shared
+  `TRUST_WEIGHTS` constant is used, ensuring consistent behaviour across all
+  hypotheses.
+- **No new infrastructure:** `STIXHypothesis` uses the existing workspace store
+  and Solr search index; no new tables, queues, or services are required.
+- **Graceful Solr degradation:** if Solr is unavailable, `NullSearchIndex` is
+  substituted and `evaluate()` returns the hypothesis unchanged (confidence not
+  updated) rather than raising.
+
+### Negative / Trade-offs
+
+- **Solr dependency for corroboration:** `evaluate()` is only useful when the
+  Solr sidecar is running. Deployments without Solr get lifecycle management
+  (`propose`, `close`) but not automated corroboration.
+- **Statement-based Solr query:** Solr is queried with the raw hypothesis
+  statement string. If the statement uses phrasing that does not match indexed
+  field content, corroboration scores will be low even when strong evidence
+  exists. Structured query decomposition (NLP-based entity extraction) is
+  deferred.
+- **No real-time push:** `evaluate()` is called on demand or on a schedule; it
+  does not automatically fire when a new indicator arrives in the workspace.
+  A watcher pattern (deferred) would close this gap.
+- **Confidence blending is heuristic:** the 40/60 blend of existing and new
+  confidence is not derived from a formal Bayesian model; it is a pragmatic
+  approximation that may need tuning.
+
+### Deferred
+
+- **Scheduled re-evaluation:** a `HypothesisWatcher` job that calls `evaluate()`
+  on all `pending` hypotheses when new objects are ingested into the same
+  workspace.
+- **NLP-based entity extraction:** decompose the hypothesis statement into + structured entity queries (IP, domain, actor name) before querying Solr to + improve corroboration recall. +- **STIX 2.1 Opinion SDO integration:** map `STIXHypothesis` closed verdicts + to native STIX 2.1 `opinion` objects for maximum interoperability. +- **Multi-analyst collaboration:** allow multiple analysts to propose competing + verdicts on the same hypothesis and surface disagreements. + +--- + +## Alternatives Considered + +### Free-text analyst notes + +Keeping hypotheses as free-text entries in investigation notes was the simplest +option and required no new code. Rejected because: + +1. Notes are not machine-readable; SOAR and reporting systems cannot filter on + `status == "confirmed"`. +2. Evidence linkage is lost; the note references the evidence by name but not + by STIX ID, breaking the audit chain. +3. Confidence is not tracked; analysts must manually re-assess every note when + new evidence arrives. + +### External hypothesis management tools (e.g. Jupyter notebooks, Jira) + +Using an external tool (Jira tickets, Jupyter analysis notebooks) to track +hypotheses was considered. Rejected because it breaks GNAT's single-data-model +principle: all threat intelligence objects should be representable in STIX and +stored in the workspace. An external tool would require a synchronisation +bridge and would not benefit from Solr corroboration, lineage tracking, or the +`ReasoningEngine` scoring pipeline. + +### Native STIX 2.1 `opinion` SDO + +STIX 2.1 includes an `opinion` SDO that expresses an assessment about the +correctness of STIX content. Using `opinion` directly was considered. Rejected +because `opinion` has a fixed enumerated value set +(`strongly-disagree` to `strongly-agree`) and no fields for a natural-language +statement, a confidence score, or an evidence list. 
`STIXHypothesis` +(`x-gnat-hypothesis`) extends the STIX custom object pattern consistently with +ADR-0032 and can produce an `opinion` on `close()` as a derived output +(deferred). + +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/0043-ADR-negative-evidence.md b/docs/explanation/architecture/adrs/0043-ADR-negative-evidence.md new file mode 100644 index 00000000..6c886273 --- /dev/null +++ b/docs/explanation/architecture/adrs/0043-ADR-negative-evidence.md @@ -0,0 +1,364 @@ +# ADR-0043 — Negative Evidence Tracking (Phase 4C) + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +When GNAT enriches a STIX observable (e.g. an IP address, a domain, a file +hash) it queries one or more connectors to retrieve additional context. If a +connector returns no results for a given observable, that **absence of data is +itself intelligence**: + +- If VirusTotal has never seen a particular file hash, that is meaningful. +- If CrowdStrike Falcon has no record of a domain, that reduces the likelihood + that the domain is a known threat actor infrastructure. +- If Recorded Future has no intelligence on an IP address, that is different + from "we have not checked yet." + +Prior to this ADR, GNAT did not record negative results. Every enrichment +request was treated as if no prior query had been made. This created two +compounding problems: + +### Problem 1: Redundant API calls + +The enrichment dispatcher re-queried every connector for every observable on +every pipeline run, regardless of whether the same lookup had already returned +nothing. In a typical production deployment: + +- 10,000 observables × 5 connectors = 50,000 queries per pipeline run +- If 60% of queries return no results, roughly 30,000 of those calls are wasted + against connectors that already had nothing to say +- Many commercial connectors enforce rate limits (e.g. 
VirusTotal: 500 requests
+  per day on the free tier); wasted calls exhaust quota that could serve
+  novel indicators
+
+### Problem 2: No negative signal in scoring
+
+The `ReasoningEngine` (ADR-0044) scores observables using trust-weighted
+evidence. Without a record of which connectors have been queried and returned
+nothing, the engine had no way to apply a **negative penalty** to observables
+that multiple reputable connectors have explicitly found unremarkable. An
+observable with zero enrichment hits was treated the same as an observable that
+had never been looked up — both received a neutral score rather than the
+negative-evidence-adjusted lower score that a "not seen by three connectors"
+result warrants.
+
+### Requirements
+
+1. Suppress redundant re-queries within a configurable time window (TTL).
+2. Expose the negative result to scoring pipelines as a typed, machine-readable
+   object.
+3. Require no new database tables or services.
+4. Survive process restarts (in-memory caches do not).
+
+---
+
+## Decision
+
+### `NegativeEvidenceRecord` Custom SDO
+
+A new custom STIX Domain Object is defined in
+`gnat/stix/sdos/negative_evidence.py`:
+
+```python
+@dataclass
+class NegativeEvidenceRecord(STIXBase):
+    """
+    x-gnat-negative-evidence — STIX custom SDO representing a confirmed
+    absence of data from a specific connector for a specific observable.
+
+    Stored via the workspace store like any other STIX object; no new
+    tables or services required.
+ """ + + type: str = "x-gnat-negative-evidence" + schema_version: int = 1 + + # The STIX ID of the observable that was queried + target_ref: str = "" + + # The connector that performed the query and found nothing + queried_connector: str = "" + + # Suppression window in seconds (default: 1 hour) + ttl_seconds: int = 3600 + + # UTC timestamp of the query that returned no results + query_timestamp: datetime | None = None +``` + +#### Key Methods + +```python +def is_expired(self) -> bool: + """ + Returns True if the TTL has elapsed since query_timestamp. + An expired record does NOT suppress re-querying; a fresh record does. + """ + if self.query_timestamp is None: + return True + elapsed = (datetime.utcnow() - self.query_timestamp).total_seconds() + return elapsed > self.ttl_seconds + +def seconds_remaining(self) -> float: + """ + Returns the number of seconds before this record expires. + Returns 0.0 if already expired. + """ + if self.query_timestamp is None: + return 0.0 + elapsed = (datetime.utcnow() - self.query_timestamp).total_seconds() + return max(0.0, self.ttl_seconds - elapsed) +``` + +### Write Path: Recording a Negative Result + +When an enrichment call returns an empty result set, the enrichment dispatcher +calls `NegativeEvidenceStore.record_miss()`: + +```python +class NegativeEvidenceStore: + """ + Thin wrapper around WorkspaceStore for NegativeEvidenceRecord objects. 
+ """ + + def record_miss( + self, + target_ref: str, + connector: str, + ctx: ExecutionContext, + ttl_seconds: int = 3600, + ) -> NegativeEvidenceRecord: + record = NegativeEvidenceRecord( + id=f"x-gnat-negative-evidence--{uuid4()}", + target_ref=target_ref, + queried_connector=connector, + ttl_seconds=ttl_seconds, + query_timestamp=datetime.utcnow(), + ) + self._store.upsert(record, ctx) + return record + + def get_fresh( + self, + target_ref: str, + connector: str, + workspace_id: str, + ) -> NegativeEvidenceRecord | None: + """ + Returns an unexpired NegativeEvidenceRecord for the given + (target_ref, connector) pair, or None if no fresh record exists. + """ + records = self._store.query( + type_filter="x-gnat-negative-evidence", + workspace_id=workspace_id, + filters={"target_ref": target_ref, "queried_connector": connector}, + ) + for record in records: + if not record.is_expired(): + return record + return None +``` + +### Read Path: Suppressing Redundant Queries + +The enrichment dispatcher checks for a fresh negative record before calling +each connector: + +```python +# gnat/ingest/enrichment.py +def _enrich_observable( + self, + observable: STIXBase, + connector: BaseClient, + ctx: ExecutionContext, +) -> list[STIXBase]: + fresh_negative = self._neg_store.get_fresh( + target_ref=observable.id, + connector=type(connector).__module__.split(".")[-2], + workspace_id=ctx.workspace_id, + ) + if fresh_negative: + logger.debug( + "Skipping %s for %s — negative evidence fresh for %.0fs", + type(connector).__name__, + observable.id, + fresh_negative.seconds_remaining(), + ) + return [] # suppress API call + + results = connector.enrich(observable, ctx) + + if not results: + self._neg_store.record_miss( + target_ref=observable.id, + connector=type(connector).__module__.split(".")[-2], + ctx=ctx, + ttl_seconds=self._ttl_seconds, + ) + + return results +``` + +### Integration with `ReasoningEngine` + +`ReasoningEngine.prioritize()` (ADR-0044) reads fresh 
`NegativeEvidenceRecord` +objects for each observable and applies a negative penalty to the composite +score: + +```python +# In ReasoningEngine._score_observable() +fresh_negatives = self._neg_store.query_fresh_count( + target_ref=observable.id, + workspace_id=ctx.workspace_id, +) +neg_penalty = min(0.3 * fresh_negatives, 0.6) +``` + +**Negative penalty table:** + +| Fresh Negative Records | Penalty Applied | +|------------------------|-----------------| +| 0 | 0.0 | +| 1 | 0.3 | +| 2 | 0.6 (capped) | +| 3+ | 0.6 (capped) | + +The cap at 0.6 ensures that even an observable with many negative hits retains +a non-zero score in case a trust-weighted positive hit arrives later. + +### TTL Configuration + +TTL defaults to 3600 seconds (1 hour) but is configurable per deployment in +the INI file: + +```ini +[enrichment] +negative_evidence_ttl = 3600 ; seconds; default 1 hour +``` + +Connectors that update more slowly (e.g. threat intelligence databases that +publish weekly) may benefit from a longer TTL (e.g. 86400 seconds) configured +at the connector level: + +```python +class ShadowserverClient(BaseClient): + NEGATIVE_EVIDENCE_TTL: int = 86400 # 24 hours — weekly update cadence +``` + +`NegativeEvidenceStore.record_miss()` reads `NEGATIVE_EVIDENCE_TTL` from the +connector class when present, falling back to the INI-configured default. + +--- + +## Consequences + +### Positive + +- **Quota preservation:** redundant queries are suppressed within the TTL + window, directly reducing API call volume. In a deployment with 10,000 + observables and 60% miss rate, suppression across a 1-hour window reduces + repeat calls from 30,000 to near-zero during replays and subsequent runs. +- **Richer scoring:** the `ReasoningEngine` can now distinguish between + "unknown" and "confirmed not seen by N connectors," producing lower scores for + observables that multiple reputable connectors have explicitly found + unremarkable. 
+- **Persistence across restarts:** `NegativeEvidenceRecord` is stored in the + workspace like any other STIX object; suppression survives process restarts, + unlike an in-memory cache. +- **Zero new infrastructure:** no new tables, queues, message brokers, or + caching services are required. The existing workspace store handles + persistence; the existing query interface handles retrieval. +- **First-class STIX object:** negative evidence is exportable as part of a + STIX bundle, shareable between workspaces, and auditable via the lineage + tracker (ADR-0038). + +### Negative / Trade-offs + +- **Workspace store growth:** every enrichment miss creates a + `NegativeEvidenceRecord` object. A deployment with 10,000 observables + queried against 5 connectors creates up to 50,000 records per TTL window. + A cleanup job (see Deferred) is needed to purge expired records. +- **TTL is a blunt instrument:** a 1-hour TTL is appropriate for live threat + feeds but too short for weekly-updated databases and too long for real-time + feeds that update every minute. The per-connector `NEGATIVE_EVIDENCE_TTL` + class variable partially addresses this, but it requires connector authors to + reason about update cadence. +- **No invalidation on connector update:** if a connector's data is known to + have been refreshed (e.g. the operator manually triggers a full re-sync), the + TTL-based suppression cannot be invalidated without deleting all matching + `NegativeEvidenceRecord` objects. Manual invalidation is not yet tooled. +- **False negative suppression:** if a connector initially returns no results + but adds the indicator to its database within the TTL window, GNAT will not + re-query until the TTL expires, missing the new data. + +### Deferred + +- **Expired record cleanup job:** a scheduled `NegativeEvidencePurgeJob` that + deletes `NegativeEvidenceRecord` objects whose TTL has elapsed, preventing + unbounded workspace store growth. 
+- **Per-observable TTL override:** allow analysts to set a shorter TTL on + high-priority observables that should be re-queried more aggressively. +- **Manual invalidation API:** `gnat enrich invalidate-negative ` CLI + command to force re-querying by deleting all matching negative records. +- **Sharing across workspaces:** allow a negative evidence record in one + workspace to suppress queries in a sibling workspace, reducing redundant calls + in multi-tenant deployments. + +--- + +## Alternatives Considered + +### In-memory LRU cache + +An in-process `functools.lru_cache` or `cachetools.TTLCache` keyed on +`(observable_id, connector_name)` was the simplest implementation. Rejected +because: + +1. **Lost on restart:** a cache flush caused by a container restart or worker + crash would cause all missed queries to be re-issued, negating the quota + savings on the very occasions when pipelines are most likely to be + re-run (crash recovery). +2. **Not shared across workers:** in a multi-worker deployment each worker + maintains an independent cache; a negative result learned by Worker A is not + known to Worker B. +3. **Not auditable:** the `ReasoningEngine` cannot query an in-memory cache for + the negative penalty calculation without tight coupling between the scoring + engine and the enrichment dispatcher's runtime state. + +### Connector-side rate limiting + +Relying on each connector's own rate limiter to prevent redundant calls was +considered. Rejected because: + +1. Rate limiters enforce a maximum call *rate*, not a minimum interval between + identical calls. A rate limiter allows 500 calls/minute but does not prevent + querying the same observable 500 times in a minute. +2. Rate limiters are applied globally per connector, not per observable. They + do not suppress re-querying a specific observable that already returned no + results. +3. Rate limiters do not expose negative signal to the scoring pipeline. 
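Point 1 can be made concrete with a toy simulation: a call-count limiter admits
the same lookup again and again, while a TTL-based miss cache admits it once
per window. The class names below are illustrative, not GNAT code:

```python
import time


class ToyRateLimiter:
    """Caps total calls per window; blind to *which* observable is queried."""

    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self.calls = 0

    def allow(self, observable_id: str) -> bool:
        if self.calls >= self.max_calls:
            return False
        self.calls += 1
        return True


class ToyNegativeCache:
    """Suppresses repeats of a specific (observable, connector) miss until TTL expiry."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._misses: dict[tuple[str, str], float] = {}

    def record_miss(self, observable_id: str, connector: str) -> None:
        self._misses[(observable_id, connector)] = time.monotonic()

    def should_query(self, observable_id: str, connector: str) -> bool:
        stamp = self._misses.get((observable_id, connector))
        return stamp is None or (time.monotonic() - stamp) > self.ttl


limiter = ToyRateLimiter(max_calls=500)
cache = ToyNegativeCache(ttl_seconds=3600)

# The limiter admits the identical lookup 500 times; the cache admits it once.
repeats_allowed_by_limiter = sum(limiter.allow("ipv4-addr--x") for _ in range(500))
queries_issued = 0
for _ in range(500):
    if cache.should_query("ipv4-addr--x", "virustotal"):
        queries_issued += 1
        cache.record_miss("ipv4-addr--x", "virustotal")
```

The limiter permits all 500 identical lookups; the cache issues exactly one
query and suppresses the rest for the remainder of the TTL window.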
+ +### Extending `EnrichmentLogModel` + +The existing `EnrichmentLogModel` (which records enrichment operations) could +have been extended with a `result_count: int` column so that a query returning +0 results is distinguishable from one not yet performed. + +Rejected because: + +1. `EnrichmentLogModel` is an append-only audit log, not a queryable state + store; answering "is there a fresh negative result for (X, connector)?" + would require a `MAX(timestamp)` query with a join, adding complexity. +2. `EnrichmentLogModel` is not a STIX object and is therefore not shareable via + STIX bundles or exportable to partner workspaces. +3. The existing lineage event model (ADR-0038) serves the audit function; + negative evidence requires a separate, queryable state representation. + +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/0044-ADR-reasoning-engine.md b/docs/explanation/architecture/adrs/0044-ADR-reasoning-engine.md new file mode 100644 index 00000000..ed5854c3 --- /dev/null +++ b/docs/explanation/architecture/adrs/0044-ADR-reasoning-engine.md @@ -0,0 +1,464 @@ +# ADR-0044 — Evidence-Weighted Observable Reasoning Engine (Phase 4C) + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +GNAT ingests thousands of STIX observables per pipeline run from dozens of +connectors. Prior to this ADR, analysts had no automated mechanism to answer +the question: **"Given everything GNAT knows right now, which of these +observables should I investigate first?"** + +The existing confidence scoring (ADR-0033) assigned a single confidence value +per object based on connector-reported metadata. This was insufficient for +prioritisation because: + +1. **Single signal:** confidence came from one field on one object, ignoring the + object's age, corroborating hits across other objects in the workspace, and + negative evidence from connectors that had never seen the observable. +2. 
**Trust-agnostic:** a 0.9-confidence hit from AlienVault OTX (open community + submissions) and a 0.9-confidence hit from the organisation's own Splunk + deployment were scored identically, despite the profound difference in source + authority. +3. **Not explainable:** a single float score gave analysts no insight into why + an observable was scored high or low; it could not be audited. +4. **Not persisted:** scores were computed on demand and discarded; there was no + record that prioritisation had occurred, breaking the lineage chain. + +The `HypothesisEngine` (ADR-0042) and `NegativeEvidenceRecord` (ADR-0043) +introduced structured evidence objects that begged for a consumer: a scoring +engine that reads them and produces a ranked, explainable prioritisation list. + +SOC analyst feedback collected during Phase 4B identified three signals as most +valuable for triage prioritisation: + +- **Source authority** (whose data is this?) +- **Recency** (how recently was this observed or updated?) +- **Corroboration** (how many other data points mention this observable?) + +A fourth signal — **absence of data** — was identified as equally important: +an observable not seen by any trusted connector is less urgent than one +confirmed by three. + +--- + +## Decision + +### `ReasoningEngine` + +The scoring engine is defined in `gnat/reasoning/engine.py`: + +```python +class ReasoningEngine: + """ + Prioritises a set of STIX observables using a composite evidence-weighted + score derived from trust level, age, Solr corroboration, and negative + evidence penalties. + + Parameters + ---------- + store : WorkspaceStore + Workspace store used to persist STIX note objects when store_notes=True. + search_index : SearchIndex + Solr search index for corroboration queries. Pass NullSearchIndex when + Solr is unavailable; the engine degrades gracefully. + neg_store : NegativeEvidenceStore + Store for fresh NegativeEvidenceRecord lookups. 
+ trust_weights : dict[str, float] | None + Override for the default TRUST_WEIGHTS mapping. Pass None to use + the shared constant from gnat.core.trust. + """ + + def __init__( + self, + store: WorkspaceStore, + search_index: SearchIndex, + neg_store: NegativeEvidenceStore, + trust_weights: dict[str, float] | None = None, + ) -> None: + self._store = store + self._search = search_index + self._neg = neg_store + self._weights = trust_weights or TRUST_WEIGHTS +``` + +### `prioritize()` + +The primary public method: + +```python +def prioritize( + self, + observable_set: list[STIXBase], + ctx: ExecutionContext, + store_notes: bool = True, +) -> list[tuple[STIXBase, float, dict]]: + """ + Score and rank a list of STIX observables. + + Parameters + ---------- + observable_set : list[STIXBase] + The observables to score. All must belong to ctx.workspace_id. + ctx : ExecutionContext + Execution context; trust_level and workspace_id are read from here. + store_notes : bool + When True, persist a STIX note object for each scored observable + recording the score breakdown. Defaults to True. + + Returns + ------- + list[tuple[STIXBase, float, dict]] + Triples of (observable, score, explanation), sorted by score descending. + score is in [0.0, 1.0]. explanation is a machine-readable dict. + """ + results = [] + for obs in observable_set: + score, explanation = self._score_observable(obs, ctx) + if store_notes: + self._persist_note(obs, score, explanation, ctx) + results.append((obs, score, explanation)) + + results.sort(key=lambda t: t[1], reverse=True) + return results +``` + +### Composite Scoring Formula + +``` +score = trust_weight × 0.4 + + age_factor × 0.3 + + corroboration_bonus × 0.3 + − neg_penalty × 0.5 +``` + +The result is clamped to `[0.0, 1.0]`. 
+

#### Component Definitions

**`trust_weight`** — derived from `ExecutionContext.trust_level`:

| Trust Level | trust_weight |
|-------------|-------------|
| `trusted_internal` | 0.9 |
| `semi_trusted` | 0.6 |
| `untrusted_external` | 0.3 |

The context trust level represents the highest-authority source in the pipeline
that produced or enriched this observable.

**`age_factor`** — time-decay from the observable's `modified` field:

```python
def _age_factor(self, obs: STIXBase) -> float:
    if obs.modified is None:
        return 0.5  # no timestamp: neutral decay
    days_old = (datetime.utcnow() - obs.modified).total_seconds() / 86400.0
    return max(0.0, 1.0 - 0.05 * days_old)
```

| Age (days) | age_factor |
|-----------|-----------|
| 0 (today) | 1.00 |
| 1 | 0.95 |
| 5 | 0.75 |
| 10 | 0.50 |
| 20 | 0.00 (floor) |

**`corroboration_bonus`** — Solr hit count for the observable's identifier fields:

```python
def _corroboration_bonus(self, obs: STIXBase) -> float:
    hits = self._search.query(
        obs.name or obs.id,
        fields=["name", "pattern", "value", "description"],
    )
    return min(len(hits) * 0.05, 0.25)
```

| Solr Hits | corroboration_bonus |
|-----------|-------------------|
| 0 | 0.00 |
| 1 | 0.05 |
| 3 | 0.15 |
| 5+ | 0.25 (cap) |

**`neg_penalty`** — count of unexpired `NegativeEvidenceRecord` objects for
this observable:

```python
def _neg_penalty(self, obs: STIXBase, workspace_id: str) -> float:
    count = self._neg.query_fresh_count(
        target_ref=obs.id,
        workspace_id=workspace_id,
    )
    return min(0.3 * count, 0.6)
```

| Fresh Negative Records | neg_penalty |
|------------------------|------------|
| 0 | 0.00 |
| 1 | 0.30 |
| 2+ | 0.60 (cap) |

The cap at 0.60, combined with the `× 0.5` coefficient in the formula, limits
the total negative-evidence deduction to `0.60 × 0.5 = 0.30` of the composite
score, even for heavily negatively-evidenced observables; negative evidence
demotes an observable but can never subtract more than 0.30.
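As a sanity check on the weights, the composite formula can be evaluated by hand. The sketch below reproduces the arithmetic outside the engine; the helper name `composite_score` is illustrative and not part of the `ReasoningEngine` API:

```python
def composite_score(
    trust_weight: float,
    age_factor: float,
    corroboration_bonus: float,
    neg_penalty: float,
) -> float:
    """Evidence-weighted composite score, clamped to [0.0, 1.0]."""
    raw = (
        trust_weight * 0.4
        + age_factor * 0.3
        + corroboration_bonus * 0.3
        - neg_penalty * 0.5
    )
    return round(max(0.0, min(1.0, raw)), 4)

# trusted_internal pipeline, 5-day-old object, 3 Solr hits, no negative evidence:
# 0.9*0.4 + 0.75*0.3 + 0.15*0.3 - 0.0 = 0.63
print(composite_score(0.9, 0.75, 0.15, 0.0))  # 0.63

# untrusted pipeline, stale object, no corroboration, 2 fresh negative records:
# 0.3*0.4 + 0.0 + 0.0 - 0.6*0.5 = -0.18, clamped to the 0.0 floor
print(composite_score(0.3, 0.0, 0.0, 0.6))  # 0.0
```

The second call shows why the clamp matters: for low-trust, stale observables the raw value can go negative.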
+

### Full Scoring Implementation

```python
def _score_observable(
    self,
    obs: STIXBase,
    ctx: ExecutionContext,
) -> tuple[float, dict]:
    tw = self._weights.get(ctx.trust_level, 0.6)
    af = self._age_factor(obs)
    cb = self._corroboration_bonus(obs)
    np_ = self._neg_penalty(obs, ctx.workspace_id)

    raw = tw * 0.4 + af * 0.3 + cb * 0.3 - np_ * 0.5
    score = round(max(0.0, min(1.0, raw)), 4)

    explanation = {
        "score": score,
        "components": {
            "trust_weight": tw,
            "trust_weight_coeff": 0.4,
            "age_factor": af,
            "age_factor_coeff": 0.3,
            "corroboration_bonus": cb,
            "corroboration_coeff": 0.3,
            "neg_penalty": np_,
            "neg_penalty_coeff": 0.5,
        },
        "trust_level": ctx.trust_level,
        "workspace_id": ctx.workspace_id,
        "evaluated_at": datetime.utcnow().isoformat(),
    }
    return score, explanation
```

### Explanation Dict Structure

The `explanation` dict is machine-readable, not free text, so that downstream
components (report generators, SOAR connectors, TUI) can format it as needed:

```json
{
  "score": 0.6300,
  "components": {
    "trust_weight": 0.9,
    "trust_weight_coeff": 0.4,
    "age_factor": 0.75,
    "age_factor_coeff": 0.3,
    "corroboration_bonus": 0.15,
    "corroboration_coeff": 0.3,
    "neg_penalty": 0.0,
    "neg_penalty_coeff": 0.5
  },
  "trust_level": "trusted_internal",
  "workspace_id": "acme-corp",
  "evaluated_at": "2026-04-09T14:23:01.000Z"
}
```

### STIX Note Persistence

When `store_notes=True`, the engine persists a STIX 2.1 `note` object for each
scored observable:

```python
def _persist_note(
    self,
    obs: STIXBase,
    score: float,
    explanation: dict,
    ctx: ExecutionContext,
) -> None:
    note = STIXNote(
        id=f"note--{uuid4()}",
        abstract=f"ReasoningEngine score: {score:.4f}",
        content=json.dumps(explanation, indent=2),
        object_refs=[obs.id],
        created_by_ref=ctx.initiated_by,
    )
    self._store.upsert(note, ctx)
```

STIX `note` objects link to their target via `object_refs`, making the
score
and explanation auditable via the standard STIX relationship graph +and exportable in STIX bundles. + +### Solr Degradation + +When Solr is unavailable, `NullSearchIndex` is substituted: + +```python +class NullSearchIndex(SearchIndex): + """No-op search index used when Solr is unavailable.""" + + def query(self, query: str, fields: list[str] | None = None) -> list[dict]: + return [] +``` + +With `NullSearchIndex`, `corroboration_bonus` is always 0.0. The engine +continues to score using `trust_weight`, `age_factor`, and `neg_penalty`, +producing a degraded but still useful ranking. + +### Usage Example + +```python +from gnat.reasoning.engine import ReasoningEngine +from gnat.search import GNATIndexer +from gnat.core.context import ExecutionContext + +ctx = ExecutionContext.from_connector( + connector=splunk_client, + domain="analysis", + workspace_id="acme-corp", +) + +engine = ReasoningEngine( + store=workspace_store, + search_index=GNATIndexer.from_config(config), + neg_store=neg_evidence_store, +) + +ranked = engine.prioritize( + observable_set=all_indicators, + ctx=ctx, + store_notes=True, +) + +for obs, score, explanation in ranked[:10]: + print(f"{score:.4f} {obs.name or obs.id}") + # > 0.7800 192.0.2.1 + # > 0.6550 evil-domain.example.com + # > 0.4200 suspicious-hash-abc123 +``` + +--- + +## Consequences + +### Positive + +- **Deterministic and reproducible:** given the same inputs (trust level, object + timestamps, Solr hit counts, negative records), the formula always produces + the same score. This makes it testable with fixed fixtures and auditable + after the fact. +- **Explainable:** the structured `explanation` dict exposes every scoring + component; analysts can see exactly why an observable ranked high or low + without reading source code. +- **Fully auditable:** STIX `note` objects link scores to observables in the + standard STIX graph; the entire prioritisation history is queryable and + exportable. 
+- **Solr-optional:** `NullSearchIndex` allows the engine to operate in minimal + deployments (developer workstations, CI) without a Solr sidecar, with only + the corroboration component degraded. +- **Composable:** the scoring formula uses components already computed by + `NegativeEvidenceStore` and `ExecutionContext`; no new data collection is + needed beyond what Phase 4C already produces. +- **No new dependencies:** all components are pure Python dataclass operations + plus existing Solr and SQLAlchemy infrastructure; no new packages are required. + +### Negative / Trade-offs + +- **Context trust level is pipeline-level:** `trust_weight` is read from the + `ExecutionContext`, which represents the trust of the pipeline that ingested + the observable, not the trust of each individual source that contributed to + the enrichment. An observable enriched by both Splunk (trusted_internal) and + AlienVault (untrusted_external) in different pipeline runs will be scored + differently depending on which pipeline context `prioritize()` is called with. + Per-observable trust aggregation is deferred. +- **Age factor assumes `modified` is reliable:** not all connectors reliably + populate the STIX `modified` field; objects with no `modified` receive the + neutral 0.5 factor, which may over- or under-rank them depending on their + actual age. +- **Corroboration bonus is hit-count-based:** the Solr query returns a count of + matching documents, not a measure of the quality or relevance of those + matches. A high Solr hit count on a generic observable (e.g. a popular CDN + IP) may inflate the bonus. +- **Score storage growth:** with `store_notes=True`, every call to `prioritize()` + on N observables creates N STIX note objects. Regular re-prioritisation + (e.g. on a daily schedule) accumulates many notes per observable. A retention + policy is needed. 
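The growth in the last trade-off is straightforward to quantify. A back-of-the-envelope sketch (the helper name and figures are illustrative, not drawn from the implementation):

```python
def note_objects(observables: int, runs_per_day: int, days: int) -> int:
    """STIX note objects accumulated by prioritize() with store_notes=True."""
    return observables * runs_per_day * days

# 10,000 observables re-prioritised once a day for a 90-day quarter:
print(note_objects(10_000, 1, 90))  # 900000
```

At this rate each observable carries 90 score notes by the end of the quarter, which is why a retention policy (the `ScoreNotePurgeJob` deferred below) matters.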
+ +### Deferred + +- **Per-observable trust aggregation:** compute the effective trust weight from + all connectors that have enriched the observable (max, weighted average, or + union) rather than from the pipeline-level `ExecutionContext`. +- **ML-based weight calibration:** collect analyst feedback on scored results + (accepted/rejected triage decisions) and use them to calibrate the formula + coefficients (`0.4`, `0.3`, `0.3`, `0.5`) via a regression model. +- **Score note retention policy:** a `ScoreNotePurgeJob` that deletes note + objects older than a configurable threshold, retaining only the most recent + score per observable. +- **TUI prioritisation dashboard:** display the ranked observable list with + expandable `explanation` views in the Textual TUI. +- **Streaming prioritisation:** emit score updates as new evidence arrives via + the HookBus rather than requiring explicit `prioritize()` calls. + +--- + +## Alternatives Considered + +### ML-based ranking (deferred, not rejected) + +A supervised ranking model trained on analyst triage decisions was the +originally proposed approach. It was deferred (not rejected) because: + +1. GNAT does not yet have labelled training data (analyst accept/reject + decisions on scored observables); the formula-based engine will collect this + data in production. +2. An ML model is harder to explain and audit; the formula produces an + `explanation` dict that every component of the system can parse. +3. ML models require a training pipeline, model versioning, and serving + infrastructure that are out of scope for Phase 4C. + +The formula-based engine is explicitly designed to be replaceable: the scoring +logic is isolated in `_score_observable()`, and the coefficients are named +constants that a future calibration layer can tune without changing the public +API. 
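One way that replaceability seam might look, sketched under the assumption that the coefficients are gathered into a single mapping (the constant and function names here are illustrative, not the actual module contents):

```python
# Illustrative: default coefficients mirroring the ADR formula, kept in one
# place so a calibration layer can substitute a tuned mapping without
# changing prioritize()'s public signature.
DEFAULT_COEFFS = {
    "trust": 0.4,
    "age": 0.3,
    "corroboration": 0.3,
    "neg_penalty": 0.5,
}

def raw_score(tw, af, cb, np_, coeffs=DEFAULT_COEFFS):
    """Composite score before clamping; coeffs is the calibration hook."""
    return (
        tw * coeffs["trust"]
        + af * coeffs["age"]
        + cb * coeffs["corroboration"]
        - np_ * coeffs["neg_penalty"]
    )

# A future calibration layer overrides only the mapping:
tuned = {**DEFAULT_COEFFS, "age": 0.4, "corroboration": 0.2}
```

The calling code never changes; only the mapping passed to the scorer does.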
+ +### Flat confidence score only + +Retaining the Phase 3 single-field confidence score and not introducing a +multi-component formula was the minimal alternative. Rejected because: + +1. It ignores trust authority (source reliability) — the single most important + factor identified in analyst feedback. +2. It ignores recency — a 1-year-old hit is less actionable than a hit from + today. +3. It has no mechanism to penalise observables that multiple connectors have + already examined and found unremarkable. +4. It is not explainable — analysts cannot determine why an observable ranked + above another. + +### Graph-centrality ranking + +Using the STIX relationship graph to compute centrality scores (e.g. PageRank +over the STIX `relationship` graph) as the primary ranking signal was +considered. Rejected because: + +1. GNAT workspaces in early deployments may have sparse relationship graphs; + centrality degrades to random ranking for isolated observables. +2. Graph traversal over potentially 100,000+ STIX objects requires significant + compute and is not suitable for on-demand scoring within a pipeline run. +3. Centrality does not incorporate trust authority, recency, or negative + evidence without substantial additional engineering. + +Graph-based ranking remains a viable long-term complement to the formula and +may be reintroduced as an optional corroboration signal once workspaces have +sufficient relationship density. 
+ +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/0045-ADR-agent-governance.md b/docs/explanation/architecture/adrs/0045-ADR-agent-governance.md new file mode 100644 index 00000000..e4d367cd --- /dev/null +++ b/docs/explanation/architecture/adrs/0045-ADR-agent-governance.md @@ -0,0 +1,277 @@ +# ADR-0045 — Agent Governance Layer (Phase 4D) + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +GNAT's AI agent layer (`gnat/agents/`) had grown substantially through Phases 3 and 4 to include +`ResearchAgent`, `ParsingAgent`, `CopilotReader`, and a family of workflow and quality agents. +Each of these agents can invoke connector actions — fetching threat intelligence, enriching +indicators, exporting STIX bundles, and triggering SOAR playbooks. + +As agents gained write access, two serious gaps emerged: + +1. **No permission system.** Any agent could call any connector action regardless of its origin or + the sensitivity of the target workspace. A `ParsingAgent` used in an untrusted enrichment + pipeline had the same effective privileges as an internally authored `ResearchAgent`. + +2. **No audit trail.** Agent-originated writes were indistinguishable in the enrichment log from + direct analyst operations. When an indicator was modified by an agent, there was no record of + which agent did it, under what context, or whether any human had authorised the change. + +The absence of a governance layer made agent deployments unsuitable for production environments +with compliance requirements (SOC 2, ISO 27001, MSSPs serving regulated verticals). Operators +had no mechanism to restrict, monitor, or rate-limit agent activity. + +--- + +## Decision + +Introduce an **`AgentGovernor`** as the authoritative policy enforcement point for all agent +actions in GNAT. Every agent action must pass through the governor before it may execute. 
+ +### `AgentActionType` Enum + +Ten action types covering the full range of agent-reachable operations: + +| Action Type | Description | +|---|---| +| `read_stix` | Read STIX objects from a connector or workspace | +| `write_stix` | Create or update STIX objects | +| `delete_stix` | Soft-delete STIX objects | +| `enrich` | Call enrichment dispatcher against existing objects | +| `ingest` | Run an ingest pipeline or reader | +| `export` | Trigger an export (EDL, STIX bundle, Netskope CE) | +| `trigger_playbook` | Invoke an XSOAR or external SOAR playbook | +| `manage_workspace` | Create, rename, or delete a workspace | +| `escalate` | Route a finding to the review queue or analyst channel | +| `hypothesize` | Generate AI hypotheses (read-only, no state mutation) | + +### Trust Levels + +Three trust levels applied to every agent at registration time: + +| Trust Level | Description | +|---|---| +| `trusted_internal` | Internally authored agents, admin-signed, registry-registered | +| `semi_trusted` | Third-party or plugin agents loaded at runtime | +| `untrusted_external` | Externally supplied agents (research pipeline agents, unverified) | + +### Default Permission Matrix + +``` + trusted_internal semi_trusted untrusted_external +read_stix ✓ ✓ ✓ +write_stix ✓ ✓ ✗ +delete_stix ✓ ✗ ✗ +enrich ✓ ✓ ✓ +ingest ✓ ✓ ✗ +export ✓ ✗ ✗ +trigger_playbook ✓ ✗ ✗ +manage_workspace ✓ ✗ ✗ +escalate ✓ ✓ ✓ +hypothesize ✓ ✓ ✓ +``` + +### `AgentAction` Dataclass + +Immutable record created for every checked action, whether approved or denied: + +```python +@dataclass +class AgentAction: + action_id: str # UUID4 + agent_id: str # registered agent identifier + action_type: AgentActionType + target_ref: str # STIX ID or connector name of the target + impact_level: str # "low" | "medium" | "high" | "critical" + session_id: str # owning agent session UUID + context_id: str | None # workspace or execution context name + result_json: str # JSON-encoded outcome or error + approved_by: str | None 
# reviewer ID for HITL-approved actions + submitted_at: datetime + executed_at: datetime | None + status: str # "pending" | "approved" | "denied" | "executed" | "failed" +``` + +### `AgentGovernor` API + +Located at `gnat/agents/governor.py`: + +```python +from gnat.agents.governor import AgentGovernor, AgentActionType + +governor = AgentGovernor() + +# Check permission — returns True/False +governor.can_act( + agent_id="research-agent-v2", + action_type=AgentActionType.write_stix, + trust_level="semi_trusted", +) + +# Assert permission — raises AgentPermissionDenied if denied +governor.require_can_act( + agent_id="research-agent-v2", + action_type=AgentActionType.export, + trust_level="semi_trusted", +) + +# Record a completed action +governor.record_action(action) + +# Sliding-window rate limit — raises RateLimitExceeded on breach +governor.rate_limit_check( + agent_id="research-agent-v2", + window_seconds=3600, # configurable per agent +) + +# Query audit log +log = governor.get_action_log(agent_id="research-agent-v2") +all_actions = governor.get_action_log() # all agents + +# Runtime policy override — persists for the process lifetime +governor.set_policy_override( + agent_id="custom-agent", + action_type=AgentActionType.export, + allowed=True, +) +``` + +### Exceptions + +```python +from gnat.agents.governor import AgentPermissionDenied, RateLimitExceeded + +# AgentPermissionDenied(agent_id, action_type, trust_level, reason) +# RateLimitExceeded(agent_id, window_seconds, call_count, limit) +``` + +Both inherit from `GNATClientError` so they are caught by the standard error handling path. + +### HookBus Integration + +`record_action()` emits a `"agent_action_recorded"` event on the global `HookBus` after +persisting to the in-memory audit log. 
Operators can subscribe to receive real-time action +events for external SIEM forwarding: + +```python +from gnat.agents.governor import AgentGovernor +from gnat.context import HookBus + +bus = HookBus.get_default() +bus.subscribe("agent_action_recorded", lambda evt: siem_client.send(evt)) +``` + +### Database Schema + +Two new tables added via Alembic migration `0006_add_agent_governance.py`: + +**`agent_sessions`** + +| Column | Type | Notes | +|---|---|---| +| `id` | `VARCHAR(36)` | UUID4 primary key | +| `agent_id` | `VARCHAR(200)` | registered agent identifier | +| `trust_level` | `VARCHAR(50)` | one of the three trust levels | +| `context_id` | `VARCHAR(200)` | workspace or execution context | +| `started_at` | `DATETIME` | UTC | +| `ended_at` | `DATETIME` | nullable | +| `action_count` | `INTEGER` | incremented on each `record_action()` | +| `policy_overrides_json` | `TEXT` | JSON map of per-agent overrides active at session start | + +**`agent_actions`** + +| Column | Type | Notes | +|---|---|---| +| `id` | `VARCHAR(36)` | UUID4 primary key | +| `session_id` | `VARCHAR(36)` | FK → `agent_sessions.id` | +| `agent_id` | `VARCHAR(200)` | denormalised for query convenience | +| `action_type` | `VARCHAR(50)` | enum value | +| `target_ref` | `VARCHAR(500)` | STIX ID or connector name | +| `impact_level` | `VARCHAR(20)` | `low` / `medium` / `high` / `critical` | +| `status` | `VARCHAR(20)` | lifecycle status | +| `approved_by` | `VARCHAR(200)` | nullable | +| `result_json` | `TEXT` | outcome payload | +| `submitted_at` | `DATETIME` | UTC | +| `executed_at` | `DATETIME` | nullable | + +Composite index on `(agent_id, submitted_at)` for time-range queries on a single agent. + +--- + +## Consequences + +### Positive + +- **Least-privilege enforcement:** agents that do not need write access cannot obtain it + regardless of the code paths they call; the permission matrix is the single source of truth. 
+- **Immutable audit trail:** every agent action — approved or denied — is recorded with full + context, making compliance evidence generation straightforward. +- **Rate limiting prevents runaway agents:** a misconfigured `ResearchAgent` with + `max_calls_per_run=9999` will be stopped by the sliding-window counter before it exhausts + API quota on a connected platform. +- **Per-deployment customisation:** `set_policy_override()` lets operators grant or restrict + individual agents at runtime without a code change — important for MSP deployments where + customer-specific agents need tailored permissions. +- **HookBus integration enables SIEM forwarding** at zero additional cost to the caller. + +### Negative / Trade-offs + +- **Slight performance overhead:** every agent action incurs a permission check and an audit + log write. For high-frequency ingest agents this adds a small but measurable latency. +- **In-memory rate limit counter:** the sliding-window counter resets on process restart. + Distributed deployments where multiple GNAT workers serve the same agent pool should + configure an external Redis counter (deferred, see below). +- **Policy matrix is static at import time:** the default permission matrix is a module-level + dict; runtime overrides apply only to the running process. Multi-process deployments must + configure overrides identically on each worker or use the shared DB override table. + +### Deferred + +- Distributed rate limiting via Redis sidecar +- Per-action approval workflow (short-circuited in Phase 4D by `HITLGateway` — see ADR-0046) +- Agent registry with cryptographic signing of agent identity +- Capability-based security tokens as an alternative to trust-level categories + +--- + +## Alternatives Considered + +### Capability-Based Security Tokens + +Each agent would hold a signed token listing specific capabilities (analogous to OAuth2 scopes). +Token validation would replace the trust-level lookup. 
This model is more granular and suitable +for multi-organisation federation, but is significantly more complex to implement and operate — +particularly for the embedded agents that run inside the same process as the pipeline. It was +deferred as a future evolution once agent federation becomes a firm requirement. + +### OAuth2 Scopes Per Agent + +Define a fixed set of OAuth2 scopes (`gnat:read`, `gnat:write`, `gnat:export`, etc.) and issue +per-agent tokens from a lightweight authorization server. Rejected because it introduces an +external service dependency for what is currently a single-process feature. The scope model will +be revisited if GNAT ever exposes its agent layer over a network boundary. + +### Audit Logging Only (No Permission Enforcement) + +Log all agent actions but do not block anything. Rejected because post-hoc detection of +unauthorised agent writes is insufficient for regulated environments — damage may occur before +the audit log is reviewed. The prevention-first model of `require_can_act()` is the correct +default; audit logging is the secondary safeguard. + +### Connector-Level Guards Only + +Apply permission checks at the connector's `upsert_object()` / `delete_object()` entry points +rather than in a centralised governor. Rejected because it requires every connector +implementation to carry governance logic, creates inconsistent enforcement across the 99 +connectors, and cannot easily support cross-cutting policies such as rate limiting and HookBus +emission. 
+ +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/0046-ADR-hitl-gateway.md b/docs/explanation/architecture/adrs/0046-ADR-hitl-gateway.md new file mode 100644 index 00000000..7ba5d07a --- /dev/null +++ b/docs/explanation/architecture/adrs/0046-ADR-hitl-gateway.md @@ -0,0 +1,298 @@ +# ADR-0046 — Human-in-the-Loop Gateway (Phase 4D) + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +GNAT's AI agents can now be granted write, export, playbook-trigger, and +workspace-management permissions via `AgentGovernor` (ADR-0045). For most +trust levels and action types the governor's permission matrix is sufficient: +either the action is allowed and it executes immediately, or it is denied +outright. + +However, a subset of agent actions are high-impact enough that neither +automatic approval nor outright denial is the correct policy: + +- Triggering an XSOAR playbook against a live environment carries irreversible + side effects (firewall rule changes, endpoint isolation, ticket creation). +- Workspace deletions or bulk STIX deletions are difficult to roll back. +- Escalation decisions that route findings to an incident team should carry an + auditable human sign-off. + +Prior to this ADR there was no mechanism to **pause** an agent action and hold +it in a review queue until a human operator approved or rejected it. The +existing `gnat/review/` module contained a fully implemented `ReviewService` +and `ReviewQueueStore`, but they were reachable only from the report lifecycle +(ADR-0034); agents had no bridge to that infrastructure. + +The result was an all-or-nothing choice: either grant agents unrestricted +write access, or block the action class entirely. Neither option is suitable +for production deployments where agents need occasional high-impact capability +under controlled conditions. 
+ +--- + +## Decision + +Introduce **`HITLGateway`** (`gnat/agents/hitl.py`) as a thin policy bridge +between `AgentGovernor` and `gnat/review/service.py`. Every agent action +evaluated by `AgentGovernor.require_can_act()` is additionally evaluated by +`HITLGateway.evaluate()` before it may execute. + +### Impact Tier Classification + +Impact level is a field on `AgentAction` (see ADR-0045) set by the agent at +action creation time. `HITLGateway` routes on that field: + +| Impact Level | Routing Policy | Review Queue Entry | +|---|---|---| +| `low` | Auto-approved, execution proceeds immediately | None (logged only) | +| `medium` | Auto-approved, execution proceeds immediately | None (logged only) | +| `high` | Blocked pending human approval via `ReviewService` | `PENDING` `ReviewItem` created | +| `critical` | Blocked pending human approval; XSOAR playbook notification sent | `PENDING` `ReviewItem` created + XSOAR alert | + +### `HITLGateway` API + +Located at `gnat/agents/hitl.py`: + +```python +from gnat.agents.hitl import HITLGateway +from gnat.agents.governor import AgentAction, AgentActionType + +gateway = HITLGateway() + +# Primary entry point — called by AgentGovernor after permission check passes +approved, review_item = gateway.evaluate(action) +if not approved: + # action is PENDING; agent should poll or await human decision + print(f"Action {action.action_id} awaiting review: {review_item.id}") + +# Submit a specific action to the review queue explicitly +review_item = gateway.submit_for_approval(action) + +# Poll queue for a decision +from gnat.review.service import ReviewStatus +status = gateway.check_approval_status(review_item.id) +# status is one of ReviewStatus.PENDING, APPROVED, REJECTED + +# Auto-approve (used in test harnesses and auto-escalation policies) +gateway.auto_approve_pending(review_item.id, reviewer="auto-policy") +``` + +### `evaluate()` Logic + +```python +def evaluate( + self, action: AgentAction +) -> tuple[bool, ReviewItem | 
None]: + if action.impact_level in ("low", "medium"): + self._log_auto_approved(action) + return True, None + + review_item = self.submit_for_approval(action) + + if action.impact_level == "critical": + self._notify_xsoar(action, review_item) + + return False, review_item +``` + +The action is **blocked** (returns `False`) for `high` and `critical` levels +regardless of the trust level of the agent. Even a `trusted_internal` agent +must pause for a human reviewer if its action carries `impact_level="critical"`. + +### `submit_for_approval()` — ReviewService Bridge + +`submit_for_approval()` converts the `AgentAction` dataclass into a +STIX-compatible metadata dict and delegates to `ReviewService.submit()`: + +```python +def submit_for_approval(self, action: AgentAction) -> ReviewItem: + payload = { + "type": "agent-action-review", + "action_id": action.action_id, + "agent_id": action.agent_id, + "action_type": action.action_type.value, + "target_ref": action.target_ref, + "impact_level": action.impact_level, + "context_id": action.context_id, + "submitted_at": action.submitted_at.isoformat(), + } + return self._review_service.submit( + item_type="agent_action", + payload=payload, + submitter=action.agent_id, + priority="high" if action.impact_level == "critical" else "normal", + ) +``` + +No new storage is introduced — `ReviewItem` and `ReviewQueueStore` from +`gnat/review/` are used as-is. 
+ +### Approval Timeout + +`check_approval_status()` enforces a configurable timeout: + +```python +def check_approval_status(self, review_id: str) -> ReviewStatus: + item = self._review_service.get(review_id) + elapsed = (datetime.utcnow() - item.submitted_at).total_seconds() + if ( + item.status == ReviewStatus.PENDING + and elapsed > self._approval_timeout_seconds + ): + self._review_service.reject( + review_id, + reason="auto-rejected: approval timeout exceeded", + reviewer="hitl-gateway", + ) + return ReviewStatus.REJECTED + return item.status +``` + +Default `approval_timeout_seconds` is `3600` (one hour). Configurable via the +`[agents]` INI section: + +```ini +[agents] +hitl_approval_timeout_seconds = 3600 +hitl_xsoar_playbook_id = P-GNAT-AGENT-ALERT +``` + +### XSOAR Notification for Critical Actions + +For `critical` impact actions, `HITLGateway` calls the XSOAR connector's +`upsert_object()` with a pre-formed STIX `incident` custom object: + +```python +def _notify_xsoar( + self, action: AgentAction, review_item: ReviewItem +) -> None: + incident = { + "type": "x-gnat-incident", + "name": f"HITL Review Required: {action.action_type.value}", + "severity": "high", + "agent_id": action.agent_id, + "action_id": action.action_id, + "review_id": review_item.id, + "target_ref": action.target_ref, + } + try: + self._xsoar_client.upsert_object(incident) + except Exception as exc: + # Notification failure must never block the review queue entry + logger.warning("XSOAR notification failed: %s", exc) +``` + +The XSOAR client is a `trusted_internal` connector instance constructed from +the INI `[xsoar]` section. If XSOAR is not configured, the notification is +skipped and a warning is logged; the `ReviewItem` is still created. 
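Putting the pieces together, a blocked agent needs a poll loop around `check_approval_status()`. A minimal sketch, written generically so it is not tied to the real `ReviewStatus` enum (the helper name and parameters are illustrative, not part of the gateway API):

```python
import time

def await_decision(check_status, review_id, pending, poll_interval=5.0):
    """Poll check_status(review_id) until it returns something other than
    `pending`. The gateway's timeout auto-reject guarantees termination."""
    while True:
        status = check_status(review_id)
        if status != pending:
            return status
        time.sleep(poll_interval)
```

In GNAT terms, `check_status` would be `gateway.check_approval_status` and `pending` would be `ReviewStatus.PENDING`; the one-hour auto-reject bounds the worst-case wait.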
+ +### Sequence Diagram + +``` +Agent AgentGovernor HITLGateway ReviewService + | | | | + |── require_can_act() ──► | | | + | |── evaluate(action) ► | | + | | |── submit() ────────►| + | | |◄── ReviewItem ──────| + | | | | + | | [critical only] | | + | | |── _notify_xsoar() | + | | | (XSOARClient) | + | |◄── (False, item) ────| | + |◄── AgentActionPending ──| | | + | | | | + | [human approves] | | | + |── check_approval() ─────────────────────────► |── get(review_id) ──►| + |◄── APPROVED ─────────────────────────────────── |◄── ReviewStatus ───| +``` + +--- + +## Consequences + +### Positive + +- **No runaway high-impact actions:** agents cannot execute playbook triggers, + workspace deletions, or bulk STIX writes without a human in the loop, + regardless of their trust level. +- **Zero new storage infrastructure:** the review queue already existed in + `gnat/review/`. `HITLGateway` is a pure orchestration layer with no new + tables or persistence concerns. +- **XSOAR users receive actionable alerts:** operators who rely on XSOAR as + their SOAR console see critical agent actions appear as incidents immediately, + without requiring a separate notification integration. +- **Timeout prevents indefinite blocking:** auto-rejection after one hour + ensures that a missed review does not permanently block an agent session. +- **Testable in isolation:** `HITLGateway` accepts a `review_service` and + `xsoar_client` in its constructor, enabling full injection of test doubles. + +### Negative / Trade-offs + +- **Agents must poll or wait for approval:** there is no push-based callback + mechanism. Agents that need a fast response for `high`-impact actions must + implement a polling loop or be designed to suspend and resume. +- **Timeout is process-local:** the timeout check runs inside + `check_approval_status()`, which the agent must call. 
If the agent process + restarts, in-flight pending reviews are not automatically expired; a + background sweep task is needed for production deployments (deferred). +- **Single XSOAR integration point:** critical notifications only reach XSOAR + in this implementation. Other SOAR platforms (Splunk SOAR, Palo Alto XSIAM) + require additional notification adapters (deferred). + +### Deferred + +- Background sweep task to expire timed-out `PENDING` reviews independently of + agent polling +- Multi-SOAR notification adapters (Splunk SOAR, Tines, Torq) +- Webhook-based push approval for non-XSOAR environments (e.g. Slack approval + buttons via the Discord/Slack connectors) +- Role-based approval routing: routing `critical` actions to a named reviewer + group rather than the global queue + +--- + +## Alternatives Considered + +### Rebuild a Dedicated HITL Queue + +A purpose-built queue store separate from `gnat/review/` was considered to +avoid coupling agent governance to the report review subsystem. Rejected +because `ReviewService` and `ReviewQueueStore` already implement exactly the +required semantics (item submission, status polling, approval/rejection, +timeout), and duplication would create two review mechanisms that diverge over +time. The bridge pattern costs fewer than 120 lines of code. + +### Email-Only Notification + +Sending an email to a configured address for `high` and `critical` actions was +prototyped. Rejected because email provides no structured approval path: the +reviewer has no UI from which to approve or reject the action back into the +system. Notifications via XSOAR (and future adapters) provide a structured +approval workflow. + +### Synchronous Approval via Long-Poll + +Blocking the agent's calling thread in a long-poll loop until the review is +resolved was considered. Rejected because it ties up a thread for the full +approval window (up to one hour by default) and makes the system unresponsive +to cancellation. 
The asynchronous poll-or-suspend model is more appropriate +for an embedded agent runtime. + +### Trust-Level Exemption for `trusted_internal` + +A proposal to exempt `trusted_internal` agents from HITL checks for `high` +impact actions was considered. Rejected on security grounds: trust level +reflects the provenance of the agent code, not the risk of the target action. +Even a fully trusted agent should not autonomously trigger a production SOAR +playbook without a human sign-off. + +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/0047-ADR-workspace-isolation.md b/docs/explanation/architecture/adrs/0047-ADR-workspace-isolation.md new file mode 100644 index 00000000..59450819 --- /dev/null +++ b/docs/explanation/architecture/adrs/0047-ADR-workspace-isolation.md @@ -0,0 +1,320 @@ +# ADR-0047 — Workspace Trust Boundary Enforcement (Phase 4E) + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +GNAT workspaces are the primary isolation unit for multi-tenant and +multi-classification deployments. Each workspace holds a set of STIX objects, +an enrichment log, and configuration for the connectors that may interact with +it. + +Prior to this ADR, workspace isolation was **logical only**: the workspace ID +scoped database queries, but there was no enforcement mechanism preventing a +connector from writing into a workspace that it was not supposed to touch. The +following scenarios had no protection: + +1. An `untrusted_external` connector loading community threat feeds writes + enriched indicators into a `trusted_internal` workspace that holds + classified government-sourced intelligence. The commingling contaminates + the provenance chain. + +2. An MSSP deployment with multiple customer tenants assigns each tenant their + own workspace. A connector instance shared across tenants (e.g. 
a VirusTotal
   client configured with the MSSP's API key) reads objects from workspace A and
   enriches them into workspace B.

3. A `semi_trusted` plugin agent (ADR-0045) is granted `write_stix` permission
   but should only write to a sandbox workspace, not to the production
   workspace. There is no way to express this constraint.

Connector trust levels are declared as class-level attributes (`TRUST_LEVEL`)
since ADR-0039, and agent trust levels are registered with `AgentGovernor`
since ADR-0045. The missing piece was a mechanism to declare, on the
**workspace** side, which trust levels and connector identities are permitted
to interact with it.

---

## Decision

Extend the `workspaces` database table and `Workspace` ORM class with two new
fields that declare the workspace's trust boundary, then enforce that boundary
at connector access time.

### Database Schema Extension

Alembic migration `0007_add_workspace_trust_boundary.py` adds two columns to
the existing `workspaces` table:

| Column | Type | Default | Notes |
|---|---|---|---|
| `trust_boundary` | `VARCHAR(50)` | `'semi_trusted'` | Minimum trust level required to access this workspace |
| `allowed_connector_refs` | `TEXT` | `'[]'` | JSON array of permitted connector class names; empty list means all connectors at or above `trust_boundary` are permitted |

Both columns are declared `NOT NULL` with database-level defaults, so existing
rows are backfilled with `'semi_trusted'` and `'[]'` at migration time and
application code never observes `NULL` in these fields.

```sql
-- Migration 0007 (excerpt)
ALTER TABLE workspaces
    ADD COLUMN trust_boundary VARCHAR(50) NOT NULL DEFAULT 'semi_trusted';

ALTER TABLE workspaces
    ADD COLUMN allowed_connector_refs TEXT NOT NULL DEFAULT '[]';

CREATE INDEX ix_workspaces_trust_boundary ON workspaces (trust_boundary);
```

### `Workspace` ORM Changes

`WorkspaceModel` (SQLAlchemy) gains the two mapped columns.
The `Workspace` +domain class gains corresponding attributes and one new method: + +```python +@dataclass +class Workspace: + # ... existing fields ... + trust_boundary: str = "semi_trusted" + allowed_connector_refs: list[str] = field(default_factory=list) + + def check_connector_trust(self, connector: object) -> None: + """ + Raise PermissionError if `connector` is not permitted to access + this workspace. + + Checks two conditions in order: + 1. The connector's TRUST_LEVEL rank must be >= trust_boundary rank. + 2. If allowed_connector_refs is non-empty, the connector's class name + must appear in the list. + + Parameters + ---------- + connector : object + Any connector instance that has a TRUST_LEVEL class variable. + + Raises + ------ + PermissionError + If the connector does not satisfy the workspace trust boundary. + """ + connector_trust = getattr(type(connector), "TRUST_LEVEL", "untrusted_external") + if _trust_rank(connector_trust) < _trust_rank(self.trust_boundary): + self._log_violation(connector, "trust_level_insufficient") + raise PermissionError( + f"Connector '{type(connector).__name__}' has trust level " + f"'{connector_trust}', but workspace '{self.workspace_id}' " + f"requires '{self.trust_boundary}' or higher." + ) + if self.allowed_connector_refs: + connector_name = type(connector).__name__ + if connector_name not in self.allowed_connector_refs: + self._log_violation(connector, "connector_not_in_allowlist") + raise PermissionError( + f"Connector '{connector_name}' is not in the allowlist " + f"for workspace '{self.workspace_id}'." + ) +``` + +### Trust Rank Ordering + +```python +_TRUST_RANK: dict[str, int] = { + "untrusted_external": 0, + "semi_trusted": 1, + "trusted_internal": 2, +} + +def _trust_rank(level: str) -> int: + return _TRUST_RANK.get(level, 0) +``` + +The ordering is: `trusted_internal` > `semi_trusted` > `untrusted_external`. 
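
The comparison this ranking implies can be checked in isolation. In the sketch
below, `_TRUST_RANK` and `_trust_rank` reproduce the helpers shown above, while
`is_permitted` is an illustrative helper (not part of the codebase) that
reduces the boundary check to a single comparison:

```python
_TRUST_RANK: dict[str, int] = {
    "untrusted_external": 0,
    "semi_trusted": 1,
    "trusted_internal": 2,
}


def _trust_rank(level: str) -> int:
    # Unknown or misspelled levels rank lowest, so lookups fail closed.
    return _TRUST_RANK.get(level, 0)


def is_permitted(connector_trust: str, boundary: str) -> bool:
    # A connector may access a workspace when its rank meets or exceeds
    # the workspace's boundary rank.
    return _trust_rank(connector_trust) >= _trust_rank(boundary)
```

Note the fail-closed behaviour of the `.get(level, 0)` default: a misspelled
trust level such as `"trusted"` ranks as `0` and is rejected by any boundary
above `untrusted_external`.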
+A workspace with `trust_boundary = "trusted_internal"` rejects connectors at +`semi_trusted` or `untrusted_external` even if those connectors are otherwise +granted `write_stix` by `AgentGovernor`. + +### Enforcement Points + +`check_connector_trust()` is called in two locations: + +1. **`Workspace._init_store()`** — at workspace initialisation, when a + connector is bound to the workspace for the first time. +2. **`IngestPipeline.run()`** — immediately before the first `upsert_object()` + call, after `ExecutionContext` has been established. + +Both call sites catch `PermissionError`, log the violation to `execution_log` +as a `security_event` row (see ADR-0039), and re-raise. + +### Configuring Workspace Trust Boundaries + +Workspace trust boundaries are set at workspace creation time via the `Workspace` +API or the CLI: + +```python +from gnat.context.workspace import Workspace + +# Create a high-trust workspace that only accepts VirusTotal and CrowdStrike +ws = Workspace.create( + name="classified-intel", + trust_boundary="trusted_internal", + allowed_connector_refs=["VirusTotalClient", "CrowdStrikeClient"], +) + +# Update an existing workspace's trust boundary +ws = Workspace.load("production") +ws.trust_boundary = "semi_trusted" +ws.allowed_connector_refs = [] # any semi_trusted or higher connector is fine +ws.save() +``` + +CLI equivalent: + +```bash +gnat workspace create classified-intel \ + --trust-boundary trusted_internal \ + --allow-connector VirusTotalClient \ + --allow-connector CrowdStrikeClient + +gnat workspace set-trust production --trust-boundary semi_trusted +``` + +### Violation Logging + +Every `PermissionError` raised by `check_connector_trust()` is written to the +`execution_log` table as a `security_event`: + +```python +def _log_violation(self, connector: object, reason: str) -> None: + self._ctx_store.append_event( + context_id=self._active_context_id, + event_type="security_event", + metadata={ + "violation": "workspace_trust_boundary", 
+ "workspace_id": self.workspace_id, + "trust_boundary": self.trust_boundary, + "connector": type(connector).__name__, + "connector_trust": getattr(type(connector), "TRUST_LEVEL", "unknown"), + "allowed_connector_refs": self.allowed_connector_refs, + "reason": reason, + }, + ) +``` + +These rows are queryable alongside all other execution context events, making +boundary violations visible in the same audit trail as agent permission denials +(ADR-0045) and data lineage events (ADR-0038). + +### Default Behaviour (Backward Compatibility) + +Existing workspaces that do not have `trust_boundary` set receive +`'semi_trusted'` from the migration default. This means all `semi_trusted` +and `trusted_internal` connectors continue to work without any configuration +change. `untrusted_external` connectors (community feed readers, OSINT +scrapers) are blocked from existing workspaces unless the boundary is +explicitly lowered to `'untrusted_external'`. + +This is a deliberate, slightly breaking default: if any existing deployment +uses an `untrusted_external` connector to write into a workspace, it will begin +receiving `PermissionError` after the migration. The operator must explicitly +set `trust_boundary = "untrusted_external"` for those workspaces to restore +prior behaviour. This is the correct security posture: the old behaviour was +unintentionally permissive. + +--- + +## Consequences + +### Positive + +- **Trust-aware workspace isolation:** the workspace itself declares what it + trusts, rather than relying solely on the permission matrix in + `AgentGovernor`. This enables a defence-in-depth model where both the action + policy and the target resource enforce trust constraints independently. +- **Zero-trust workspaces are possible:** a workspace with + `trust_boundary = "trusted_internal"` and a non-empty `allowed_connector_refs` + list will reject every connector that is not explicitly named — suitable for + classified or high-value intelligence stores. 
+- **MSSP tenancy is enforceable:** each customer workspace can be given an + allowlist of their specific connector instances, preventing cross-tenant + write-through. +- **Violations are auditable:** every blocked access is logged as a + `security_event` in `execution_log`, giving operators a clear record of + attempted boundary crossings. +- **Backward-compatible default:** the `'semi_trusted'` default preserves + existing behaviour for the vast majority of deployments. + +### Negative / Trade-offs + +- **Slightly breaking for `untrusted_external` connectors:** deployments that + rely on community feed connectors writing directly to default workspaces will + require a one-time configuration update after the migration. +- **`allowed_connector_refs` is a class name string:** it compares against + `type(connector).__name__`, which means it is case-sensitive and does not + survive connector class renames. A more robust connector identity mechanism + (e.g. a `CONNECTOR_ID` class constant) is deferred. +- **Enforcement is at the GNAT application layer:** database-level row-security + policies (e.g. PostgreSQL RLS) are not implemented. A connector that + bypasses the GNAT application layer and writes directly to the database is + not constrained. + +### Deferred + +- `CONNECTOR_ID` class constant on `BaseClient` to decouple allowlist entries + from class names +- Database-level row security (PostgreSQL RLS) for multi-process deployments + where multiple GNAT workers share a database +- TUI workspace inspector showing trust boundary configuration and recent + violation events +- Per-workspace read boundary (currently `check_connector_trust()` is called + on write paths only; read-path enforcement is deferred) + +--- + +## Alternatives Considered + +### Separate Database Schema Per Tenant + +Each tenant workspace would live in a separate database schema or database +instance, providing hard isolation at the storage layer. 
Rejected because it +requires database-level provisioning for each workspace, complicates migrations, +and makes cross-workspace queries (e.g. correlation across tenants for MSSP +analytics) impossible without a federation layer. The application-level trust +boundary model achieves the required isolation for the current threat model at +far lower operational cost. + +### TLP-Only Filtering + +Restrict connector write access based on the TLP marking of the STIX objects +rather than the trust level of the connector. Rejected because TLP controls +*dissemination* of intelligence (who may see it), not *provenance* (who may +write it). A `semi_trusted` connector should not be allowed to inject objects +into a workspace designated for `trusted_internal` sources even if the objects +carry TLP:WHITE markings. + +### Policy Engine Allowlist (ADR-0037) + +The existing policy engine (ADR-0037) could be extended to express workspace +trust boundaries as policy rules rather than workspace attributes. Rejected +for this phase because workspace trust is a stable property of the workspace +itself, not a dynamic rule that should be evaluated against arbitrary +conditions. The policy engine is a better home for complex, contextual +decisions (e.g. "allow if the object's confidence score exceeds 80"); workspace +boundary enforcement is simpler and benefits from being collocated with the +workspace model. + +### Connector-Level Workspace Declarations + +Each connector class could carry a list of workspace IDs it is permitted to +access (inverting the relationship — connector declares targets instead of +workspace declaring sources). Rejected because workspace configuration is +the correct authority for workspace-scoped policy. Distributing access +control across 99 connector class definitions would be operationally unwieldy. 
+ +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/0048-ADR-query-budget.md b/docs/explanation/architecture/adrs/0048-ADR-query-budget.md new file mode 100644 index 00000000..9211bb57 --- /dev/null +++ b/docs/explanation/architecture/adrs/0048-ADR-query-budget.md @@ -0,0 +1,360 @@ +# ADR-0048 — Query Budget and Cost Tracking (Phase 4E) + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +GNAT coordinates calls to up to 99 external connector platforms. Each +connector call may count against a paid API quota, consume compute time, or +contribute to rate-limit thresholds imposed by the upstream provider. + +Prior to this ADR, two mechanisms provided partial protection: + +1. **`AgentGovernor` rate limiting** (ADR-0045) — a sliding-window counter + per agent per time window, expressed in *number of governor-checked agent + actions*. It does not account for the number of HTTP calls each action + generates, which may be many (e.g. a `list_objects()` that pages through + 5 000 results). + +2. **`QueryBudget` on `ExecutionContext`** (ADR-0039) — a `max_connector_calls` + field on the context dataclass. It was designed as a placeholder but had + no enforcement mechanism: `BaseClient._request()` did not check it, and + there was no `BudgetExceeded` exception class. + +The consequence was that an agent or pipeline with unrestricted connector +access could: + +- Page through an entire VirusTotal result set in a single `list_objects()` + call, exhausting the day's API quota for the entire deployment. +- Create a thundering-herd problem where multiple parallel enrichment + pipelines all call the same rate-limited platform simultaneously. +- Provide no cost attribution: there was no record of which connector, agent, + or pipeline consumed the most API calls over a given period. 
+ +These gaps made GNAT unsuitable for deployments with strict API cost controls +or quota-sharing across teams. + +--- + +## Decision + +Extend `QueryBudget` (introduced as a stub in ADR-0039) into a fully +functional cost-tracking and enforcement mechanism, and wire it into the hot +path of `BaseClient._request()`. + +### `QueryBudget` Dataclass (Extended) + +Located in `gnat/core/context.py`, replacing the stub from ADR-0039: + +```python +@dataclass +class QueryBudget: + """Per-execution resource budget for connector API calls. + + Parameters + ---------- + max_units : int + Maximum total cost units for this execution. Each connector call + deducts ``COST_UNIT`` units from the budget. Raise + ``BudgetExceeded`` when the budget is exhausted. + """ + + max_units: int + _consumed: int = field(default=0, repr=False, init=False) + + @property + def remaining(self) -> int: + """Remaining cost units.""" + return self.max_units - self._consumed + + @property + def is_exhausted(self) -> bool: + """True when no budget remains.""" + return self._consumed >= self.max_units + + def consume(self, units: int, connector: str) -> None: + """Deduct *units* from the budget on behalf of *connector*. + + Parameters + ---------- + units : int + Cost units to deduct. Use ``BaseClient.COST_UNIT`` (default 1) + for single-item requests; use larger values for bulk/search ops. + connector : str + Connector class name, used for cost attribution logging. + + Raises + ------ + BudgetExceeded + If deducting *units* would exceed ``max_units``. + """ + if self._consumed + units > self.max_units: + raise BudgetExceeded( + connector=connector, + cost=units, + remaining=self.remaining, + ) + self._consumed += units +``` + +### `BudgetExceeded` Exception + +```python +class BudgetExceeded(GNATClientError): + """Raised when a connector call would exceed the active QueryBudget. + + Attributes + ---------- + connector : str + Name of the connector that attempted the call. 
+ cost : int + Cost units the call would have consumed. + remaining : int + Budget units remaining at the time of the attempt. + """ + + def __init__(self, connector: str, cost: int, remaining: int) -> None: + self.connector = connector + self.cost = cost + self.remaining = remaining + super().__init__( + f"Budget exhausted: connector='{connector}' attempted " + f"cost={cost} but only {remaining} units remain." + ) +``` + +`BudgetExceeded` inherits from `GNATClientError` (from `gnat.clients.base`) +so it is caught by the standard error handling path and propagates through +pipelines identically to any other HTTP-layer failure. + +### `COST_UNIT` Class Variable on `BaseClient` + +```python +class BaseClient: + COST_UNIT: int = 1 # default: 1 unit per HTTP request + TRUST_LEVEL: str = "semi_trusted" + + def _request(self, method: str, path: str, **kwargs) -> urllib3.HTTPResponse: + if self._context and self._context.budget: + self._context.budget.consume( + self.COST_UNIT, + connector=type(self).__name__, + ) + # ... existing HTTP dispatch ... 
```

Connectors that make bulk or search calls override `COST_UNIT` to reflect
their relative expense:

| Connector Category | `COST_UNIT` | Rationale |
|---|---|---|
| Standard single-object GET / POST | `1` | Default; one API call, one result |
| Bulk list / paginated results | `10` | One call may return hundreds of objects |
| Full-text search queries | `5` | Search indexes are expensive to query at scale |
| AI inference calls (LLM connectors) | `20` | Token cost is orders of magnitude above REST calls |

Example for the VirusTotal connector, which supports paginated list endpoints:

```python
import json


class VirusTotalClient(BaseClient):
    COST_UNIT = 1  # single-lookup default

    def list_objects(self, query: str, limit: int = 100) -> list[dict]:
        # Bulk paging — charge 10 per page
        results = []
        cursor = None
        while True:
            if self._context and self._context.budget:
                self._context.budget.consume(10, connector="VirusTotalClient")
            # The first page omits the cursor parameter; later pages pass it.
            path = f"/intelligence/search?query={query}"
            if cursor:
                path += f"&cursor={cursor}"
            resp = self._request("GET", path)
            page = json.loads(resp.data)  # _request returns a raw HTTP response
            # ... parse and accumulate ...
+ if not page.get("meta", {}).get("cursor"): + break + cursor = page["meta"]["cursor"] + return results +``` + +### `ExecutionContext.create()` with Budget + +The `max_budget_units` parameter on `ExecutionContext.create()` is now wired: + +```python +ctx = ExecutionContext.create( + initiated_by="enrichment-pipeline", + domain="analysis", + workspace_id="production", + max_budget_units=500, +) +# ctx.budget is a QueryBudget(max_units=500) + +# With no budget limit: +ctx = ExecutionContext.create( + initiated_by="manual", + domain="ingestion", + workspace_id="sandbox", + # max_budget_units omitted → ctx.budget is None → unlimited +) +``` + +### Cost Logging — `query_cost_log` Table + +Every call to `QueryBudget.consume()` appends a row to the `query_cost_log` +table (Alembic migration `0008_add_query_cost_log.py`): + +| Column | Type | Notes | +|---|---|---| +| `id` | `INTEGER` | Auto-increment primary key | +| `context_id` | `VARCHAR(36)` | FK → `execution_log.id` | +| `connector` | `VARCHAR(200)` | Connector class name | +| `cost_units` | `INTEGER` | Units deducted by this call | +| `cumulative_consumed` | `INTEGER` | Budget state after deduction | +| `budget_max` | `INTEGER` | `max_units` of the owning `QueryBudget` | +| `recorded_at` | `DATETIME` | UTC timestamp | + +```sql +-- Migration 0008 (excerpt) +CREATE TABLE query_cost_log ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + context_id VARCHAR(36) NOT NULL, + connector VARCHAR(200) NOT NULL, + cost_units INTEGER NOT NULL, + cumulative_consumed INTEGER NOT NULL, + budget_max INTEGER NOT NULL, + recorded_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (context_id) REFERENCES execution_log(id) +); + +CREATE INDEX ix_query_cost_log_context ON query_cost_log (context_id); +CREATE INDEX ix_query_cost_log_connector ON query_cost_log (connector, recorded_at); +``` + +Logging is best-effort: a failure to write to `query_cost_log` is caught and +logged at `WARNING` level but does not propagate. 
The budget deduction itself +always occurs before the log write, so enforcement is never skipped. + +### Querying Cost Attribution + +```python +from gnat.core.context import CostAttributionQuery + +report = CostAttributionQuery(db_session).by_connector( + connector="VirusTotalClient", + since=datetime(2026, 4, 1), +) +# Returns list of (date, connector, total_units, call_count) + +report = CostAttributionQuery(db_session).by_context(context_id="...") +# Returns per-connector breakdown for a single execution +``` + +### Configuration + +```ini +[context] +default_budget_units = 0 ; 0 = unlimited (default for manual runs) +pipeline_budget_units = 1000 ; budget applied to scheduled pipeline runs +agent_budget_units = 200 ; budget applied to each agent session +``` + +When `pipeline_budget_units` is set, `FeedScheduler` automatically creates +an `ExecutionContext` with `max_budget_units=pipeline_budget_units` for every +scheduled feed run. + +--- + +## Consequences + +### Positive + +- **Hard resource limit for pipelines and agents:** a misconfigured + `ResearchAgent` looping over VirusTotal will hit `BudgetExceeded` after + `max_budget_units / COST_UNIT` calls rather than running indefinitely. +- **First-class error with actionable context:** `BudgetExceeded` carries + `connector`, `cost`, and `remaining` — the operator can immediately see + which connector triggered the limit and by how much. +- **Per-connector cost attribution:** `query_cost_log` provides a persistent, + queryable record of which connectors consumed what share of the budget over + any time window. This enables quota planning and chargeback reporting for + MSSP deployments. +- **Zero overhead when no budget is set:** if `ctx.budget` is `None`, the + `if` guard in `_request()` is a single attribute lookup that short-circuits + immediately. Deployments that do not need budget enforcement pay no cost. 
+- **Bulk and search overrides enable accurate cost modelling:** connectors + that page through large result sets can declare realistic `COST_UNIT` + multipliers rather than counting every paginated request as 1 unit. + +### Negative / Trade-offs + +- **`COST_UNIT` is a class constant, not a per-call value:** a connector + cannot dynamically adjust the cost of a call based on the response size + (e.g. charging more for a response with 10 000 results than one with 10). + Per-call dynamic costing is deferred. +- **Cost logging adds one `INSERT` per connector call when a budget is + active:** high-frequency pipelines may produce large volumes of cost log + rows. A retention or aggregation policy is needed for long-running + deployments. +- **Budget is per-execution-context, not global:** two concurrent pipelines + each with a budget of 1 000 units can together consume 2 000 units from a + platform with a 1 500-unit daily quota. Cross-context global quota + enforcement requires a shared counter (deferred). + +### Deferred + +- Global quota pool shared across concurrent `ExecutionContext` instances + (requires a Redis or database-backed counter) +- Dynamic per-call cost calculation based on response size or token count +- `query_cost_log` retention policy and aggregation rollups +- Cost attribution dashboard in the TUI +- Per-connector quota configuration in `config.ini` (e.g. `[virustotal] + daily_quota = 500`) + +--- + +## Alternatives Considered + +### Connector-Level Rate Limits Only + +Apply rate limits at the connector level rather than introducing a budget +concept on `ExecutionContext`. For example, each connector would track its +own call count and sleep or raise when a per-hour limit is reached. Rejected +because: + +1. Connector-level limits do not aggregate across connectors. A pipeline + that calls five connectors 200 times each has made 1 000 total calls, but + no connector-level limit would fire. +2. 
Rate limits and budgets serve different purposes: rate limits protect + against *throughput* spikes; budgets protect against *total cost* within + an execution. Both are needed; budget enforcement complements rather than + replaces rate limiting. + +### OS-Level Resource Limits (cgroups / `resource.setrlimit`) + +Applying OS-level CPU or memory limits to pipeline processes was considered +as a coarser alternative. Rejected because it does not provide per-connector +cost attribution, does not integrate with the GNAT audit trail, and does not +map naturally to API quota units (which are a business concept, not an +OS resource). + +### OpenAI / Anthropic Cost Estimators as the Model + +Using the token-count-based cost estimation models from LLM providers as the +primary budget unit was considered. Rejected because GNAT's connectors are +predominantly REST API clients, not LLM callers. A unified unit (abstract +cost units with connector-specific `COST_UNIT` multipliers) is more flexible +and does not require token counting infrastructure for non-LLM connectors. + +### Queue-Based Throttling (Celery / RQ) + +Routing all connector calls through a task queue and configuring per-connector +concurrency limits was prototyped. Rejected because it introduces a mandatory +message broker dependency for a feature that should be available in single- +process deployments. Queue-based throttling remains an option for scale-out +deployments but should not be required for the core use case. 
+ +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/0049-ADR-testing-framework.md b/docs/explanation/architecture/adrs/0049-ADR-testing-framework.md new file mode 100644 index 00000000..9928c74d --- /dev/null +++ b/docs/explanation/architecture/adrs/0049-ADR-testing-framework.md @@ -0,0 +1,427 @@ +# ADR-0049 — Simulation-Based Testing Framework (Phase 4E) + +**Date:** 2026-04-09 +**Status:** Accepted +**Deciders:** GNAT Platform Team + +--- + +## Context + +GNAT's unit test suite (`tests/unit/`) exercises connector logic through the +`mock_http_response` and `mock_pool_manager` fixtures defined in +`tests/conftest.py`. These fixtures mock at the HTTP layer (`urllib3.PoolManager`) +and are effective for testing single connector methods in isolation. + +As GNAT's Phase 4 features were added, three gaps in the testing infrastructure +became significant: + +### Gap 1 — No Full-Pipeline Connector Fixture + +The `mock_pool_manager` fixture returns raw HTTP bytes. Tests that need to +exercise a complete pipeline (ingest → enrich → export) must either: + +- Construct a chain of `mock_http_response` objects for every API call the + pipeline makes, which is brittle and tied to internal implementation order, or +- Use a live connector, which requires network access and real credentials. + +There is no fixture connector that implements the full `ConnectorMixin` interface +with predictable, in-memory STIX data — making pipeline-level unit tests +impractical. + +### Gap 2 — No Replay Testing + +`ExecutionContext.is_replay` (ADR-0039) is set by pipeline runners to suppress +side effects during re-runs. But there was no test framework support for +verifying that a pipeline produces idempotent output: given the same +`execution_log` entries from a previous run, a re-run should produce the same +STIX IDs without duplicate write calls. 
+ +### Gap 3 — Agent Tests Require Live Governor and Review Queue + +Tests for `AgentGovernor` (ADR-0045) and `HITLGateway` (ADR-0046) need a +complete governance stack, including a `ReviewService` that auto-approves +actions so the test can proceed without human input. Assembling this stack +from individual fixtures in each test file is repetitive and error-prone. + +--- + +## Decision + +Introduce a **`gnat/testing/`** package with three components that together +make full-pipeline, replay, and agent governance tests practical without +network access or live credentials. + +All three components live in `gnat/testing/simulation.py` and are exported +from `gnat/testing/__init__.py`. + +### Component 1 — `SimulationConnector` + +A `ConnectorMixin`-compatible connector backed entirely by an in-memory list +of STIX fixture objects. No HTTP calls are made. + +```python +from gnat.testing import SimulationConnector +from gnat.orm.indicator import Indicator + +connector = SimulationConnector(trust_level="semi_trusted") + +# Preload fixtures +ioc = Indicator(name="evil.example.com", pattern="[domain-name:value = 'evil.example.com']") +connector.add_fixture(ioc.to_dict()) + +# Standard ConnectorMixin interface works as expected +objects = connector.list_objects() # returns [ioc.to_dict()] +obj = connector.get_object(ioc.id) # returns ioc.to_dict() +connector.upsert_object({"type": "indicator", ...}) # appended to fixture list +connector.delete_object(ioc.id) # removes from fixture list + +# Iterate all fixtures (useful for pipeline testing) +for stix_obj in connector.iter_fixtures(): + print(stix_obj["type"], stix_obj["id"]) +``` + +#### Error-Path Testing + +```python +# Simulate connector failures for error-path tests +connector = SimulationConnector(raise_on_request=True) +# All list_objects() / get_object() calls raise GNATClientError +``` + +#### Budget Integration + +`SimulationConnector` deducts from the active `QueryBudget` on every call, +just as a real connector 
would. This lets tests verify that a pipeline's +budget arithmetic is correct without making real HTTP calls: + +```python +ctx = ExecutionContext.create( + initiated_by="test", + domain="ingestion", + workspace_id="test-ws", + max_budget_units=5, +) +connector = SimulationConnector() +connector._context = ctx + +# Budget is charged on each call +connector.list_objects() # consumes COST_UNIT (1) +connector.list_objects() # consumes 1 more +# After 5 calls, BudgetExceeded is raised +``` + +#### Full `ConnectorMixin` Interface + +| Method | Behaviour | +|---|---| +| `authenticate()` | No-op; always succeeds | +| `health_check()` | Returns `{"status": "ok"}` | +| `list_objects()` | Returns copy of the fixture list | +| `get_object(stix_id)` | Finds by `id` field; raises `KeyError` if not found | +| `upsert_object(obj)` | Appends if new `id`; replaces if existing `id` | +| `delete_object(stix_id)` | Removes by `id`; no-op if not found | +| `to_stix(obj)` | Identity transform (returns `obj`) | +| `from_stix(stix_obj)` | Identity transform (returns `stix_obj`) | +| `add_fixture(obj)` | Test helper: pre-loads a STIX object | +| `iter_fixtures()` | Test helper: yields all current fixture objects | + +### Component 2 — `ReplayRunner` + +A test helper that verifies pipeline idempotency using the `execution_log`. 
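
Idempotent replay presupposes deterministic object IDs. As background — using only the standard library, not the GNAT API — STIX 2.1 defines UUIDv5-based deterministic identifiers over an object's ID-contributing properties, which is what makes comparing first-run and replay IDs meaningful:

```python
import uuid

# Namespace UUID defined by the STIX 2.1 specification for
# deterministic (UUIDv5) identifiers.
STIX_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")

def deterministic_indicator_id(pattern: str) -> str:
    # Same contributing value -> same ID, on every run.
    return "indicator--" + str(uuid.uuid5(STIX_NAMESPACE, pattern))

first = deterministic_indicator_id("[domain-name:value = 'evil.example.com']")
replayed = deterministic_indicator_id("[domain-name:value = 'evil.example.com']")
assert first == replayed  # replay-safe: identical input, identical STIX ID
```

A pipeline that instead derived IDs from `uuid4()` or a wall-clock timestamp would fail this comparison by construction.
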
+
+```python
+from gnat.testing import ReplayRunner
+
+def my_pipeline(ctx: ExecutionContext) -> list[dict]:
+    connector = SimulationConnector()
+    connector.add_fixture(indicator_dict)
+    return connector.list_objects()
+
+runner = ReplayRunner(pipeline_fn=my_pipeline)
+
+# First run: executes the pipeline and records execution_log entries
+first_run_ids = runner.run_first(workspace_id="test-ws")
+
+# Replay: re-runs the pipeline with is_replay=True and asserts that
+# all expected STIX IDs appear in the replay output
+runner.replay(
+    execution_log=runner.last_execution_log,
+    expected_stix_ids=first_run_ids,
+)
+# Raises AssertionError if any expected ID is missing from the replay output
+```
+
+#### `ReplayRunner` Internals
+
+```python
+from typing import Callable
+
+from gnat.core.context import ExecutionContext
+
+
+class ReplayRunner:
+    def __init__(self, pipeline_fn: Callable[[ExecutionContext], list[dict]]):
+        self._pipeline_fn = pipeline_fn
+        self.last_execution_log: list[dict] = []
+
+    def run_first(self, workspace_id: str = "default") -> list[str]:
+        ctx = ExecutionContext.create(
+            initiated_by="test-replay-runner",
+            domain="ingestion",
+            workspace_id=workspace_id,
+        )
+        results = self._pipeline_fn(ctx)
+        self.last_execution_log = ctx._store.query(ctx.context_id)
+        return [obj["id"] for obj in results if "id" in obj]
+
+    def replay(
+        self,
+        execution_log: list[dict],
+        expected_stix_ids: list[str],
+    ) -> None:
+        # The recorded execution_log is accepted for inspection; the replay
+        # itself re-invokes the pipeline function under a fresh context
+        # flagged with is_replay=True.
+        replay_ctx = ExecutionContext.create(
+            initiated_by="test-replay-runner",
+            domain="ingestion",
+            workspace_id="default",
+            is_replay=True,
+        )
+        results = self._pipeline_fn(replay_ctx)
+        result_ids = {obj["id"] for obj in results if "id" in obj}
+        missing = set(expected_stix_ids) - result_ids
+        if missing:
+            raise AssertionError(
+                f"Replay produced different STIX IDs. Missing: {missing}"
+            )
+```
+
+### Component 3 — `AgentTestHarness`
+
+A convenience wrapper around `AgentGovernor` and `HITLGateway` that uses a
+`_MockReviewService` which auto-approves all submitted review items.
+ +```python +from gnat.testing import AgentTestHarness +from gnat.agents.governor import AgentActionType + +harness = AgentTestHarness(trust_level="semi_trusted") + +# Run an action through the full governance stack +result = harness.run_action( + agent_id="test-agent", + action_type=AgentActionType.write_stix, + target_ref="indicator--abc123", + impact_level="high", # normally blocked — auto-approved by MockReviewService +) + +assert result["status"] == "approved" +assert result["approved_by"] == "mock-reviewer" + +# Inspect all actions recorded during the test +for action in harness.recorded_actions: + print(action.agent_id, action.action_type, action.status) + +# Assert specific governance outcomes +harness.assert_action_recorded( + action_type=AgentActionType.write_stix, + status="approved", +) +harness.assert_no_permission_denied() +harness.assert_rate_limit_not_exceeded() +``` + +#### `_MockReviewService` + +The mock review service used internally by `AgentTestHarness`: + +```python +class _MockReviewService: + """Auto-approves all submitted review items for use in tests.""" + + def submit(self, item_type, payload, submitter, priority="normal"): + item_id = str(uuid4()) + return ReviewItem( + id=item_id, + item_type=item_type, + payload=payload, + submitter=submitter, + status=ReviewStatus.APPROVED, + submitted_at=datetime.utcnow(), + reviewed_by="mock-reviewer", + reviewed_at=datetime.utcnow(), + ) + + def get(self, review_id: str) -> ReviewItem: + return ReviewItem(status=ReviewStatus.APPROVED, ...) 
+ + def reject(self, review_id: str, reason: str, reviewer: str) -> None: + pass # no-op in mock +``` + +#### Policy Override Support + +`AgentTestHarness` exposes `set_policy_override()` for testing custom +permission configurations: + +```python +harness = AgentTestHarness(trust_level="untrusted_external") + +# Grant a normally-blocked action for this test +harness.set_policy_override( + agent_id="test-agent", + action_type=AgentActionType.export, + allowed=True, +) + +result = harness.run_action( + agent_id="test-agent", + action_type=AgentActionType.export, + target_ref="bundle--xyz", + impact_level="medium", +) +assert result["status"] == "approved" +``` + +### Package Layout + +``` +gnat/testing/ +├── __init__.py # Exports: SimulationConnector, ReplayRunner, AgentTestHarness +└── simulation.py # All three components in one module +``` + +The `gnat/testing/` package is part of the `[dev]` extras group and is not +included in the core install: + +```toml +[project.optional-dependencies] +dev = [ + # ... existing dev deps ... + "gnat[testing]", +] +testing = [] # gnat/testing/ is pure Python; no extra deps required +``` + +### Integration with Existing Fixtures + +`SimulationConnector` is compatible with the existing `mock_pool_manager` +fixture. 
Tests that need both HTTP-level mocking (for a real connector) and +a simulation connector (for a parallel pipeline branch) can use both in the +same test: + +```python +def test_enrichment_pipeline(mock_pool_manager, minimal_config): + real_connector = VirusTotalClient.from_config(minimal_config) + sim_connector = SimulationConnector(trust_level="trusted_internal") + sim_connector.add_fixture(indicator_dict) + + pipeline = EnrichPipeline( + source=sim_connector, + enricher=real_connector, # HTTP calls intercepted by mock_pool_manager + ) + result = pipeline.run(workspace_id="test") + assert len(result.enriched) == 1 +``` + +--- + +## Consequences + +### Positive + +- **Full pipeline tests without network or credentials:** `SimulationConnector` + implements the complete `ConnectorMixin` interface, so any pipeline that + accepts a connector can be tested end-to-end in a unit test with no network + dependency. +- **Idempotency assertions are built-in:** `ReplayRunner` provides a standard, + reusable way to verify that a pipeline produces the same STIX IDs on first + run and replay — a previously unverifiable property. +- **Agent tests are fully deterministic:** `AgentTestHarness` with + `_MockReviewService` removes the non-determinism introduced by human review + queue state, making governance tests runnable in CI without any external + state. +- **Budget testing at no extra cost:** `SimulationConnector` participates in + `QueryBudget` accounting, so budget arithmetic can be tested without real + HTTP calls. +- **No new runtime dependencies:** `gnat/testing/` is pure Python and + introduces no additional packages. It reuses existing GNAT infrastructure + (`ExecutionContext`, `AgentGovernor`, `HITLGateway`, `ReviewItem`). + +### Negative / Trade-offs + +- **`SimulationConnector` does not validate STIX schema:** objects loaded via + `add_fixture()` are stored and returned as plain dicts without STIX 2.1 + schema validation. 
Tests that depend on strict STIX conformance must add
+  their own validation or use the `stix-validate` extra.
+- **`ReplayRunner` assumes pure-function pipelines:** pipelines that produce
+  different STIX IDs for the same input (e.g. because they embed
+  `datetime.utcnow()` in generated object IDs) will fail the idempotency
+  assertion. These pipelines must be refactored to accept a deterministic
+  clock before they can be replay-tested.
+- **`_MockReviewService` always approves:** tests that need to verify
+  rejection-path behaviour must subclass `AgentTestHarness` and supply a
+  custom review service.
+
+### Deferred
+
+- `SimulationConnector` STIX schema validation mode (using `stix2-patterns`)
+- `ReplayRunner` diff output: when IDs differ between runs, show which IDs
+  were added and which were removed rather than a bare set difference
+- `AgentTestHarness` rejection-path helper: `set_auto_reject(action_type)`
+  to configure the mock service to reject specific action types
+- Pytest plugin (`conftest.py` auto-injection) to make `SimulationConnector`
+  and `AgentTestHarness` available as fixtures without explicit import
+
+---
+
+## Alternatives Considered
+
+### VCR Cassette Recording
+
+The `vcrpy` library records real HTTP interactions to YAML cassette files and
+replays them in subsequent test runs. This was evaluated as an alternative to
+`SimulationConnector` for full-pipeline tests. Rejected because:
+
+1. Connector responses vary considerably across platforms: pagination cursors,
+   timestamps, and session tokens change between runs, requiring heavy cassette
+   filtering that is difficult to maintain.
+2. Cassettes capture the *HTTP layer*, not the *connector interface*. A change
+   to a connector's internal request structure (e.g. adding a query parameter)
+   invalidates the cassette even if the connector's public API is unchanged.
+3. Cassettes for 99 connectors would add a substantial volume of recorded
+   response fixtures to the repository.
+
+`SimulationConnector` operates at the connector interface level, above HTTP,
+and requires no cassette maintenance.
+
+### Docker-Based Integration Tests Only
+
+Accepting that full-pipeline tests require Docker (as the existing `--run-docker`
+integration suite does) was evaluated. Rejected for this use case because:
+
+1. Docker integration tests are slow (30–120 seconds each) and cannot serve as
+   unit tests that run on every pull request.
+2. They require a running Docker daemon, which is not available in all CI
+   environments.
+3. They test against real connector implementations (Splunk, MISP containers),
+   not against the GNAT pipeline logic itself.
+
+Docker integration tests remain the correct tool for verifying connector
+authentication and platform compatibility. `gnat/testing/` is the correct
+tool for pipeline logic verification.
+
+### Pytest Fixtures for Each Governance Component
+
+Rather than `AgentTestHarness`, individual pytest fixtures could be registered
+in `tests/conftest.py` for `AgentGovernor`, `HITLGateway`, and
+`_MockReviewService`. Rejected because:
+
+1. Fixtures are usable only within a pytest run; the harness is reusable
+   outside the test suite (e.g. in a REPL or notebook for interactive
+   development).
+2. Assembling three fixtures in a consistent configuration is error-prone;
+   `AgentTestHarness` encapsulates the wiring and ensures consistent defaults.
+3. Per-component fixtures still require each test to know the correct wiring
+   order; `AgentTestHarness.run_action()` expresses intent more clearly.
+
+The existing `conftest.py` fixtures (`mock_http_response`, `mock_pool_manager`,
+`minimal_config`, `sak_client`) remain unchanged and continue to cover
+HTTP-level mocking.
+ +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/explanation/architecture/adrs/README.md b/docs/explanation/architecture/adrs/README.md index d302ad11..4c3af8c4 100644 --- a/docs/explanation/architecture/adrs/README.md +++ b/docs/explanation/architecture/adrs/README.md @@ -44,6 +44,21 @@ subsystems. 35. [ADR-0035: Quality Agents](0035-ADR-quality-agents.md) 36. [ADR-0036: Security Agents Phase B](0036-ADR-security-agents-phaseb.md) 37. [ADR-0037: Adopt Responsible Disclosure, DCO, and Apache 2.0 Compliance](0037-ADR-adopt-responsible-disclosure-dco-and-apache-2.0-compliance.md) +38. [ADR-0038: Data Lineage Tracking](0038-data-lineage.md) + +### Phase 4 — Control, Reasoning, Safety + +39. [ADR-0039: Unified Execution Context](0039-ADR-execution-context.md) +40. [ADR-0040: Connector Trust Model](0040-ADR-connector-trust-model.md) +41. [ADR-0041: Idempotency and Schema Evolution](0041-ADR-idempotency-schema-evolution.md) +42. [ADR-0042: Hypothesis Engine](0042-ADR-hypothesis-engine.md) +43. [ADR-0043: Negative Evidence Tracking](0043-ADR-negative-evidence.md) +44. [ADR-0044: Reasoning Engine](0044-ADR-reasoning-engine.md) +45. [ADR-0045: Agent Governance](0045-ADR-agent-governance.md) +46. [ADR-0046: HITL Gateway](0046-ADR-hitl-gateway.md) +47. [ADR-0047: Workspace Isolation and Trust Boundaries](0047-ADR-workspace-isolation.md) +48. [ADR-0048: Query Budget and Cost Model](0048-ADR-query-budget.md) +49. 
[ADR-0049: Testing Framework — Simulation and Replay](0049-ADR-testing-framework.md) --- diff --git a/docs/explanation/architecture/diagrams.md b/docs/explanation/architecture/diagrams.md index fb78694b..dbdc73df 100644 --- a/docs/explanation/architecture/diagrams.md +++ b/docs/explanation/architecture/diagrams.md @@ -25,17 +25,58 @@ GNAT is structured as a layered architecture: |-------|---------|---------------| | User Interfaces | `gnat/cli/`, `gnat/tui/`, `gnat/serve/` | CLI subcommands, Textual TUI, FastAPI REST + TAXII | | GNATClient Façade | `gnat/client.py` | Single entry point for all operations | +| **Control & Safety (Phase 4)** | **`gnat/core/`** | **ExecutionContext, Domain boundaries, QueryBudget, trust enforcement** | | Core Pipelines | `gnat/ingest/`, `gnat/analysis/`, `gnat/agents/`, `gnat/research/` | Ingestion, analysis, AI, and research | +| **Reasoning Layer (Phase 4C)** | **`gnat/reasoning/`** | **HypothesisEngine, ReasoningEngine, evidence scoring** | +| **Agent Governance (Phase 4D)** | **`gnat/agents/governor.py`, `gnat/agents/hitl.py`** | **AgentGovernor, HITLGateway, XSOAR escalation** | | Intelligence Products | `gnat/reporting/`, `gnat/dissemination/` | Report lifecycle, export, webhooks | | Data Layer | `gnat/orm/`, `gnat/context/`, `gnat/search/` | STIX ORM, workspace persistence, Solr search | +| **Custom SDOs (Phase 4C)** | **`gnat/stix/sdos/`** | **STIXHypothesis, NegativeEvidenceRecord** | | Platform Connectors | `gnat/connectors/` (99 platforms) | Bidirectional integration with external platforms | -| HTTP Client Layer | `gnat/clients/`, `gnat/async_client/` | urllib3 (sync) + httpx (async) | +| HTTP Client Layer | `gnat/clients/`, `gnat/async_client/` | urllib3 (sync) + httpx (async) + budget tracking | | Scheduling | `gnat/schedule/` | Cron-based feed scheduling | +| **Testing Framework (Phase 4E)** | **`gnat/testing/`** | **SimulationConnector, ReplayRunner, AgentTestHarness** | → Full narrative: 
[`docs/architecture.md`](../../architecture.md) --- +## Phase 4 Control Layer + +Phase 4 adds a **control and safety** layer that sits above all pipelines and connectors. +Every GNAT operation is now tagged with an `ExecutionContext` that carries its identity, +trust level, domain, and resource budget. + +```mermaid +flowchart LR + subgraph Control ["gnat/core/ — Control Layer"] + CTX[ExecutionContext\ncontext_id, trust_level\ndomain, workspace_id] + BDG[QueryBudget\nmax_units, consumed] + DOM[Domain Boundary\n@domain_boundary decorator] + end + + subgraph Reasoning ["gnat/reasoning/ — Reasoning Layer"] + HE[HypothesisEngine\npropose → evaluate → close] + RE[ReasoningEngine\nprioritize observables] + end + + subgraph Gov ["gnat/agents/ — Governance"] + AG[AgentGovernor\ncan_act, rate_limit, audit] + HG[HITLGateway\nevaluate impact tier] + end + + CTX --> BDG + CTX --> DOM + CTX --> HE + CTX --> RE + CTX --> AG + AG --> HG +``` + +→ ADRs: [0039](adrs/0039-ADR-execution-context.md) · [0040](adrs/0040-ADR-connector-trust-model.md) · [0041](adrs/0041-ADR-idempotency-schema-evolution.md) · [0042](adrs/0042-ADR-hypothesis-engine.md) · [0043](adrs/0043-ADR-negative-evidence.md) · [0044](adrs/0044-ADR-reasoning-engine.md) · [0045](adrs/0045-ADR-agent-governance.md) · [0046](adrs/0046-ADR-hitl-gateway.md) · [0047](adrs/0047-ADR-workspace-isolation.md) · [0048](adrs/0048-ADR-query-budget.md) · [0049](adrs/0049-ADR-testing-framework.md) + +--- + ## Connector Architecture The diagram below illustrates how the 99 platform connectors plug into GNAT via the diff --git a/docs/explanation/architecture/workflow-diagrams.md b/docs/explanation/architecture/workflow-diagrams.md index afe6cabd..464f7c48 100644 --- a/docs/explanation/architecture/workflow-diagrams.md +++ b/docs/explanation/architecture/workflow-diagrams.md @@ -235,7 +235,209 @@ flowchart LR --- -## Using These Diagrams +## 8. 
ExecutionContext Propagation (Phase 4A)
+
+This diagram shows how an `ExecutionContext` is created at pipeline entry and propagated
+through all downstream operations, providing end-to-end traceability.
+
+```mermaid
+sequenceDiagram
+    autonumber
+    actor Operator
+    participant Pipeline as IngestPipeline<br/>(gnat/ingest)
+    participant Ctx as ExecutionContext<br/>(gnat/core/context.py)
+    participant Log as execution_log<br/>(Postgres)
+    participant Client as BaseClient<br/>(gnat/clients/base.py)
+    participant Budget as QueryBudget
+
+    Operator->>Pipeline: run(source, workspace_id)
+    Pipeline->>Ctx: ExecutionContext.create(initiated_by, domain, workspace_id)
+    Ctx-->>Pipeline: ctx (context_id=UUID, trust_level, is_replay=False)
+    Pipeline->>Log: INSERT INTO execution_log (ctx.to_dict())
+    Log-->>Pipeline: ack
+
+    Pipeline->>Client: connector._context = ctx
+    loop Per observable
+        Client->>Budget: budget.consume(COST_UNIT, connector_name)
+        alt Budget exhausted
+            Budget-->>Client: raise BudgetExceeded
+        else OK
+            Client-->>Pipeline: HTTP response data
+        end
+    end
+
+    Note over Pipeline,Ctx: Child context for sub-operation
+    Pipeline->>Ctx: ctx.child(initiated_by="enrichment-agent", domain="analysis")
+    Ctx-->>Pipeline: child_ctx (parent_context_id=ctx.context_id)
+    Pipeline->>Log: INSERT INTO execution_log (child_ctx.to_dict())
+```
+
+---
+
+## 9. Hypothesis Engine Lifecycle (Phase 4C)
+
+The full propose → evaluate → close lifecycle for `STIXHypothesis` objects, showing
+how Solr corroboration and trust-weighted evidence feed into confidence updates.
+
+```mermaid
+sequenceDiagram
+    autonumber
+    actor Analyst
+    participant Engine as HypothesisEngine<br/>(gnat/reasoning/hypothesis.py)
+    participant WS as Workspace<br/>(gnat/context/workspace.py)
+    participant Solr as SolrSearchIndex<br/>(gnat/search/index.py)
+    participant H as STIXHypothesis<br/>(x-gnat-hypothesis SDO)
+
+    Analyst->>Engine: propose("APT29 behind Q1 campaign", evidence=["rel--1"], confidence=0.2)
+    Engine->>H: STIXHypothesis(statement, confidence=0.2, status="pending")
+    H->>H: add_supporting_evidence("rel--1")
+    Engine->>WS: _add_object(h.to_dict(), mark_dirty=True)
+    WS-->>Analyst: STIXHypothesis (id, confidence=0.2, status="pending")
+
+    Analyst->>Engine: evaluate(hypothesis_id)
+    Engine->>WS: load hypothesis object
+    Engine->>Solr: search(statement, limit=20)
+    Solr-->>Engine: [corroborating_stix_ids]
+    Engine->>Engine: corroboration_boost = min(len(ids) × 0.05, 0.3)
+    Engine->>Engine: raw = (support_count / total) + corroboration_boost
+    Engine->>H: update_confidence(clamped_raw)
+    alt confidence ≥ 0.75
+        H->>H: status = "confirmed"
+    else confidence ≤ 0.15 and refute_count > 0
+        H->>H: status = "refuted"
+    end
+    Engine->>WS: _add_object(h.to_dict(), mark_dirty=True)
+    Engine-->>Analyst: STIXHypothesis (updated confidence + status)
+
+    Analyst->>Engine: close(hypothesis_id, verdict="confirmed")
+    Engine->>H: close("confirmed")
+    Engine->>WS: _add_object(h.to_dict(), mark_dirty=True)
+    Engine-->>Analyst: STIXHypothesis (status="confirmed")
+```
+
+---
+
+## 10. ReasoningEngine Observable Scoring (Phase 4C)
+
+How `ReasoningEngine.prioritize()` scores a set of observables using four weighted signals.
+
+```mermaid
+flowchart TD
+    A([observable_set, context]) --> B[ReasoningEngine.prioritize]
+
+    B --> C[Gather NegativeEvidenceRecords\nfrom workspace]
+    C --> D[For each observable...]
+ + D --> E1[trust_weight\nfrom ExecutionContext.trust_level] + D --> E2[age_factor\n1.0 − 5%×age_days] + D --> E3[neg_penalty\n0.3 × fresh_neg_count] + D --> E4[corroboration_bonus\nSolr hits × 0.05] + + E1 --> F[Composite Score\nscore = trust×0.4 + age×0.3\n+ corroboration×0.3 − neg×0.5] + E2 --> F + E3 --> F + E4 --> F + + F --> G[Clamp to 0.0–1.0] + G --> H[Build explanation dict\nmachine-readable components] + H --> I{store_notes?} + I -- Yes --> J[Write STIX note object\nlinked to observable] + I -- No --> K + + J --> K[Collect results] + K --> L[Sort by score DESC] + L --> M([return list of tuple: observable, score, explanation]) + + style F fill:#4ea8de,color:#fff + style M fill:#2d7a2d,color:#fff +``` + +--- + +## 11. Agent Governance & HITL Flow (Phase 4D) + +How every agent action passes through `AgentGovernor` and `HITLGateway` before execution. + +```mermaid +flowchart TD + A([Agent requests action]) --> B[AgentGovernor.can_act\nagent_id, action_type, trust_level] + + B --> C{Policy override\nexists?} + C -- Yes --> D{Override allows?} + C -- No --> E{Trust-level matrix\nallows?} + + D -- No --> F([raise AgentPermissionDenied]) + D -- Yes --> G + + E -- No --> F + E -- Yes --> G[Rate limit check\nsliding window] + + G --> H{Within limit?} + H -- No --> I([raise RateLimitExceeded]) + H -- Yes --> J[Create AgentAction\nimpact_level assigned] + + J --> K[HITLGateway.evaluate] + + K --> L{impact_level?} + L -- low/medium --> M[Auto-approve\napproved_by = auto-policy] + L -- high --> N[ReviewService.submit\nstatus = PENDING] + L -- critical --> O[ReviewService.submit\n+ XSOARClient notification] + + N --> P{Human reviews...} + O --> P + P -- Approved --> Q[action.status = approved\nExecute action] + P -- Rejected --> R([Action cancelled]) + P -- Timeout --> S[Auto-reject\nreviewer = system-timeout] + + M --> Q + Q --> T[AgentGovernor.record_action\nAudit log + HookBus emit] + T --> U([Action complete]) + + style F fill:#c0392b,color:#fff + style I 
fill:#c0392b,color:#fff
+    style R fill:#c0392b,color:#fff
+    style U fill:#2d7a2d,color:#fff
+```
+
+---
+
+## 12. Workspace Trust Boundary Enforcement (Phase 4E)
+
+How `check_connector_trust()` enforces isolation boundaries before allowing connector access.
+
+```mermaid
+flowchart TD
+    A([Connector attempts workspace access]) --> B["workspace.check_connector_trust(connector)"]
+
+    B --> C["Read type(connector).TRUST_LEVEL"]
+    C --> D[Read workspace.trust_boundary]
+
+    D --> E{connector_rank ≥\nrequired_rank?}
+    E -- No --> F([raise PermissionError\nConnector trust too low])
+
+    E -- Yes --> G{allowed_connector_refs\nnon-empty?}
+    G -- No --> H[Access granted]
+    G -- Yes --> I{connector class name\nin allowlist?}
+
+    I -- No --> J([raise PermissionError\nConnector not in allowlist])
+    I -- Yes --> H
+
+    H --> K[Proceed with read/write]
+
+    style F fill:#c0392b,color:#fff
+    style J fill:#c0392b,color:#fff
+    style H fill:#2d7a2d,color:#fff
+
+    subgraph Trust Rank Order
+        TR1[trusted_internal = 2]
+        TR2[semi_trusted = 1]
+        TR3[untrusted_external = 0]
+    end
+```
+
+---
+
+## Using These Diagrams
+
 All Mermaid diagrams in this file can be:
diff --git a/docs/how-to/README.md b/docs/how-to/README.md
index facd65c2..15753693 100644
--- a/docs/how-to/README.md
+++ b/docs/how-to/README.md
@@ -20,6 +20,10 @@ Pick the guide for your goal — no need to read them in order.
| [Build Cross-Platform Investigations](build-investigations.md) | Collect and correlate evidence from multiple platforms into a unified evidence graph | | [Create Intelligence Reports](create-intelligence-reports.md) | Author structured intelligence products with a formal lifecycle and STIX 2.1 export | | [Disseminate Intelligence](disseminate-intelligence.md) | Export, webhook notifications, TAXII 2.1 serving, and REST API gateway | +| **Phase 4 — Control, Reasoning, Safety** | | +| [Use the Execution Context](use-execution-context.md) | Create and propagate `ExecutionContext`; enforce domain boundaries and trust levels; track query budgets | +| [Use the Reasoning Engine](use-reasoning-engine.md) | Score and rank observables; propose, evaluate, and close hypotheses; track negative evidence | +| [Agent Governance](agent-governance.md) | Permission checks, rate limiting, HITL review, XSOAR escalation, and agent audit trails | --- diff --git a/docs/how-to/agent-governance.md b/docs/how-to/agent-governance.md new file mode 100644 index 00000000..88163c9b --- /dev/null +++ b/docs/how-to/agent-governance.md @@ -0,0 +1,247 @@ +# How-to: Agent Governance + +GNAT's agent governance layer ensures that every AI agent action is authorised, +rate-limited, audited, and — for high-impact operations — reviewed by a human before +execution. 
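
The order in which those checks are applied can be sketched in a few lines of plain Python. This is illustrative only — the function and return values below are hypothetical, not the `gnat` API:

```python
def governance_outcome(trust_allows: bool, within_rate_limit: bool,
                       impact_level: str) -> str:
    # Hypothetical sketch of the check order, not the gnat implementation.
    if not trust_allows:
        return "denied"            # AgentGovernor: AgentPermissionDenied
    if not within_rate_limit:
        return "rate_limited"      # AgentGovernor: RateLimitExceeded
    if impact_level in ("low", "medium"):
        return "auto_approved"     # HITLGateway: approved immediately
    return "pending_review"        # high/critical: human review (or timeout)

assert governance_outcome(False, True, "low") == "denied"
assert governance_outcome(True, True, "critical") == "pending_review"
```

Each of those stages is covered in detail in the sections below.
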
+ +--- + +## Prerequisites + +- GNAT installed (`pip install gnat`) +- Optionally: `gnat/review/` configured with a `ReviewQueueStore` for HITL flows +- Optionally: XSOAR connector configured for critical action notifications + +--- + +## Check and Enforce Permissions + +```python +from gnat.agents.governor import AgentGovernor, AgentPermissionDenied +from gnat.policy.models import AgentActionType + +governor = AgentGovernor() + +# Check silently +can_enrich = governor.can_act( + agent_id="research-agent-1", + action_type=AgentActionType.ENRICH, + trust_level="semi_trusted", +) +print(can_enrich) # True + +# Raise on denial +try: + governor.require_can_act( + agent_id="otx-reader", + action_type=AgentActionType.TRIGGER_PLAYBOOK, + trust_level="untrusted_external", + ) +except AgentPermissionDenied as e: + print(e) # "otx-reader (trust='untrusted_external') denied trigger_playbook" +``` + +### Default permission matrix + +| Trust Level | Allowed Actions | +|-------------|----------------| +| `trusted_internal` | All actions (read_stix, write_stix, delete_stix, enrich, ingest, export, trigger_playbook, manage_workspace, escalate, hypothesize) | +| `semi_trusted` | read_stix, write_stix, enrich, ingest, hypothesize, escalate | +| `untrusted_external` | read_stix, enrich, hypothesize | + +--- + +## Apply Per-Agent Overrides + +Override the default matrix at runtime or via config: + +```python +# Allow a specific agent to trigger playbooks despite semi_trusted level +governor.set_policy_override( + "high-fidelity-agent", + AgentActionType.TRIGGER_PLAYBOOK, + allowed=True, +) + +# Deny an agent from deleting STIX objects even if trust would allow it +governor.set_policy_override( + "read-only-agent", + AgentActionType.DELETE_STIX, + allowed=False, +) +``` + +Or via INI (loaded by `AgentGovernor.from_config(cfg)`): + +```ini +[agent_policy] +high-fidelity-agent.trigger_playbook = true +read-only-agent.delete_stix = false +``` + +--- + +## Rate Limiting + +```python +from 
gnat.agents.governor import AgentGovernor, RateLimitExceeded + +governor = AgentGovernor(max_calls_per_window=50, window_seconds=60) + +for i in range(55): + try: + governor.rate_limit_check("bulk-agent") + except RateLimitExceeded as e: + print(f"Rate limit hit at call {i}: {e}") + break +``` + +--- + +## Record Actions (Audit Trail) + +```python +from gnat.agents.governor import AgentAction, AgentGovernor +from gnat.policy.models import AgentActionType + +governor = AgentGovernor() + +action = AgentAction( + agent_id="threat-hunter-1", + action_type=AgentActionType.ENRICH, + target_ref="indicator--abc123", + impact_level="low", + context_id=ctx.context_id, # link to ExecutionContext +) + +governor.record_action(action) + +# Query audit log +all_actions = governor.get_action_log() +agent_actions = governor.get_action_log("threat-hunter-1") +``` + +--- + +## HITL (Human-in-the-Loop) Gateway + +For high or critical impact actions, submit them for human review before executing: + +```python +from gnat.agents.hitl import HITLGateway +from gnat.agents.governor import AgentAction +from gnat.policy.models import AgentActionType +from gnat.review.service import ReviewService +from gnat.review.store import ReviewQueueStore + +# Wire to existing review queue +store = ReviewQueueStore(db_url="sqlite:///~/.gnat/gnat.db") +store.create_all() +review_service = ReviewService(store=store) + +gateway = HITLGateway( + review_service=review_service, + approval_timeout_seconds=3600, +) + +action = AgentAction( + agent_id="incident-responder", + action_type=AgentActionType.TRIGGER_PLAYBOOK, + target_ref="indicator--malicious-ip", + impact_level="high", +) + +approved, review_item = gateway.evaluate(action) + +if approved: + # low/medium: auto-approved, execute immediately + print("Action auto-approved, executing...") +else: + # high: blocking — wait for human review + print(f"Awaiting approval. 
Review ID: {review_item.id}") + + # Later, poll for status + from gnat.review.models import ReviewStatus + status = gateway.check_approval_status(review_item.id) + if status == ReviewStatus.APPROVED: + print("Approved by analyst, executing...") + elif status == ReviewStatus.REJECTED: + print("Rejected, action cancelled.") +``` + +### Impact tiers + +| Impact Level | Behaviour | +|-------------|-----------| +| `low` | Auto-approved immediately; logged only | +| `medium` | Auto-approved immediately; logged only | +| `high` | Submitted to ReviewService as PENDING; blocks execution | +| `critical` | PENDING + XSOAR notification fired via `XSOARClient.upsert_object()` | + +--- + +## Add XSOAR Notification for Critical Actions + +```python +from gnat.connectors.xsoar.client import XSOARClient +from gnat.agents.hitl import HITLGateway + +xsoar = XSOARClient(host="https://xsoar.example.com", api_key="...") + +gateway = HITLGateway( + review_service=review_service, + xsoar_client=xsoar, + approval_timeout_seconds=1800, # 30 minutes +) +``` + +--- + +## Use AgentTestHarness in Tests + +The `AgentTestHarness` provides a fully deterministic test environment — all HITL +submissions are auto-approved and all rate limits are effectively unlimited: + +```python +from gnat.testing import AgentTestHarness +from gnat.agents.governor import AgentPermissionDenied +from gnat.policy.models import AgentActionType + +harness = AgentTestHarness() + +# Run an action end-to-end (permission check + rate limit + HITL + audit) +approved, action = harness.run_action( + agent_id="test-agent", + action_type=AgentActionType.ENRICH, + target_ref="indicator--abc", + impact_level="low", + trust_level="semi_trusted", +) + +assert approved is True +assert action.status == "approved" +assert len(harness.recorded_actions) == 1 + +# Test permission denial +try: + harness.run_action( + agent_id="restricted-agent", + action_type=AgentActionType.TRIGGER_PLAYBOOK, + trust_level="untrusted_external", + ) +except 
AgentPermissionDenied: + print("Correctly denied") +``` + +--- + +## See Also + +- [ADR-0045 — Agent Governance Layer](../explanation/architecture/adrs/0045-ADR-agent-governance.md) +- [ADR-0046 — HITL Gateway](../explanation/architecture/adrs/0046-ADR-hitl-gateway.md) +- [ADR-0049 — Testing Framework](../explanation/architecture/adrs/0049-ADR-testing-framework.md) +- [Reference: Configuration](../reference/configuration.md) — `[agent_policy]` section + +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/how-to/use-execution-context.md b/docs/how-to/use-execution-context.md new file mode 100644 index 00000000..fe1ddd72 --- /dev/null +++ b/docs/how-to/use-execution-context.md @@ -0,0 +1,198 @@ +# How-to: Use the Execution Context + +Every GNAT operation — pipeline run, connector call, agent action — is tagged with an +`ExecutionContext` that carries its identity, domain, trust level, workspace boundary, +and optional resource budget. This guide shows how to create, propagate, and query +execution contexts in your code. + +--- + +## Prerequisites + +- GNAT installed (`pip install gnat`) +- `sqlalchemy` installed for DB persistence (`pip install "gnat[persist]"`) +- At least one connector configured in `~/.gnat/config.ini` + +--- + +## Create a Context + +```python +from gnat.core.context import ExecutionContext + +# Minimal context — defaults to semi_trusted, default policy set +ctx = ExecutionContext.create( + initiated_by="manual", + domain="ingestion", + workspace_id="ws-apt28", +) +print(ctx.context_id) # UUID string +print(ctx.trust_level) # "semi_trusted" +print(ctx.is_replay) # False +``` + +### Create from a connector (inherits trust level) + +```python +from gnat.connectors.splunk.client import SplunkClient +from gnat.core.context import ExecutionContext + +splunk = SplunkClient(host="https://splunk.example.com", ...) 
+ +# Reads SplunkClient.TRUST_LEVEL = "trusted_internal" automatically +ctx = ExecutionContext.from_connector( + connector=splunk, + domain="ingestion", + workspace_id="ws-siem", +) +print(ctx.trust_level) # "trusted_internal" +print(ctx.initiated_by) # "SplunkClient" +``` + +### Create with a query budget + +```python +ctx = ExecutionContext.create( + initiated_by="automated-pipeline", + domain="analysis", + workspace_id="ws-enrichment", + max_budget_units=500, # connector calls are counted against this limit +) +print(ctx.budget.remaining) # 500 +``` + +--- + +## Propagate Through a Pipeline + +Attach the context to a connector so budget tracking and logging work automatically: + +```python +from gnat.connectors.virustotal.client import VirusTotalClient +from gnat.clients.base import BudgetExceeded +from gnat.core.context import ExecutionContext + +ctx = ExecutionContext.create( + initiated_by="enrichment-job", + domain="analysis", + workspace_id="ws-threats", + max_budget_units=100, +) + +vt = VirusTotalClient(host="https://www.virustotal.com", api_key="...") +vt._context = ctx # attach context — budget will be deducted per request + +try: + result = vt.get("/api/v3/files/abc123") +except BudgetExceeded as e: + print(f"Budget exhausted: {e.connector} wanted {e.cost} but only {e.remaining} left") +``` + +--- + +## Create Child Contexts + +Sub-operations (e.g. 
an enrichment agent spawned by an ingestion pipeline) should use +child contexts so the parent→child trace is preserved in `execution_log`: + +```python +parent_ctx = ExecutionContext.create( + initiated_by="ingest-pipeline", + domain="ingestion", + workspace_id="ws-1", +) + +# Child inherits workspace_id, trust_level, policy_set +child_ctx = parent_ctx.child( + initiated_by="enrichment-agent", + domain="analysis", +) + +print(child_ctx.parent_context_id == parent_ctx.context_id) # True +``` + +--- + +## Domain Boundaries + +The `@domain_boundary` decorator enforces that a function is only called from permitted +upstream domains. Violations raise `DomainBoundaryViolation`. + +```python +from gnat.core.domains import Domain, domain_boundary, DomainBoundaryViolation + +@domain_boundary(Domain.REPORTING, allowed_callers=[Domain.INVESTIGATION, Domain.REPORTING]) +def generate_report(workspace, context): + ... + +@domain_boundary(Domain.INGESTION) +def run_ingest(): + # Calling generate_report from ingestion raises DomainBoundaryViolation + try: + generate_report(ws, ctx) + except DomainBoundaryViolation as e: + print(e) # "ingestion cannot call into reporting domain" +``` + +--- + +## Trust Level Enforcement + +Decorate functions that require a minimum trust level to execute: + +```python +from gnat.core.domains import require_trust_level, TrustLevelViolation + +@require_trust_level("trusted_internal") +def trigger_soar_playbook(playbook_id, context): + ... 
+ +# Context with semi_trusted will raise +ctx = ExecutionContext.create(initiated_by="ot", domain="execution", workspace_id="ws") +try: + trigger_soar_playbook("PB-001", context=ctx) +except TrustLevelViolation as e: + print(e) # "requires trusted_internal but active trust is semi_trusted" +``` + +--- + +## Replay Mode + +Set `is_replay=True` to suppress SOAR triggers and side-effects during replay runs: + +```python +ctx = ExecutionContext.create( + initiated_by="replay-runner", + domain="ingestion", + workspace_id="ws-replay", + is_replay=True, +) + +# Pipelines check ctx.is_replay before firing SOAR actions +if not ctx.is_replay: + xsoar_client.trigger_playbook(...) +``` + +--- + +## Serialise / Deserialise + +```python +d = ctx.to_dict() +# Store d in DB, pass over API boundary, etc. +ctx2 = ExecutionContext.from_dict(d) +``` + +--- + +## See Also + +- [ADR-0039 — Unified Execution Context](../explanation/architecture/adrs/0039-ADR-execution-context.md) +- [ADR-0040 — Connector Trust Model](../explanation/architecture/adrs/0040-ADR-connector-trust-model.md) +- [ADR-0048 — Query Budget](../explanation/architecture/adrs/0048-ADR-query-budget.md) +- [Reference: Configuration](../reference/configuration.md) + +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/how-to/use-reasoning-engine.md b/docs/how-to/use-reasoning-engine.md new file mode 100644 index 00000000..b1a5525d --- /dev/null +++ b/docs/how-to/use-reasoning-engine.md @@ -0,0 +1,218 @@ +# How-to: Use the Reasoning Engine + +GNAT's reasoning layer lets you score and rank STIX observables by evidence quality, +track analyst hypotheses with structured evidence links, and suppress redundant connector +queries using negative evidence records. 
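
For orientation, the composite score this guide describes can be reproduced with plain arithmetic. The weights and caps below come from this guide's scoring table; the specific inputs, and the linear reading of "5% decay per day", are illustrative assumptions rather than GNAT API behaviour:

```python
# Hand-computed sketch of the composite score (made-up inputs; no GNAT imports).
trust_weight = 0.6                      # semi_trusted connector
age_factor = max(0.0, 1.0 - 0.05 * 3)   # 3 days old at 5%/day linear decay -> 0.85
corroboration = min(3 * 0.05, 0.25)     # 3 Solr hits at 0.05 each, capped at 0.25
neg_penalty = min(0.3 * 1, 0.6)         # 1 fresh negative-evidence record -> 0.3

score = (trust_weight * 0.4
         + age_factor * 0.3
         + corroboration * 0.3
         - neg_penalty * 0.5)
score = min(1.0, max(0.0, score))       # clamp to [0.0, 1.0]
print(round(score, 3))  # 0.39
```

A stale, uncorroborated observable with several negative records can clamp all the way to 0.0, which is the intended outcome: the engine deprioritises it rather than deleting it.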
+ +--- + +## Prerequisites + +- GNAT installed (`pip install gnat`) +- A `WorkspaceManager` configured (see [How-to: Use Workspaces](use-workspaces.md)) +- Optionally: Solr search sidecar running (see `[search]` config section) + +--- + +## Score Observables with ReasoningEngine + +`ReasoningEngine.prioritize()` assigns a composite score in `[0.0, 1.0]` to each +observable based on: + +| Signal | Weight | Description | +|--------|--------|-------------| +| Connector trust weight | 40% | `trusted_internal`→0.9, `semi_trusted`→0.6, `untrusted_external`→0.3 | +| Object age factor | 30% | 1.0 decaying by 5% per day from `modified` timestamp | +| Cross-connector corroboration | 30% | Solr hit count × 0.05, capped at 0.25 | +| Negative evidence penalty | −50% | min(0.3 × fresh NegativeEvidenceRecord count, 0.6) | + +```python +from gnat.reasoning.engine import ReasoningEngine +from gnat.core.context import ExecutionContext +from gnat.context.workspace import WorkspaceManager + +manager = WorkspaceManager.default() + +# Create a context from your connector (sets trust_level automatically) +from gnat.connectors.crowdstrike.client import CrowdStrikeClient +cs = CrowdStrikeClient(host="...", client_id="...", client_secret="...") +ctx = ExecutionContext.from_connector(cs, domain="analysis", workspace_id="my-ws") + +engine = ReasoningEngine(manager=manager, workspace_name="my-ws") + +# Load observables from the workspace +ws = manager.open("my-ws") +observables = list(ws.objects.values()) + +results = engine.prioritize(observables, context=ctx, store_notes=True) + +for observable, score, explanation in results: + print(f"{score:.2f} {observable.id}") + print(f" {explanation['summary']}") +``` + +### Read the structured explanation + +The `explanation` dict is machine-readable: + +```python +_, score, explanation = results[0] + +print(explanation["observable_id"]) # STIX ID +print(explanation["score"]) # 0.0 – 1.0 + +trust_info = explanation["components"]["trust_weight"] 
+print(trust_info["trust_level"]) # "semi_trusted" +print(trust_info["weight"]) # 0.6 + +age = explanation["components"]["age_factor"] +print(f"age factor: {age:.2f}") # 0.85 (3 days old at 5%/day decay) + +neg = explanation["components"]["negative_evidence"] +print(f"{neg['count']} fresh neg records, penalty={neg['penalty']:.2f}") + +corr = explanation["components"]["corroboration"] +print(f"{corr['hits']} Solr hits, bonus={corr['bonus']:.2f}") +``` + +### Stored STIX notes + +When `store_notes=True` (default), the engine writes a STIX `note` object to the +workspace for each scored observable. The note contains the full JSON explanation so +analysts can review it later. + +--- + +## Propose and Evaluate Hypotheses + +```python +from gnat.reasoning.hypothesis import HypothesisEngine +from gnat.context.workspace import WorkspaceManager + +manager = WorkspaceManager.default() +engine = HypothesisEngine(manager=manager, workspace_name="apt29-investigation") + +# 1. Propose a hypothesis +h = engine.propose( + statement="192.0.2.1 is a Lazarus Group C2 server.", + initial_evidence=["relationship--abc123"], # STIX relationship IDs + confidence=0.2, # low initial confidence +) +print(h._properties["status"]) # "pending" +print(h._properties["confidence"]) # 0.2 + +# 2. Evaluate — queries Solr for corroborating evidence +h = engine.evaluate(h.id) +print(h._properties["confidence"]) # updated based on evidence + Solr hits +print(h._properties["status"]) # "pending" | "confirmed" | "refuted" + +# 3. Add more evidence manually +h.add_supporting_evidence("relationship--def456") +h.add_refuting_evidence("relationship--ghi789") + +# 4. 
Close with a verdict +h = engine.close(h.id, verdict="confirmed") +print(h._properties["status"]) # "confirmed" +``` + +### List all hypotheses + +```python +all_hypotheses = engine.list_all() +for h in all_hypotheses: + print(h._properties["statement"][:60], "→", h._properties["status"]) +``` + +--- + +## Track Negative Evidence + +`NegativeEvidenceRecord` suppresses redundant connector re-queries within a configurable TTL. + +```python +from gnat.stix.sdos.negative_evidence import NegativeEvidenceRecord +from gnat.context.workspace import WorkspaceManager + +manager = WorkspaceManager.default() +ws = manager.open("my-ws") + +indicator_id = "indicator--abc123" + +# Check for a fresh negative record before querying +neg_records = [ + obj for obj in ws.objects.values() + if getattr(obj, "stix_type", "") == "x-gnat-negative-evidence" + and obj._properties.get("target_ref") == indicator_id + and not obj.is_expired() # within TTL +] + +if neg_records: + print("Skipping re-query — connector returned no results within TTL") +else: + # Query connector + result = vt_client.get(f"/api/v3/files/{indicator_id}") + + if not result: + # Write negative evidence record + rec = NegativeEvidenceRecord( + target_ref=indicator_id, + queried_connector="VirusTotalClient", + ttl_seconds=3600, # suppress re-queries for 1 hour + ) + ws._add_object(rec.to_dict(), mark_dirty=True) +``` + +Check TTL status: + +```python +rec = NegativeEvidenceRecord(target_ref="indicator--abc", queried_connector="VT", ttl_seconds=3600) +print(rec.is_expired()) # False immediately after creation +print(rec.seconds_remaining()) # ~3600 +``` + +--- + +## Attach Solr for Corroboration + +When Solr is running, the reasoning engine uses it for cross-connector corroboration. 
+Configure via `[search]` in `~/.gnat/config.ini`: + +```ini +[search] +solr_url = http://localhost:8983/solr/gnat +enabled = true +batch_size = 100 +``` + +Then pass it explicitly: + +```python +from gnat.search.index import SolrSearchIndex, SolrSearchConfig +from gnat.reasoning.engine import ReasoningEngine + +config = SolrSearchConfig(solr_url="http://localhost:8983/solr/gnat") +index = SolrSearchIndex(config) + +engine = ReasoningEngine( + manager=manager, + workspace_name="my-ws", + search_index=index, +) +``` + +Without Solr, the engine falls back to `NullSearchIndex` — all scores work but +the corroboration bonus is always 0.0. + +--- + +## See Also + +- [ADR-0042 — Hypothesis Engine](../explanation/architecture/adrs/0042-ADR-hypothesis-engine.md) +- [ADR-0043 — Negative Evidence](../explanation/architecture/adrs/0043-ADR-negative-evidence.md) +- [ADR-0044 — Reasoning Engine](../explanation/architecture/adrs/0044-ADR-reasoning-engine.md) +- [How-to: Use Workspaces](use-workspaces.md) +- [How-to: Build Investigations](build-investigations.md) + +--- + +*Licensed under the Apache License, Version 2.0* diff --git a/docs/reference/configuration.md b/docs/reference/configuration.md index 360c67c3..ccded2a1 100644 --- a/docs/reference/configuration.md +++ b/docs/reference/configuration.md @@ -133,6 +133,97 @@ default_tlp = amber auto_approve = false ``` +### `[agent_policy]` + +Controls the `AgentGovernor` permission matrix and rate limits (Phase 4D). 
+ +| Key | Type | Default | Description | +|-----|------|---------|-------------| +| `max_calls_per_window` | int | `100` | Maximum connector calls an agent may make within `window_seconds` | +| `window_seconds` | int | `60` | Sliding-window size for rate limiting | +| `approval_timeout_seconds` | int | `3600` | Seconds before a pending HITL review is auto-rejected | +| `default_impact_level` | str | `"low"` | Assumed impact level for actions that don't specify one (`low`/`medium`/`high`/`critical`) | + +```ini +[agent_policy] +max_calls_per_window = 100 +window_seconds = 60 +approval_timeout_seconds = 3600 +default_impact_level = low +``` + +Per-agent permission overrides use the pattern `{agent_id}.{action_type}`: + +```ini +[agent_policy] +; Allow research-agent-1 to trigger SOAR playbooks despite semi_trusted level +research-agent-1.trigger_playbook = true + +; Deny threat-hunter-2 from deleting STIX objects even if trust level permits +threat-hunter-2.delete_stix = false +``` + +### `[connector_limits]` + +Per-connector rate limits and cost overrides (Phase 4E). + +| Key | Type | Default | Description | +|-----|------|---------|-------------| +| `{connector}.cost_unit` | int | per-class `COST_UNIT` | Override the cost-per-request for a named connector | +| `{connector}.max_calls_per_minute` | int | unlimited | Hard ceiling on calls per minute for a specific connector | + +```ini +[connector_limits] +; VirusTotal has strict rate limits on the free tier +virustotal.cost_unit = 5 +virustotal.max_calls_per_minute = 4 + +; Splunk bulk exports are expensive +splunk.cost_unit = 10 + +; RecordedFuture lookups count as standard +recordedfuture.cost_unit = 1 +``` + +### `[workspace_defaults]` + +Default isolation settings applied to newly created workspaces (Phase 4E). 
+ +| Key | Type | Default | Description | +|-----|------|---------|-------------| +| `trust_boundary` | str | `"semi_trusted"` | Minimum connector `TRUST_LEVEL` required for workspace access | +| `allowed_connector_refs` | str | `""` (all) | Comma-separated connector class names that may access this workspace; empty = no restriction | + +```ini +[workspace_defaults] +trust_boundary = semi_trusted +; Leave allowed_connector_refs empty to permit all connectors that meet trust_boundary +allowed_connector_refs = +``` + +To lock a workspace to only internal connectors: + +```ini +[workspace_defaults] +trust_boundary = trusted_internal +allowed_connector_refs = SplunkClient, SentinelClient, ElasticClient +``` + +### `[execution_context]` + +Controls default `ExecutionContext` parameters (Phase 4A). + +| Key | Type | Default | Description | +|-----|------|---------|-------------| +| `default_policy_set` | str | `"default"` | Policy set name written to every `execution_log` row | +| `default_budget_units` | int | `0` (unlimited) | Max query budget units per context; 0 = no budget enforced | + +```ini +[execution_context] +default_policy_set = default +default_budget_units = 0 +``` + ### Platform sections Each platform connector reads its own INI section. @@ -144,6 +235,9 @@ See `config/config.ini.example` for the full list of connector keys. 
- [How-to: Connect to Platforms](../how-to/connect-to-platforms.md) - [How-to: Use the Analysis Layer](../how-to/use-analysis-layer.md) +- [How-to: Use Execution Context](../how-to/use-execution-context.md) +- [How-to: Use the Reasoning Engine](../how-to/use-reasoning-engine.md) +- [How-to: Agent Governance](../how-to/agent-governance.md) - [How-to: Create Intelligence Reports](../how-to/create-intelligence-reports.md) - `config/config.ini.example` diff --git a/docs/sphinx-html/source/agents_governance.rst b/docs/sphinx-html/source/agents_governance.rst new file mode 100644 index 00000000..5bad0035 --- /dev/null +++ b/docs/sphinx-html/source/agents_governance.rst @@ -0,0 +1,160 @@ +Agent Governance +================ + +Phase 4D introduces a governance layer that controls, audits, and rate-limits every +AI agent action. High-impact actions require human approval before execution. + +.. contents:: On this page + :local: + :depth: 2 + +Overview +-------- + +The governance layer has two components: + +* :class:`~gnat.agents.governor.AgentGovernor` — checks permissions against a + trust-level matrix, enforces per-agent rate limits, and maintains an audit log of + all agent actions. +* :class:`~gnat.agents.hitl.HITLGateway` — bridges ``AgentGovernor`` to the existing + :class:`~gnat.review.service.ReviewService`; low/medium-impact actions are + auto-approved, high-impact actions block until a human reviewer approves, and + critical actions also trigger XSOAR notifications. + +Quick Start +----------- + +.. 
code-block:: python + + from gnat.agents.governor import AgentGovernor, AgentAction + from gnat.agents.hitl import HITLGateway + from gnat.policy.models import AgentActionType + from gnat.review.service import ReviewService + from gnat.review.store import ReviewQueueStore + + # Set up + governor = AgentGovernor(max_calls_per_window=100, window_seconds=60) + store = ReviewQueueStore(db_url="sqlite:///~/.gnat/gnat.db") + store.create_all() + gateway = HITLGateway(review_service=ReviewService(store=store)) + + # Check permission + if governor.can_act("agent-1", AgentActionType.ENRICH, "semi_trusted"): + governor.rate_limit_check("agent-1") + + action = AgentAction( + agent_id="agent-1", + action_type=AgentActionType.ENRICH, + target_ref="indicator--abc", + impact_level="low", + ) + approved, review_item = gateway.evaluate(action) + governor.record_action(action) + +Permission Matrix +----------------- + +.. list-table:: + :header-rows: 1 + :widths: 30 70 + + * - Trust Level + - Permitted Actions + * - ``trusted_internal`` + - All actions (read_stix, write_stix, delete_stix, enrich, ingest, export, + trigger_playbook, manage_workspace, escalate, hypothesize) + * - ``semi_trusted`` + - read_stix, write_stix, enrich, ingest, hypothesize, escalate + * - ``untrusted_external`` + - read_stix, enrich, hypothesize + +Impact Tiers +------------ + +.. list-table:: + :header-rows: 1 + :widths: 15 85 + + * - Level + - Behaviour + * - ``low`` + - Auto-approved, logged only + * - ``medium`` + - Auto-approved, logged only + * - ``high`` + - Submitted to ``ReviewService`` as PENDING; blocks until approved/rejected/timed-out + * - ``critical`` + - PENDING + XSOAR notification via ``XSOARClient.upsert_object()`` + +API Reference +------------- + +AgentGovernor +~~~~~~~~~~~~~ + +.. autoclass:: gnat.agents.governor.AgentGovernor + :members: + :undoc-members: + :show-inheritance: + +AgentAction +~~~~~~~~~~~ + +.. 
autoclass:: gnat.agents.governor.AgentAction + :members: + :undoc-members: + :show-inheritance: + +HITLGateway +~~~~~~~~~~~ + +.. autoclass:: gnat.agents.hitl.HITLGateway + :members: + :undoc-members: + :show-inheritance: + +AgentActionType +~~~~~~~~~~~~~~~ + +.. autoclass:: gnat.policy.models.AgentActionType + :members: + :undoc-members: + :show-inheritance: + +Exceptions +~~~~~~~~~~ + +.. autoclass:: gnat.agents.governor.AgentPermissionDenied + :show-inheritance: + +.. autoclass:: gnat.agents.governor.RateLimitExceeded + :show-inheritance: + +Testing +------- + +Use :class:`~gnat.testing.simulation.AgentTestHarness` for deterministic agent tests: + +.. code-block:: python + + from gnat.testing import AgentTestHarness + from gnat.policy.models import AgentActionType + + harness = AgentTestHarness() + approved, action = harness.run_action( + agent_id="test-agent", + action_type=AgentActionType.ENRICH, + impact_level="low", + trust_level="semi_trusted", + ) + assert approved is True + assert len(harness.recorded_actions) == 1 + +See Also +-------- + +* :doc:`/api/core` — ExecutionContext +* :doc:`/reasoning` — Hypothesis and reasoning engine +* ADR-0045: Agent Governance +* ADR-0046: HITL Gateway +* ADR-0049: Testing Framework diff --git a/docs/sphinx-html/source/api/agents_governance.rst b/docs/sphinx-html/source/api/agents_governance.rst new file mode 100644 index 00000000..9d2a2516 --- /dev/null +++ b/docs/sphinx-html/source/api/agents_governance.rst @@ -0,0 +1,43 @@ +gnat.agents — Governance & HITL +================================ + +.. automodule:: gnat.agents.governor + :members: + :undoc-members: + +.. automodule:: gnat.agents.hitl + :members: + :undoc-members: + +gnat.policy — Permission Models +-------------------------------- + +.. automodule:: gnat.policy + :members: + :undoc-members: + +.. autoclass:: gnat.policy.models.AgentActionType + :members: + :undoc-members: + +.. 
autofunction:: gnat.policy.models.agent_can_act + +gnat.testing — Simulation Framework +------------------------------------- + +.. automodule:: gnat.testing + :members: + :undoc-members: + +.. autoclass:: gnat.testing.simulation.SimulationConnector + :members: + :undoc-members: + :show-inheritance: + +.. autoclass:: gnat.testing.simulation.ReplayRunner + :members: + :undoc-members: + +.. autoclass:: gnat.testing.simulation.AgentTestHarness + :members: + :undoc-members: diff --git a/docs/sphinx-html/source/api/core.rst b/docs/sphinx-html/source/api/core.rst new file mode 100644 index 00000000..88d258c5 --- /dev/null +++ b/docs/sphinx-html/source/api/core.rst @@ -0,0 +1,46 @@ +gnat.core — Execution Context & Domain Boundaries +=================================================== + +Phase 4A cross-cutting infrastructure: execution tracing, domain boundary +enforcement, connector trust, and query budget management. + +.. automodule:: gnat.core + :members: + :undoc-members: + +ExecutionContext +---------------- + +.. autoclass:: gnat.core.context.ExecutionContext + :members: + :undoc-members: + :show-inheritance: + +QueryBudget +----------- + +.. autoclass:: gnat.core.context.QueryBudget + :members: + :undoc-members: + :show-inheritance: + +Domain Boundary Enforcement +--------------------------- + +.. automodule:: gnat.core.domains + :members: + :undoc-members: + +.. autoclass:: gnat.core.domains.Domain + :members: + :undoc-members: + +.. autoclass:: gnat.core.domains.DomainBoundaryViolation + :show-inheritance: + +.. autoclass:: gnat.core.domains.TrustLevelViolation + :show-inheritance: + +.. autofunction:: gnat.core.domains.domain_boundary + +.. 
autofunction:: gnat.core.domains.require_trust_level diff --git a/docs/sphinx-html/source/api/reasoning.rst b/docs/sphinx-html/source/api/reasoning.rst new file mode 100644 index 00000000..ba1356df --- /dev/null +++ b/docs/sphinx-html/source/api/reasoning.rst @@ -0,0 +1,35 @@ +gnat.reasoning — Hypothesis & Reasoning Engine +=============================================== + +.. automodule:: gnat.reasoning + :members: + :undoc-members: + +ReasoningEngine +--------------- + +.. autoclass:: gnat.reasoning.engine.ReasoningEngine + :members: + :undoc-members: + :show-inheritance: + +HypothesisEngine +---------------- + +.. autoclass:: gnat.reasoning.hypothesis.HypothesisEngine + :members: + :undoc-members: + :show-inheritance: + +Custom STIX SDOs +---------------- + +.. autoclass:: gnat.stix.sdos.hypothesis.STIXHypothesis + :members: + :undoc-members: + :show-inheritance: + +.. autoclass:: gnat.stix.sdos.negative_evidence.NegativeEvidenceRecord + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/sphinx-html/source/index.rst b/docs/sphinx-html/source/index.rst index 55df08fc..208e6abd 100644 --- a/docs/sphinx-html/source/index.rst +++ b/docs/sphinx-html/source/index.rst @@ -23,6 +23,8 @@ and STIX 2.1-compatible ORM for security platforms. cli codegen contexts + reasoning + agents_governance .. toctree:: :maxdepth: 3 @@ -36,6 +38,9 @@ and STIX 2.1-compatible ORM for security platforms. api/connectors api/cli api/codegen + api/core + api/reasoning + api/agents_governance api/utils .. toctree:: diff --git a/docs/sphinx-html/source/reasoning.rst b/docs/sphinx-html/source/reasoning.rst new file mode 100644 index 00000000..1aa27543 --- /dev/null +++ b/docs/sphinx-html/source/reasoning.rst @@ -0,0 +1,117 @@ +Reasoning Layer +=============== + +Phase 4C introduces a structured reasoning layer for observable prioritisation and +hypothesis lifecycle management. + +.. 
contents:: On this page + :local: + :depth: 2 + +Overview +-------- + +The reasoning layer consists of three interconnected components: + +* :class:`~gnat.reasoning.engine.ReasoningEngine` — scores and ranks STIX observables + using a composite of connector trust, object age, Solr corroboration, and negative + evidence signals. +* :class:`~gnat.reasoning.hypothesis.HypothesisEngine` — manages the + ``propose → evaluate → close`` lifecycle for analyst hypotheses stored as custom + STIX SDOs. +* :class:`~gnat.stix.sdos.negative_evidence.NegativeEvidenceRecord` — suppresses + redundant connector re-queries within a configurable TTL window. + +Quick Start +----------- + +.. code-block:: python + + from gnat.reasoning.engine import ReasoningEngine + from gnat.reasoning.hypothesis import HypothesisEngine + from gnat.core.context import ExecutionContext + from gnat.context.workspace import WorkspaceManager + + manager = WorkspaceManager.default() + ctx = ExecutionContext.create( + initiated_by="analyst", + domain="analysis", + workspace_id="my-ws", + ) + + # Score observables + engine = ReasoningEngine(manager=manager, workspace_name="my-ws") + ws = manager.open("my-ws") + results = engine.prioritize(list(ws.objects.values()), context=ctx) + for obs, score, explanation in results: + print(f"{score:.2f} {explanation['summary']}") + + # Propose hypothesis + h_engine = HypothesisEngine(manager=manager, workspace_name="my-ws") + h = h_engine.propose("APT29 behind Q1 campaign", confidence=0.2) + h = h_engine.evaluate(h.id) + h = h_engine.close(h.id, verdict="confirmed") + +API Reference +------------- + +ReasoningEngine +~~~~~~~~~~~~~~~ + +.. autoclass:: gnat.reasoning.engine.ReasoningEngine + :members: + :undoc-members: + :show-inheritance: + +HypothesisEngine +~~~~~~~~~~~~~~~~ + +.. autoclass:: gnat.reasoning.hypothesis.HypothesisEngine + :members: + :undoc-members: + :show-inheritance: + +STIXHypothesis SDO +~~~~~~~~~~~~~~~~~~ + +.. 
autoclass:: gnat.stix.sdos.hypothesis.STIXHypothesis + :members: + :undoc-members: + :show-inheritance: + +NegativeEvidenceRecord SDO +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. autoclass:: gnat.stix.sdos.negative_evidence.NegativeEvidenceRecord + :members: + :undoc-members: + :show-inheritance: + +Scoring Formula +--------------- + +The composite score is computed as: + +.. code-block:: text + + score = trust_weight × 0.4 + + age_factor × 0.3 + + corroboration × 0.3 + - neg_penalty × 0.5 + + clamped to [0.0, 1.0] + +Where: + +* **trust_weight** — ``trusted_internal``→0.9, ``semi_trusted``→0.6, ``untrusted_external``→0.3 +* **age_factor** — 1.0 decaying by 5% per day from ``modified`` timestamp (floor 0.0) +* **corroboration** — Solr hit count × 0.05, capped at 0.25 +* **neg_penalty** — min(0.3 × fresh NegativeEvidenceRecord count, 0.6) + +See Also +-------- + +* :doc:`/api/core` — ExecutionContext and QueryBudget +* ADR-0042: Hypothesis Engine +* ADR-0043: Negative Evidence +* ADR-0044: Reasoning Engine