| 1 | +# GNAT — Cross-Tool Investigation Context Plan |
| 2 | + |
| 3 | +**Scope:** this is GNAT’s side of the GNAT-o-sphere investigation-context work. It assumes SandGNAT, SenseGNAT, and RedGNAT will each ship a matching plan of their own. This document is the source of truth for the shared contract; the addon plans reference it. |
| 4 | + |
| 5 | +**Intended audience:** Claude Code working in the `wrhalpin/GNAT` repo. |
| 6 | + |
| 7 | +----- |
| 8 | + |
| 9 | +## Context that must not be re-derived |
| 10 | + |
| 11 | +GNAT already has: |
| 12 | + |
| 13 | +- `gnat/analysis/investigations/` — `Investigation`, `Hypothesis`, `AnalystNote`, `InvestigationTask`, state machine (`OPEN → IN_PROGRESS → REVIEW → CLOSED`), `InvestigationService`, `InvestigationStore` (SQLAlchemy). |
| 14 | +- `gnat/investigations/` — cross-platform evidence-graph builder. `EvidenceGraph`, `EvidenceNode`, `EvidenceEdge`, `Seed`, `SeedType`, five-step pipeline (`seed → incident expansion → normalise → correlate → materialise`). |
| 15 | +- `gnat/analysis/correlation/` — `EntityResolver`, `RelationshipScorer`, `ClusterDetector`, `EnrichmentDispatcher`. |
| 16 | +- `gnat/analysis/timeline.py`, `gnat/analysis/graph.py`, `gnat/analysis/copilot/gap_detector.py`, `gnat/analysis/copilot/drafting.py`. |
| 17 | +- `gnat/reporting/` — `Report`, `Finding`, `EvidenceLink`, `Attribution`, five-state lifecycle, STIX 2.1 report SDO export. |
| 18 | +- Admiralty Scale confidence scoring, TLP 2.0, AI confidence ceiling of 60. |
| 19 | +- `TenantRegistry` and `WorkspaceManager` for multi-tenant isolation. |
| 20 | + |
| 21 | +**Do not build a second investigation model.** The work in this plan extends the existing `gnat.analysis.investigations.Investigation` — it does not replace, shadow, or parallel it. |
| 22 | + |
| 23 | +If any of the above has changed since this plan was written, confirm the current state in-conversation before proceeding. Do not assume this plan is current. |
| 24 | + |
| 25 | +----- |
| 26 | + |
| 27 | +## Goal |
| 28 | + |
| 29 | +Let SandGNAT, SenseGNAT, and RedGNAT attach their outputs to GNAT investigations without coupling them to GNAT’s internals. |
| 30 | + |
| 31 | +----- |
| 32 | + |
| 33 | +## The shared contract (source of truth) |
| 34 | + |
| 35 | +Three custom STIX properties on any object an addon emits: |
| 36 | + |
| 37 | +|Property |Required |Purpose | |
| 38 | +|--------------------------------|-----------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| |
| 39 | +|`x_gnat_investigation_id` |yes, when an investigation is known|Primary key of the `Investigation` row in GNAT. String. Must match an existing investigation ID; addons never mint new IDs. | |
| 40 | +|`x_gnat_investigation_origin` |yes |One of `"sandgnat"`, `"sensegnat"`, `"redgnat"`, `"gnat"`, `"external"`. Tells the receiver which addon produced the object so the evidence graph can label the node. | |
| 41 | +|`x_gnat_investigation_link_type`|no, defaults to `"inferred"` |One of `"confirmed"` (addon is certain this belongs to the investigation — e.g. RedGNAT emulated a specific hypothesis), `"inferred"` (correlation logic linked it), `"suggested"` (proposed, pending analyst acceptance).| |
| 42 | + |
| 43 | +Addons should also wrap their per-run output in a STIX `Grouping` with the same three properties set on the Grouping itself. The Grouping’s `object_refs` lists the objects emitted in that run. Consumers can then either consume the Grouping as a single evidence bundle or iterate the individual objects. |
| 44 | + |
| 45 | +Confidence scoring rules (from GNAT policy, not changed by this plan): |
| 46 | + |
| 47 | +- Any object with `x_source_type = "ai_extracted"` is capped at confidence 60. Addons must respect this. |
| 48 | +- Correlation-inferred links carry a separate `x_gnat_correlation_confidence` (0–100) independent of the object’s own confidence; GNAT assigns this at receive time. |
| 49 | + |
| 50 | +**Investigation IDs are tenant-scoped.** The receiver validates that every stamped `x_gnat_investigation_id` belongs to an investigation in the tenant the incoming request is authenticated for. Cross-tenant references are rejected. |
| 51 | + |
| 52 | +A formal JSON schema for these three properties and the Grouping envelope lives at `docs/reference/investigation-context-schema.md` (new — see phase 0). |
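
The contract above can be sketched on plain STIX dictionaries. This is a minimal illustration, not the reference schema: the helper names `stamp` and `wrap_run` are hypothetical, the Grouping `context` value `"unspecified"` is an assumed choice, and raising `ValueError` on a bad origin or link type is an assumption about how an addon might validate before sending.

```python
import uuid
from datetime import datetime, timezone

# Allowed values copied from the contract table above.
ORIGINS = {"sandgnat", "sensegnat", "redgnat", "gnat", "external"}
LINK_TYPES = {"confirmed", "inferred", "suggested"}


def stamp(obj, investigation_id, origin, link_type="inferred"):
    """Return a copy of a STIX object dict carrying the three contract properties.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if origin not in ORIGINS:
        raise ValueError(f"unknown origin: {origin!r}")
    if link_type not in LINK_TYPES:
        raise ValueError(f"unknown link type: {link_type!r}")
    stamped = dict(obj)  # shallow copy so the caller's object is not mutated
    stamped["x_gnat_investigation_id"] = investigation_id
    stamped["x_gnat_investigation_origin"] = origin
    stamped["x_gnat_investigation_link_type"] = link_type
    return stamped


def wrap_run(objects, investigation_id, origin, link_type="inferred", name="addon run"):
    """Build a Grouping for one addon run and stamp it like its members."""
    # STIX timestamps are UTC with a trailing "Z"; trim microseconds to milliseconds.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    grouping = {
        "type": "grouping",
        "spec_version": "2.1",
        "id": f"grouping--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "context": "unspecified",  # assumed; the plan does not fix a context value
        "object_refs": [o["id"] for o in objects],
    }
    return stamp(grouping, investigation_id, origin, link_type)
```

An addon would stamp each emitted object first, then pass the stamped objects to `wrap_run` so the Grouping and its members carry identical investigation context.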
| 53 | + |
| 54 | +----- |
| 55 | + |
| 56 | +## Phase 0 — Docs and contract |
| 57 | + |
| 58 | +Three documents, no code yet. These are the artifacts the three addon plans reference; lock them down before anyone starts coding. |
| 59 | + |
| 60 | +### 0.1 ADR |
| 61 | + |
| 62 | +Path: `docs/architecture/adrs/ADR-00XX-gnat-investigation-context.md` (pick the next available number). |
| 63 | + |
| 64 | +Decisions to capture: |
| 65 | + |
| 66 | +- Adopt `x_gnat_investigation_id`, `x_gnat_investigation_origin`, `x_gnat_investigation_link_type` as the shared cross-tool correlation contract. |
| 67 | +- These properties are custom STIX properties stamped on individual objects **and** on a wrapping `Grouping` per addon run. |
| 68 | +- Investigation identity is owned by GNAT’s existing `gnat.analysis.investigations.Investigation`. Addons never create investigations. |
| 69 | +- Addons are never required to stamp the properties — objects without them are ingested normally. The properties are additive metadata, not a hard dependency. |
| 70 | +- The receive path accepts externally-stamped STIX via existing TAXII 2.1 ingest; no new protocol is introduced. |
| 71 | +- Cross-tenant investigation references are rejected at ingest. |
| 72 | + |
| 73 | +### 0.2 Reference schema |
| 74 | + |
| 75 | +Path: `docs/reference/investigation-context-schema.md`. |
| 76 | + |
| 77 | +Contents: exact JSON schema for the three properties, constraints, examples of a single stamped `Indicator`, examples of a `Grouping` wrapping a SandGNAT detonation bundle, error cases (bad tenant, unknown investigation, malformed origin value). |
| 78 | + |
| 79 | +### 0.3 Explanation doc |
| 80 | + |
| 81 | +Path: `docs/explanation/cross-tool-investigation-model.md`. |
| 82 | + |
| 83 | +Contents: why the model exists, how each addon participates, the relationship between this model and the existing `Investigation`, `EvidenceGraph`, and `Report` primitives. One end-to-end diagram showing a seed IOC → SandGNAT detonation → SenseGNAT correlation → RedGNAT validation → GNAT report, with the investigation_id threaded through every step. |
| 84 | + |
| 85 | +----- |
| 86 | + |
| 87 | +## Phase 1 — Investigation API surface (thin additions) |
| 88 | + |
| 89 | +The existing `InvestigationService` has CRUD. Addons need a small, purposeful surface on top of it. Nothing here adds new models. |
| 90 | + |
| 91 | +### 1.1 Addon-facing REST endpoints |
| 92 | + |
| 93 | +Mount under the existing gateway router (`gnat/dissemination/api/`). All endpoints require the same `X-Api-Key` and tenant header the gateway already uses. |
| 94 | + |
| 95 | +|Method|Path |Purpose | |
| 96 | +|------|-------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| |
| 97 | +|`GET` |`/api/investigations` |List investigations visible to the authenticated tenant. Supports `status`, `created_since`, `tag`, pagination. | |
| 98 | +|`GET` |`/api/investigations/{id}` |Fetch a single investigation with its hypotheses and linked object counts. | |
| 99 | +|`GET` |`/api/investigations/{id}/hypotheses`|List hypotheses for the investigation. RedGNAT uses this to pick a hypothesis to emulate. | |
| 100 | +|`POST`|`/api/investigations/{id}/evidence` |Accept a STIX bundle or Grouping stamped with this investigation’s ID. Validates tenant, validates investigation exists and is not `CLOSED`, validates all contained objects carry a matching `x_gnat_investigation_id`, then routes into existing ingest.| |
| 101 | + |
| 102 | +The `POST .../evidence` endpoint is the one new ingest path. Implementation should delegate to the existing ingest pipeline rather than reimplementing STIX validation. |
| 103 | + |
| 104 | +### 1.2 InvestigationService method additions |
| 105 | + |
| 106 | +In `gnat/analysis/investigations/service.py`: |
| 107 | + |
| 108 | +- `attach_evidence_bundle(investigation_id, bundle, origin, tenant_id) -> AttachResult` — validates, routes to ingest, and returns a structured result (`accepted_count`, `rejected_count`, `rejection_reasons`). |
| 109 | +- `find_by_subject(subject_ref, tenant_id) -> list[Investigation]` — returns investigations whose EvidenceGraph already contains `subject_ref`. SenseGNAT calls this at detector-emission time to auto-tag findings. |
| 110 | + |
| 111 | +Both methods are thin — they orchestrate existing components. |
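
As a rough sketch of the validation step inside `attach_evidence_bundle`, the snippet below checks each object's stamped ID and fills in the three result fields named above. The `AttachResult` field names come from this plan; everything else is assumed, including the helper name `check_bundle` and the choice to reject per object rather than fail the whole bundle. Tenant checks and the hand-off to the existing ingest pipeline are deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AttachResult:
    """Structured result; field names follow the plan, the class body is a sketch."""
    accepted_count: int = 0
    rejected_count: int = 0
    rejection_reasons: list[str] = field(default_factory=list)


def check_bundle(investigation_id: str, bundle: dict) -> tuple[list[dict], AttachResult]:
    """Split bundle objects into accepted ones and a tally of rejections.

    Hypothetical helper. Per-object rejection is an assumption; the real
    service may instead reject the entire bundle on the first mismatch.
    """
    accepted: list[dict] = []
    result = AttachResult()
    for obj in bundle.get("objects", []):
        stamped_id = obj.get("x_gnat_investigation_id")
        if stamped_id == investigation_id:
            accepted.append(obj)
            result.accepted_count += 1
        else:
            result.rejected_count += 1
            result.rejection_reasons.append(
                f"{obj.get('id', '<no id>')}: expected investigation "
                f"{investigation_id!r}, found {stamped_id!r}"
            )
    return accepted, result
```

Only the `accepted` list would then be routed into ingest; the `AttachResult` goes back to the caller unchanged.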
| 112 | + |
| 113 | +### 1.3 Closed-investigation policy |
| 114 | + |
| 115 | +When `POST /api/investigations/{id}/evidence` targets a `CLOSED` investigation: |
| 116 | + |
| 117 | +- Reject with `409 Conflict` by default. |
| 118 | +- Honour an `X-Reopen-Investigation` request header that an authorised caller can set to auto-reopen the investigation (move state back to `IN_PROGRESS` and log an `AnalystNote` recording why). |
| 119 | + |
| 120 | +This keeps closed investigations stable while giving an explicit path to add late-arriving evidence. |
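
The policy reduces to a small decision function. The sketch below is one way to express it, under several assumptions: the header is treated as set when its value is `"true"` (the plan does not define the value format), authorisation is passed in as a boolean, and the `Decision` enum and `evidence_decision` name are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"                        # investigation is not closed
    REOPEN_AND_ACCEPT = "reopen_and_accept"  # move to IN_PROGRESS, log an AnalystNote
    REJECT_CONFLICT = "reject_409"           # respond with 409 Conflict


def evidence_decision(state: str, headers: dict, caller_may_reopen: bool) -> Decision:
    """Decide how an evidence POST is handled for a given investigation state."""
    if state != "CLOSED":
        return Decision.ACCEPT
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Assumption: the literal value "true" means the caller asked for a reopen.
    wants_reopen = lowered.get("x-reopen-investigation", "").strip().lower() == "true"
    if wants_reopen and caller_may_reopen:
        return Decision.REOPEN_AND_ACCEPT
    return Decision.REJECT_CONFLICT
```

Keeping the decision separate from the side effects (state transition, `AnalystNote`) makes the four cases listed in the Phase 5 unit tests easy to cover without a database.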
| 121 | + |
| 122 | +----- |
| 123 | + |
| 124 | +## Phase 2 — Evidence graph integration |
| 125 | + |
| 126 | +Addon outputs must show up as nodes in the existing `EvidenceGraph`, correctly labeled by origin. |
| 127 | + |
| 128 | +### 2.1 Normalizer pass-through |
| 129 | + |
| 130 | +In `gnat/investigations/normalizer.py`: |
| 131 | + |
| 132 | +- When a raw platform record carries any of `x_gnat_investigation_id`, `x_gnat_investigation_origin`, or `x_gnat_investigation_link_type`, each property present must be preserved on the resulting `EvidenceNode` as node metadata. |
| 133 | +- Add a new `origin` field on `EvidenceNode` (default `"gnat"`). It’s the source-of-truth label for graph views and report grouping. |
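
A minimal sketch of the pass-through, using a stand-in dataclass rather than the real `EvidenceNode` (whose fields are not shown in this plan). `NodeSketch`, its `metadata` dict, and deriving `origin` from `x_gnat_investigation_origin` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

CONTEXT_KEYS = (
    "x_gnat_investigation_id",
    "x_gnat_investigation_origin",
    "x_gnat_investigation_link_type",
)


@dataclass
class NodeSketch:
    """Hypothetical stand-in for EvidenceNode; only the fields needed here."""
    node_id: str
    origin: str = "gnat"  # default named in the plan
    metadata: dict = field(default_factory=dict)


def normalise(record: dict) -> NodeSketch:
    """Copy whichever contract properties are present onto the node."""
    metadata = {key: record[key] for key in CONTEXT_KEYS if key in record}
    return NodeSketch(
        node_id=record["id"],
        # Assumption: the node's origin label mirrors the stamped origin property.
        origin=record.get("x_gnat_investigation_origin", "gnat"),
        metadata=metadata,
    )
```

Records without any of the three properties still normalise cleanly and fall back to the `"gnat"` origin, which matches the requirement that unstamped objects are ingested normally.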
| 134 | + |
| 135 | +### 2.2 Correlator behaviour |
| 136 | + |
| 137 | +In `gnat/investigations/correlator.py`: |
| 138 | + |
| 139 | +- Addon-sourced nodes must participate in correlation the same way as connector-sourced nodes. |
| 140 | +- New edges that connect an addon-sourced node to another node get `link_type="inferred"` by default, unless the addon explicitly marked a link as `"confirmed"` (RedGNAT emulation against a specific hypothesis is the canonical case). |
| 141 | + |
| 142 | +### 2.3 Graph query surface |
| 143 | + |
| 144 | +In `gnat/analysis/graph.py`: |
| 145 | + |
| 146 | +- Add `filter_by_origin(origin_list)` to `GraphQuery`. An analyst must be able to filter a view down to “only SenseGNAT-sourced nodes in this investigation”. |
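
The filter itself is small. The sketch below works on plain dictionaries instead of the real `GraphQuery`, whose interface is not shown here; the free-function form and the `parse_origins` helper for comma-separated CLI input are assumptions.

```python
def parse_origins(raw: str) -> list:
    """Turn a comma-separated string such as 'sandgnat,sensegnat' into a list."""
    return [part.strip().lower() for part in raw.split(",") if part.strip()]


def filter_by_origin(nodes, origin_list):
    """Keep only nodes whose origin is in origin_list.

    Nodes are plain dicts in this sketch; a node with no origin counts as 'gnat',
    mirroring the default described in Phase 2.1.
    """
    wanted = set(origin_list)
    return [node for node in nodes if node.get("origin", "gnat") in wanted]
```

For example, `filter_by_origin(nodes, parse_origins("sandgnat,sensegnat"))` would keep SandGNAT and SenseGNAT nodes and drop everything else.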
| 147 | + |
| 148 | +----- |
| 149 | + |
| 150 | +## Phase 3 — Cross-tool report template |
| 151 | + |
| 152 | +One new report template, nothing more. `gnat.reporting` already does the heavy lifting. |
| 153 | + |
| 154 | +Path: `gnat/reports/templates/cross_tool_investigation.py`. |
| 155 | + |
| 156 | +The template pulls, for a given `investigation_id`: |
| 157 | + |
| 158 | +- Investigation header (title, status, hypotheses, analyst notes). |
| 159 | +- Timeline from `TimelineBuilder` filtered by investigation_id. |
| 160 | +- Sections grouped by `origin`: |
| 161 | + - **SandGNAT findings** — malware analyses, artifacts, extracted indicators, similarity neighbours. |
| 162 | + - **SenseGNAT findings** — behavioural detections with narrative strings intact. |
| 163 | + - **RedGNAT findings** — emulation runs, techniques exercised, detection gaps. |
| 164 | + - **GNAT / external** — everything else. |
| 165 | +- Confidence and attribution summary from existing `Attribution` and `ConfidenceScore`. |
| 166 | +- Recommendations section drafted by `ReportDraftingAssistant` (already exists; confidence-ceiling rules already apply). |
| 167 | +- Appendix: raw STIX references. |
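
The origin-grouped sections can be sketched as a simple bucketing step. The section titles come from the list above; the function name `group_by_section`, the dict-based node shape, and sending unknown origins to the "GNAT / external" section are assumptions.

```python
SECTIONS = [
    "SandGNAT findings",
    "SenseGNAT findings",
    "RedGNAT findings",
    "GNAT / external",
]

SECTION_FOR_ORIGIN = {
    "sandgnat": "SandGNAT findings",
    "sensegnat": "SenseGNAT findings",
    "redgnat": "RedGNAT findings",
}


def group_by_section(nodes):
    """Bucket nodes into report sections, preserving the section order above."""
    grouped = {title: [] for title in SECTIONS}
    for node in nodes:
        origin = node.get("origin", "gnat")
        # Anything that is not one of the three addons falls into the last section.
        title = SECTION_FOR_ORIGIN.get(origin, "GNAT / external")
        grouped[title].append(node)
    return grouped
```

Every section key is always present, so the template can render an explicit "no findings" line for an addon that contributed nothing.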
| 168 | + |
| 169 | +Expose as `gnat report run --template cross_tool_investigation --investigation IC-2026-0001 --formats pdf html`. |
| 170 | + |
| 171 | +----- |
| 172 | + |
| 173 | +## Phase 4 — CLI additions |
| 174 | + |
| 175 | +In `gnat/investigations/cli.py` (this CLI is light; don’t build a parallel CLI for `gnat.analysis.investigations`): |
| 176 | + |
| 177 | +``` |
| 178 | +gnat investigation list [--tenant X] [--status open] |
| 179 | +gnat investigation show <id> |
| 180 | +gnat investigation evidence <id> # list linked objects grouped by origin |
| 181 | +gnat investigation graph <id> [--origin sandgnat,sensegnat] |
| 182 | +gnat investigation export <id> # STIX bundle, preserves custom properties |
| 183 | +gnat investigation report <id> --format pdf |
| 184 | +``` |
| 185 | + |
| 186 | +No `create` / `link` commands in this plan — creation and link management already have a surface in `InvestigationService` and the existing analyst UI. Keep this CLI focused on the cross-tool read path. |
| 187 | + |
| 188 | +----- |
| 189 | + |
| 190 | +## Phase 5 — Tests |
| 191 | + |
| 192 | +### Unit |
| 193 | + |
| 194 | +- `tests/unit/investigations/test_evidence_api.py` — the POST endpoint: accepts a valid stamped bundle, rejects mismatched investigation_id, rejects cross-tenant, rejects closed investigation without reopen header, accepts with reopen header. |
| 195 | +- `tests/unit/investigations/test_normalizer_passthrough.py` — all three custom properties survive the normalize step and land on `EvidenceNode`. |
| 196 | +- `tests/unit/investigations/test_service_additions.py` — `attach_evidence_bundle` and `find_by_subject`. |
| 197 | +- `tests/unit/reports/test_cross_tool_template.py` — template renders with fixture data from all four origins. |
| 198 | + |
| 199 | +### Integration |
| 200 | + |
| 201 | +- `tests/integration/test_cross_tool_ingest.py` — spin up a tenant, create an investigation, POST one fixture bundle per origin, and verify that the evidence graph contains correctly labelled nodes. |
| 202 | + |
| 203 | +Hit 70% coverage minimum (existing gate). Don’t lower it. |
| 204 | + |
| 205 | +----- |
| 206 | + |
| 207 | +## Out of scope |
| 208 | + |
| 209 | +- A new investigation model. The existing one stays. |
| 210 | +- A new evidence graph. Same. |
| 211 | +- A new correlation engine. Same. |
| 212 | +- “Investigation graph view UI” as a new feature — the existing TUI and web dashboard render `EvidenceGraph` today. The only UI work in this plan is the origin filter (Phase 2.3). |
| 213 | +- STIX `Incident` object. Out of scope for this pass; `Grouping` covers the immediate need. |
| 214 | + |
| 215 | +----- |
| 216 | + |
| 217 | +## Acceptance criteria |
| 218 | + |
| 219 | +1. An analyst creates an investigation in GNAT (existing flow, unchanged). |
| 220 | +1. SandGNAT, SenseGNAT, and RedGNAT can each POST stamped STIX bundles to `/api/investigations/{id}/evidence` and the objects land in the investigation’s `EvidenceGraph` with correct `origin` labels. |
| 221 | +1. `gnat investigation graph <id> --origin sensegnat` returns only SenseGNAT-sourced nodes. |
| 222 | +1. `gnat investigation report <id> --format pdf` produces a report with sections grouped by origin. |
| 223 | +1. STIX export of the investigation preserves all three custom properties on every object. |
| 224 | +1. Existing standalone ingest paths (TAXII, ingest pipelines) still work without any `x_gnat_investigation_id` present. |
| 225 | +1. Cross-tenant investigation references are rejected with a clear error. |
| 226 | +1. Closed-investigation evidence POSTs are rejected unless `X-Reopen-Investigation` is set. |
| 227 | + |
| 228 | +----- |
| 229 | + |
| 230 | +## Risks |
| 231 | + |
| 232 | +|Risk |Mitigation | |
| 233 | +|-------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------| |
| 234 | +|Parallel investigation models appear because a future planner doesn’t see the existing one.|Keep this plan and its ADR in `docs/architecture/`. Any future plan that proposes a new investigation model must reference this ADR and justify deviation.| |
| 235 | +|Cross-tenant ID leakage. |Tenant validation on every endpoint touching investigation_id. Integration test covers it. | |
| 236 | +|Addons silently drop the custom properties through a STIX round-trip. |Normalizer pass-through test (Phase 5). Run it in addon CI too. | |
| 237 | +|Closed-investigation policy causes data loss. |The `X-Reopen-Investigation` escape hatch with `AnalystNote` audit trail. | |
| 238 | +|AI-generated “suggested” links pollute the graph. |Existing `confidence_ceiling = 60` and the `"suggested"` link_type keep them filterable. Default view hides `"suggested"` unless opted in. | |
| 239 | + |
| 240 | +----- |
| 241 | + |
| 242 | +## Handoff checklist before starting code |
| 243 | + |
| 244 | +- [ ] Phase 0 docs written and reviewed. |
| 245 | +- [ ] The three custom property names match exactly what the addon plans reference. |
| 246 | +- [ ] The REST endpoints in Phase 1.1 are reviewed against the existing gateway router so we don’t create a parallel auth surface. |
| 247 | +- [ ] Confirmed in-conversation (not from memory) that the module paths listed in “Context that must not be re-derived” are still the current structure. If anything has moved, update this plan before Code starts. |