Commit dc73824

committed
cross platform investigation plan
1 parent 7a0749f commit dc73824

1 file changed

Lines changed: 247 additions & 0 deletions

crossinvestigation-plan.md

@@ -0,0 +1,247 @@
1+
# GNAT — Cross-Tool Investigation Context Plan
2+
3+
**Scope:** this is GNAT’s side of the GNAT-o-sphere investigation-context work. It assumes SandGNAT, SenseGNAT, and RedGNAT will each ship a matching plan of their own. This document is the source of truth for the shared contract; the addon plans reference it.
4+
5+
**Intended audience:** Claude Code working in the `wrhalpin/GNAT` repo.
6+
7+
-----
8+
9+
## Context that must not be re-derived
10+
11+
GNAT already has:
12+
13+
- `gnat/analysis/investigations/`: `Investigation`, `Hypothesis`, `AnalystNote`, `InvestigationTask`, state machine (`OPEN → IN_PROGRESS → REVIEW → CLOSED`), `InvestigationService`, `InvestigationStore` (SQLAlchemy).
- `gnat/investigations/`: cross-platform evidence-graph builder. `EvidenceGraph`, `EvidenceNode`, `EvidenceEdge`, `Seed`, `SeedType`, five-step pipeline (`seed → incident expansion → normalise → correlate → materialise`).
- `gnat/analysis/correlation/`: `EntityResolver`, `RelationshipScorer`, `ClusterDetector`, `EnrichmentDispatcher`.
- `gnat/analysis/timeline.py`, `gnat/analysis/graph.py`, `gnat/analysis/copilot/gap_detector.py`, `gnat/analysis/copilot/drafting.py`.
- `gnat/reporting/`: `Report`, `Finding`, `EvidenceLink`, `Attribution`, five-state lifecycle, STIX 2.1 report SDO export.
18+
- Admiralty Scale confidence scoring, TLP 2.0, AI confidence ceiling of 60.
19+
- `TenantRegistry` and `WorkspaceManager` for multi-tenant isolation.
20+
21+
**Do not build a second investigation model.** The work in this plan extends the existing `gnat.analysis.investigations.Investigation` — it does not replace, shadow, or parallel it.
22+
23+
If any of the above has changed since this plan was written, confirm the current state in-conversation before proceeding. Do not assume this plan is current.
24+
25+
-----
26+
27+
## Goal
28+
29+
Let SandGNAT, SenseGNAT, and RedGNAT attach their outputs to GNAT investigations without coupling them to GNAT’s internals.
30+
31+
-----
32+
33+
## The shared contract (source of truth)
34+
35+
Three custom STIX properties on any object an addon emits:
36+
37+
|Property                        |Required                           |Purpose|
|--------------------------------|-----------------------------------|-------|
|`x_gnat_investigation_id`       |yes, when an investigation is known|Primary key of the `Investigation` row in GNAT. String. Must match an existing investigation ID; addons never mint new IDs.|
|`x_gnat_investigation_origin`   |yes                                |One of `"sandgnat"`, `"sensegnat"`, `"redgnat"`, `"gnat"`, `"external"`. Tells the receiver which addon produced the object so the evidence graph can label the node.|
|`x_gnat_investigation_link_type`|no, defaults to `"inferred"`       |One of `"confirmed"` (addon is certain this belongs to the investigation, e.g. RedGNAT emulated a specific hypothesis), `"inferred"` (correlation logic linked it), `"suggested"` (proposed, pending analyst acceptance).|
42+
43+
Addons should also wrap their per-run output in a STIX `Grouping` with the same three properties set on the Grouping itself. The Grouping’s `object_refs` lists the objects emitted in that run. Consumers can then either consume the Grouping as a single evidence bundle or iterate the individual objects.
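
A minimal sketch of what stamping and wrapping one addon run could look like, using plain dictionaries rather than a STIX library. The helper names `stamp` and `wrap_run` and the `malware-analysis` context value chosen here are illustrative assumptions; only the three `x_gnat_investigation_*` property names and their allowed values come from the contract above.

```python
import uuid
from datetime import datetime, timezone

# Allowed values, copied from the contract table.
ORIGINS = {"sandgnat", "sensegnat", "redgnat", "gnat", "external"}
LINK_TYPES = {"confirmed", "inferred", "suggested"}


def stamp(obj, investigation_id, origin, link_type="inferred"):
    """Return a copy of a STIX object dict carrying the three contract properties."""
    if origin not in ORIGINS:
        raise ValueError(f"unknown origin: {origin!r}")
    if link_type not in LINK_TYPES:
        raise ValueError(f"unknown link type: {link_type!r}")
    stamped = dict(obj)
    stamped["x_gnat_investigation_id"] = investigation_id
    stamped["x_gnat_investigation_origin"] = origin
    stamped["x_gnat_investigation_link_type"] = link_type
    return stamped


def wrap_run(objects, investigation_id, origin, name):
    """Wrap one run's objects in a Grouping stamped with the same properties."""
    # Millisecond-precision UTC timestamp in the STIX style.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    grouping = {
        "type": "grouping",
        "spec_version": "2.1",
        "id": f"grouping--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "context": "malware-analysis",  # assumption: fits a SandGNAT run
        "object_refs": [o["id"] for o in objects],
    }
    return stamp(grouping, investigation_id, origin)
```

A consumer that receives the result of `wrap_run` can treat it as one evidence bundle, or walk `object_refs` and handle each stamped object on its own.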
44+
45+
Confidence scoring rules (from GNAT policy, not changed by this plan):
46+
47+
- Any object with `x_source_type = "ai_extracted"` is capped at confidence 60. Addons must respect this.
48+
- Correlation-inferred links carry a separate `x_gnat_correlation_confidence` (0–100) independent of the object’s own confidence; GNAT assigns this at receive time.
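
The ceiling rule can be sketched as a small clamp. This is an illustration of the policy as stated, not GNAT's actual scoring code; the function name `effective_confidence` is hypothetical, and reading the score from the standard STIX `confidence` property is an assumption.

```python
AI_CONFIDENCE_CEILING = 60  # from GNAT policy, as stated above


def effective_confidence(obj):
    """Clamp confidence to the AI ceiling for AI-extracted objects."""
    confidence = int(obj.get("confidence", 0))
    if obj.get("x_source_type") == "ai_extracted":
        return min(confidence, AI_CONFIDENCE_CEILING)
    return confidence
```

`x_gnat_correlation_confidence` is deliberately absent from this sketch: it is assigned by GNAT at receive time and does not alter the object's own score.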
49+
50+
**Investigation IDs are tenant-scoped.** The receiver validates that every stamped `x_gnat_investigation_id` belongs to an investigation in the tenant the incoming request is authenticated for. Cross-tenant references are rejected.
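
A minimal sketch of the tenant check, assuming an in-memory mapping in place of `InvestigationStore`. The exception classes, the `InvestigationRow` shape, and `resolve_for_tenant` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


class UnknownInvestigation(LookupError):
    """No investigation with this ID exists."""


class CrossTenantReference(PermissionError):
    """The investigation exists but belongs to a different tenant."""


@dataclass(frozen=True)
class InvestigationRow:
    investigation_id: str
    tenant_id: str


def resolve_for_tenant(rows, investigation_id, request_tenant_id):
    """Return the investigation only if it belongs to the authenticated tenant."""
    row = rows.get(investigation_id)
    if row is None:
        raise UnknownInvestigation(investigation_id)
    if row.tenant_id != request_tenant_id:
        raise CrossTenantReference(investigation_id)
    return row
```

Keeping the two failures as distinct exceptions mirrors the separate "bad tenant" and "unknown investigation" error cases the reference schema is expected to document.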
51+
52+
A formal JSON schema for these three properties and the Grouping envelope lives at `docs/reference/investigation-context-schema.md` (new — see phase 0).
53+
54+
-----
55+
56+
## Phase 0 — Docs and contract
57+
58+
Three documents, no code yet. These are the artifacts the three addon plans reference; lock them down before anyone starts coding.
59+
60+
### 0.1 ADR
61+
62+
Path: `docs/architecture/adrs/ADR-00XX-gnat-investigation-context.md` (pick the next available number).
63+
64+
Decisions to capture:
65+
66+
- Adopt `x_gnat_investigation_id`, `x_gnat_investigation_origin`, `x_gnat_investigation_link_type` as the shared cross-tool correlation contract.
67+
- These properties are custom STIX properties stamped on individual objects **and** on a wrapping `Grouping` per addon run.
68+
- Investigation identity is owned by GNAT’s existing `gnat.analysis.investigations.Investigation`. Addons never create investigations.
69+
- Addons are never required to stamp the properties — objects without them are ingested normally. The properties are additive metadata, not a hard dependency.
70+
- The receive path accepts externally-stamped STIX via existing TAXII 2.1 ingest; no new protocol is introduced.
71+
- Cross-tenant investigation references are rejected at ingest.
72+
73+
### 0.2 Reference schema
74+
75+
Path: `docs/reference/investigation-context-schema.md`.
76+
77+
Contents: exact JSON schema for the three properties, constraints, examples of a single stamped `Indicator`, examples of a `Grouping` wrapping a SandGNAT detonation bundle, error cases (bad tenant, unknown investigation, malformed origin value).
78+
79+
### 0.3 Explanation doc
80+
81+
Path: `docs/explanation/cross-tool-investigation-model.md`.
82+
83+
Contents: why the model exists, how each addon participates, the relationship between this model and the existing `Investigation`, `EvidenceGraph`, and `Report` primitives. One end-to-end diagram showing a seed IOC → SandGNAT detonation → SenseGNAT correlation → RedGNAT validation → GNAT report, with the investigation_id threaded through every step.
84+
85+
-----
86+
87+
## Phase 1 — Investigation API surface (thin additions)
88+
89+
The existing `InvestigationService` has CRUD. Addons need a small, purposeful surface on top of it. Nothing here adds new models.
90+
91+
### 1.1 Addon-facing REST endpoints
92+
93+
Mount under the existing gateway router (`gnat/dissemination/api/`). All endpoints require the same `X-Api-Key` and tenant header the gateway already uses.
94+
95+
|Method|Path                                 |Purpose|
|------|-------------------------------------|-------|
|`GET` |`/api/investigations`                |List investigations visible to the authenticated tenant. Supports `status`, `created_since`, `tag`, pagination.|
|`GET` |`/api/investigations/{id}`           |Fetch a single investigation with its hypotheses and linked object counts.|
|`GET` |`/api/investigations/{id}/hypotheses`|List hypotheses for the investigation. RedGNAT uses this to pick a hypothesis to emulate.|
|`POST`|`/api/investigations/{id}/evidence`  |Accept a STIX bundle or Grouping stamped with this investigation’s ID. Validates tenant, validates investigation exists and is not `CLOSED`, validates all contained objects carry a matching `x_gnat_investigation_id`, then routes into existing ingest.|
101+
102+
The `POST .../evidence` endpoint is the one new ingest path. Implementation should delegate to the existing ingest pipeline rather than reimplementing STIX validation.
103+
104+
### 1.2 InvestigationService method additions
105+
106+
In `gnat/analysis/investigations/service.py`:
107+
108+
- `attach_evidence_bundle(investigation_id, bundle, origin, tenant_id) -> AttachResult` — validates, routes to ingest, and returns a structured result (`accepted_count`, `rejected_count`, `rejection_reasons`).
109+
- `find_by_subject(subject_ref, tenant_id) -> list[Investigation]` — returns investigations whose EvidenceGraph already contains `subject_ref`. SenseGNAT calls this at detector-emission time to auto-tag findings.
110+
111+
Both methods are thin — they orchestrate existing components.
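
One possible shape for `AttachResult` and the validation half of `attach_evidence_bundle`, as a hedged sketch. The three field names come from the signature above; the helper `partition_bundle`, the per-object (rather than whole-bundle) rejection behaviour, and the wording of the rejection reasons are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class AttachResult:
    accepted_count: int = 0
    rejected_count: int = 0
    rejection_reasons: list = field(default_factory=list)


def partition_bundle(investigation_id, objects):
    """Split objects into those stamped for this investigation and the rest.

    Returns (accepted_objects, AttachResult). Routing the accepted objects
    into the existing ingest pipeline is left to the caller.
    """
    result = AttachResult()
    accepted = []
    for obj in objects:
        stamped_id = obj.get("x_gnat_investigation_id")
        if stamped_id == investigation_id:
            accepted.append(obj)
            result.accepted_count += 1
        else:
            result.rejected_count += 1
            result.rejection_reasons.append(
                f"{obj.get('id', '<no id>')}: x_gnat_investigation_id is {stamped_id!r}"
            )
    return accepted, result
```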
112+
113+
### 1.3 Closed-investigation policy
114+
115+
When `POST /api/investigations/{id}/evidence` targets a `CLOSED` investigation:
116+
117+
- Reject with `409 Conflict` by default.
118+
- Honour an `X-Reopen-Investigation` request header: when an authorised caller sets it, the investigation is auto-reopened (state moves back to `IN_PROGRESS`, and an `AnalystNote` records why).
119+
120+
This keeps closed investigations stable while giving an explicit path to add late-arriving evidence.
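
The policy reduces to a small decision function. The sketch below is illustrative only: `EvidenceDecision` and `decide` are hypothetical names, and returning `409` when an unauthorised caller sets the header is an assumption, since the plan only specifies the default rejection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvidenceDecision:
    accept: bool
    new_state: str
    http_status: Optional[int] = None  # set only when the request is rejected
    audit_note: Optional[str] = None   # text for the AnalystNote on reopen


def decide(state, reopen_requested, caller_may_reopen, reason=""):
    """Apply the closed-investigation policy to one evidence POST."""
    if state != "CLOSED":
        # Open investigations accept evidence without a state change.
        return EvidenceDecision(accept=True, new_state=state)
    if reopen_requested and caller_may_reopen:
        note = f"Reopened for late-arriving evidence: {reason or 'no reason given'}"
        return EvidenceDecision(accept=True, new_state="IN_PROGRESS", audit_note=note)
    return EvidenceDecision(accept=False, new_state="CLOSED", http_status=409)
```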
121+
122+
-----
123+
124+
## Phase 2 — Evidence graph integration
125+
126+
Addon outputs must show up as nodes in the existing `EvidenceGraph`, correctly labeled by origin.
127+
128+
### 2.1 Normalizer pass-through
129+
130+
In `gnat/investigations/normalizer.py`:
131+
132+
- When a raw platform record carries `x_gnat_investigation_id`, `x_gnat_investigation_origin`, or `x_gnat_investigation_link_type`, those three fields must be preserved on the resulting `EvidenceNode` as node metadata.
133+
- Add a new `origin` field on `EvidenceNode` (default `"gnat"`). It’s the source-of-truth label for graph views and report grouping.
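
A sketch of the pass-through, assuming a much-reduced `EvidenceNode` (the real class has more fields). Deriving `origin` from `x_gnat_investigation_origin` and the helper name `to_node` are assumptions made for illustration.

```python
from dataclasses import dataclass, field

CONTEXT_KEYS = (
    "x_gnat_investigation_id",
    "x_gnat_investigation_origin",
    "x_gnat_investigation_link_type",
)


@dataclass
class EvidenceNode:  # hypothetical reduced shape, not the real class
    node_id: str
    origin: str = "gnat"
    metadata: dict = field(default_factory=dict)


def to_node(record):
    """Build a node, preserving whichever contract properties the record carries."""
    metadata = {key: record[key] for key in CONTEXT_KEYS if key in record}
    origin = metadata.get("x_gnat_investigation_origin", "gnat")
    return EvidenceNode(node_id=record["id"], origin=origin, metadata=metadata)
```

Records without any of the three properties still normalise; they simply keep the default `"gnat"` origin and empty metadata.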
134+
135+
### 2.2 Correlator behaviour
136+
137+
In `gnat/investigations/correlator.py`:
138+
139+
- Addon-sourced nodes must participate in correlation the same way as connector-sourced nodes.
140+
- New edges that connect an addon-sourced node to another node get `link_type="inferred"` by default, unless the addon explicitly marked a link as `"confirmed"` (RedGNAT emulation against a specific hypothesis is the canonical case).
141+
142+
### 2.3 Graph query surface
143+
144+
In `gnat/analysis/graph.py`:
145+
146+
- Add `filter_by_origin(origin_list)` to `GraphQuery`. A view filtered to “show me only SenseGNAT-sourced nodes in this investigation” must work.
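
The filter could look roughly like this. The `GraphQuery` below is a stand-in written for this example, with nodes as plain dicts; the existing class in `gnat/analysis/graph.py` will have a different constructor and internals.

```python
class GraphQuery:  # stand-in for illustration, not the real class
    def __init__(self, nodes):
        self._nodes = list(nodes)

    def filter_by_origin(self, origin_list):
        """Return a new query restricted to nodes whose origin is in origin_list."""
        wanted = set(origin_list)
        return GraphQuery(n for n in self._nodes if n["origin"] in wanted)

    def nodes(self):
        return list(self._nodes)
```

Returning a new query instead of mutating in place lets the filter chain with other query steps, which is one plausible design rather than a requirement of the plan.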
147+
148+
-----
149+
150+
## Phase 3 — Cross-tool report template
151+
152+
One new report template, nothing more. `gnat.reporting` already does the heavy lifting.
153+
154+
Path: `gnat/reports/templates/cross_tool_investigation.py`.
155+
156+
The template pulls, for a given `investigation_id`:
157+
158+
- Investigation header (title, status, hypotheses, analyst notes).
159+
- Timeline from `TimelineBuilder` filtered by investigation_id.
160+
- Sections grouped by `origin`:
161+
  - **SandGNAT findings**: malware analyses, artifacts, extracted indicators, similarity neighbours.
  - **SenseGNAT findings**: behavioural detections with narrative strings intact.
  - **RedGNAT findings**: emulation runs, techniques exercised, detection gaps.
  - **GNAT / external**: everything else.
165+
- Confidence and attribution summary from existing `Attribution` and `ConfidenceScore`.
166+
- Recommendations section drafted by `ReportDraftingAssistant` (already exists; confidence-ceiling rules already apply).
167+
- Appendix: raw STIX references.
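
The origin-grouped sections can be sketched as a simple bucketing step. The section titles follow the list above; the function name `group_sections`, the dict-shaped nodes, and the fixed section order are assumptions.

```python
from collections import defaultdict

SECTION_TITLES = {
    "sandgnat": "SandGNAT findings",
    "sensegnat": "SenseGNAT findings",
    "redgnat": "RedGNAT findings",
}
FALLBACK_SECTION = "GNAT / external"  # everything else
SECTION_ORDER = [*SECTION_TITLES.values(), FALLBACK_SECTION]


def group_sections(nodes):
    """Return (section title, nodes) pairs in report order, skipping empty sections."""
    buckets = defaultdict(list)
    for node in nodes:
        title = SECTION_TITLES.get(node.get("origin"), FALLBACK_SECTION)
        buckets[title].append(node)
    return [(title, buckets[title]) for title in SECTION_ORDER if buckets[title]]
```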
168+
169+
Expose as `gnat report run --template cross_tool_investigation --investigation IC-2026-0001 --formats pdf html`.
170+
171+
-----
172+
173+
## Phase 4 — CLI additions
174+
175+
In `gnat/investigations/cli.py` (this CLI is light; don’t build a parallel CLI for `gnat.analysis.investigations`):
176+
177+
```
gnat investigation list [--tenant X] [--status open]
gnat investigation show <id>
gnat investigation evidence <id>      # list linked objects grouped by origin
gnat investigation graph <id> [--origin sandgnat,sensegnat]
gnat investigation export <id>        # STIX bundle, preserves custom properties
gnat investigation report <id> --format pdf
```
185+
186+
No `create` / `link` commands in this plan — creation and link management already have a surface in `InvestigationService` and the existing analyst UI. Keep this CLI focused on the cross-tool read path.
187+
188+
-----
189+
190+
## Phase 5 — Tests
191+
192+
### Unit
193+
194+
- `tests/unit/investigations/test_evidence_api.py`: the POST endpoint accepts a valid stamped bundle, rejects mismatched investigation_id, rejects cross-tenant, rejects closed investigation without reopen header, accepts with reopen header.
- `tests/unit/investigations/test_normalizer_passthrough.py`: all three custom properties survive the normalize step and land on `EvidenceNode`.
- `tests/unit/investigations/test_service_additions.py`: `attach_evidence_bundle` and `find_by_subject`.
- `tests/unit/reports/test_cross_tool_template.py`: template renders with fixture data from all four origins.
198+
199+
### Integration
200+
201+
- `tests/integration/test_cross_tool_ingest.py` — spin up a tenant, create an investigation, POST a fixture bundle labelled with each origin, and verify the evidence graph contains correctly labelled nodes.
202+
203+
Hit 70% coverage minimum (existing gate). Don’t lower it.
204+
205+
-----
206+
207+
## Out of scope
208+
209+
- A new investigation model. The existing one stays.
210+
- A new evidence graph. Same.
211+
- A new correlation engine. Same.
212+
- “Investigation graph view UI” as a new feature — the existing TUI and web dashboard render `EvidenceGraph` today. The only UI work in this plan is the origin filter (Phase 2.3).
213+
- STIX `Incident` object. Out of scope for this pass; `Grouping` covers the immediate need.
214+
215+
-----
216+
217+
## Acceptance criteria
218+
219+
1. An analyst creates an investigation in GNAT (existing flow, unchanged).
220+
1. SandGNAT, SenseGNAT, and RedGNAT can each POST stamped STIX bundles to `/api/investigations/{id}/evidence` and the objects land in the investigation’s `EvidenceGraph` with correct `origin` labels.
221+
1. `gnat investigation graph <id> --origin sensegnat` returns only SenseGNAT-sourced nodes.
222+
1. `gnat investigation report <id> --format pdf` produces a report with sections grouped by origin.
223+
1. STIX export of the investigation preserves all three custom properties on every object.
224+
1. Existing standalone ingest paths (TAXII, ingest pipelines) still work without any `x_gnat_investigation_id` present.
225+
1. Cross-tenant investigation references are rejected with a clear error.
226+
1. Closed-investigation evidence POSTs are rejected unless `X-Reopen-Investigation` is set.
227+
228+
-----
229+
230+
## Risks
231+
232+
|Risk|Mitigation|
|----|----------|
|Parallel investigation models appear because a future planner doesn’t see the existing one.|Keep this plan and its ADR in `docs/architecture/`. Any future plan that proposes a new investigation model must reference this ADR and justify deviation.|
|Cross-tenant ID leakage.|Tenant validation on every endpoint touching investigation_id. Integration test covers it.|
|Addons silently drop the custom properties through a STIX round-trip.|Normalizer pass-through test (Phase 5). Run it in addon CI too.|
|Closed-investigation policy causes data loss.|The `X-Reopen-Investigation` escape hatch with `AnalystNote` audit trail.|
|AI-generated “suggested” links pollute the graph.|Existing `confidence_ceiling = 60` and the `"suggested"` link_type keep them filterable. Default view hides `"suggested"` unless opted in.|
239+
240+
-----
241+
242+
## Handoff checklist before starting code
243+
244+
- [ ] Phase 0 docs written and reviewed.
245+
- [ ] The three custom property names match exactly what the addon plans reference.
246+
- [ ] The REST endpoints in Phase 1.1 are reviewed against the existing gateway router so we don’t create a parallel auth surface.
247+
- [ ] Confirmed in-conversation (not from memory) that the module paths listed in “Context that must not be re-derived” are still the current structure. If anything has moved, update this plan before Code starts.
