Implement analysis layer: Phase 0-2 (foundation, investigations, repo… #73
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| # ADR-0031: Analysis Layer Architecture | ||
|
|
||
| **Decision:** Implement three distinct analyst-facing modules — | ||
| `gnat.analysis`, `gnat.reporting`, and `gnat.dissemination` — as | ||
| consumers of the existing storage layer. No new storage backend is | ||
| introduced at this stage. | ||
|
|
||
| **Problem statement:** | ||
| GNAT fully covers the bottom half of the CTI lifecycle (Collection → | ||
| Processing → Storage) but has no analyst-facing layer. Intelligence | ||
| products (investigations, reports) live entirely outside the platform. | ||
| This forces analysts to maintain parallel systems and breaks provenance | ||
| from raw indicator to finished intelligence. | ||
|
|
||
| **Layered consumer model:** | ||
| The three new modules sit above the existing storage layer and do not | ||
| replace or bypass the ingestion pipeline: | ||
|
|
||
| ``` | ||
| [Connectors] → [Ingestion] → [Storage: Postgres + Solr] | ||
| │ | ||
| ┌───────────────┼───────────────┐ | ||
| │ │ │ | ||
| [Analysis] [Reporting] [Dissemination] | ||
| ``` | ||
|
|
||
| Each layer reads from storage; only `gnat.analysis` and `gnat.reporting` | ||
| write new objects (Investigation, Report) back to Postgres. | ||
|
|
||
| **Why not a separate analysis database:** | ||
| A separate graph or document database would introduce operational | ||
| overhead (new service, backup strategy, replication) for data that is | ||
| structurally similar to the STIX property-bag objects already in Postgres. | ||
| The `WorkspaceStore` SQLAlchemy pattern (serialize-to-JSON + indexed | ||
| metadata columns) is sufficient for Investigation and Report objects. | ||
| Revisit if graph traversal depth or full-text search requirements | ||
| exceed Postgres + Solr capabilities. | ||
|
|
||
| **Module boundaries:** | ||
|
|
||
| | Module | Responsibility | Writes to | | ||
| |--------|---------------|-----------| | ||
| | `gnat.analysis` | Investigation objects, correlation, confidence scoring, timeline | `analysis_*` tables | | ||
| | `gnat.reporting` | Report lifecycle, evidence binding, STIX serialization | `report_*` tables | | ||
| | `gnat.dissemination` | STIX bundle export, TAXII server, webhooks | Read-only (exports) | | ||
|
|
||
| **Persistence strategy:** | ||
| Follows the established `WorkspaceStore` pattern: | ||
| - SQLAlchemy declarative models with `create_all()` (no Alembic) | ||
| - Core dataclasses are pure Python — zero SQLAlchemy dependency in models | ||
| - Repository classes handle SQLAlchemy session lifecycle | ||
| - Objects serialized as JSON in `_json` text column + indexed metadata | ||
| columns for efficient lookup | ||
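The persistence pattern listed above can be illustrated with a minimal sketch. To stay dependency-free it uses the standard-library `sqlite3` module as a stand-in for SQLAlchemy; the `Investigation` fields, the `InvestigationRepository` class, and the `analysis_investigations` table name are assumptions made for illustration, not the PR's actual code.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field


@dataclass
class Investigation:
    """Pure-Python core model with no storage imports (hypothetical fields)."""
    id: str
    title: str
    status: str = "open"
    tags: list[str] = field(default_factory=list)


class InvestigationRepository:
    """Owns the connection lifecycle; the dataclass never sees the database."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS analysis_investigations ("
            "id TEXT PRIMARY KEY, status TEXT NOT NULL, _json TEXT NOT NULL)"
        )
        # Indexed metadata column so lookups do not need to parse JSON.
        self.conn.execute(
            "CREATE INDEX IF NOT EXISTS ix_inv_status "
            "ON analysis_investigations (status)"
        )

    def save(self, inv: Investigation) -> None:
        # The full object goes into _json; status is duplicated for indexing.
        self.conn.execute(
            "INSERT OR REPLACE INTO analysis_investigations VALUES (?, ?, ?)",
            (inv.id, inv.status, json.dumps(asdict(inv))),
        )
        self.conn.commit()

    def by_status(self, status: str) -> list[Investigation]:
        rows = self.conn.execute(
            "SELECT _json FROM analysis_investigations WHERE status = ?",
            (status,),
        )
        return [Investigation(**json.loads(row[0])) for row in rows]


repo = InvestigationRepository(sqlite3.connect(":memory:"))
repo.save(Investigation(id="inv-1", title="Example", tags=["apt"]))
print(repo.by_status("open"))
```

The design choice this shows is that the JSON column remains the source of truth, while the metadata column exists only to make filtering cheap.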
|
|
||
| **Dependencies:** | ||
| - Core models: zero new dependencies | ||
| - Storage: `sqlalchemy>=2.0` (already in `[persist]` extra) | ||
| - STIX export: zero (uses existing ORM) | ||
| - TAXII server: `taxii2-server` (Phase 4, new `[taxii-server]` extra) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| # ADR-0032: STIX Custom Objects for Analysis Layer | ||
|
|
||
| **Decision:** Use `x-gnat-investigation` as a STIX 2.1 custom SDO for | ||
| Investigation export. Use standard STIX `report` SDO for Report export. | ||
| Introduce `investigates` as a custom STIX relationship verb. | ||
|
|
||
| **STIX 2.1 has no Investigation SDO:** | ||
| The STIX 2.1 specification defines `report` (finished intelligence) but | ||
| has no equivalent for the *in-progress* analyst workspace. Custom objects | ||
| (`x-` prefix) are the correct mechanism per §10.9 of the specification. | ||
|
|
||
| **`x-gnat-investigation` schema:** | ||
| ```json | ||
| { | ||
| "type": "x-gnat-investigation", | ||
| "spec_version": "2.1", | ||
| "id": "x-gnat-investigation--<uuid>", | ||
| "created": "<timestamp>", | ||
| "modified": "<timestamp>", | ||
| "name": "<title>", | ||
| "description": "<description>", | ||
| "status": "open|in_progress|review|closed", | ||
| "x_tlp": "white|green|amber|amber+strict|red", | ||
| "x_created_by": "<analyst id>", | ||
| "x_assigned_to": ["<analyst id>"], | ||
| "x_scope": { ... }, | ||
| "x_hypothesis_count": 0, | ||
| "x_linked_indicators": ["indicator--<uuid>", ...], | ||
| "x_linked_threat_actors": ["threat-actor--<uuid>", ...], | ||
| "x_linked_campaigns": ["campaign--<uuid>", ...] | ||
| } | ||
| ``` | ||
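As a rough illustration of the schema above, the following sketch builds an `x-gnat-investigation` object as a plain dictionary. The helper names `stix_now` and `make_investigation_sdo` are hypothetical, and only a subset of the listed properties is populated.

```python
import uuid
from datetime import datetime, timezone

VALID_STATUSES = {"open", "in_progress", "review", "closed"}


def stix_now() -> str:
    """Current UTC time as an RFC 3339 timestamp with millisecond precision."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def make_investigation_sdo(name, description, created_by,
                           status="open", linked_indicators=()):
    """Build the custom SDO as a dict (hypothetical helper, subset of fields)."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    now = stix_now()
    return {
        "type": "x-gnat-investigation",
        "spec_version": "2.1",
        "id": f"x-gnat-investigation--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "description": description,
        "status": status,
        "x_created_by": created_by,
        "x_linked_indicators": list(linked_indicators),
    }


investigation = make_investigation_sdo(
    name="Example investigation",
    description="Illustrative only.",
    created_by="analyst-1",
)
print(investigation["id"])
```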
|
|
||
| **Standard STIX `report` SDO for finished intelligence:** | ||
| When a GNAT Report reaches `PUBLISHED` status it serializes as a STIX | ||
| `report` SDO. `object_refs` is populated with all linked indicators, | ||
| observables, threat actors, campaigns, and the parent | ||
| `x-gnat-investigation` (if any). `published` maps to `published_at`. | ||
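The mapping described above might look roughly like the sketch below. The input is a plain dictionary whose keys (`title`, `created_at`, `modified_at`, `investigation_id`, and the per-type reference lists) are assumptions for illustration; only `published_at`, `object_refs`, and the PUBLISHED gate come from the text.

```python
import uuid


def report_to_stix(report: dict) -> dict:
    """Serialize a published report (hypothetical dict shape) as a STIX report SDO."""
    if report["status"] != "PUBLISHED":
        raise ValueError("only PUBLISHED reports are exported")
    # Collect every linked object, then the parent investigation if present.
    object_refs = [
        *report.get("indicators", []),
        *report.get("observables", []),
        *report.get("threat_actors", []),
        *report.get("campaigns", []),
    ]
    if report.get("investigation_id"):
        object_refs.append(report["investigation_id"])
    return {
        "type": "report",
        "spec_version": "2.1",
        "id": f"report--{uuid.uuid4()}",
        "created": report["created_at"],
        "modified": report["modified_at"],
        "name": report["title"],
        "published": report["published_at"],  # published_at maps to `published`
        "object_refs": object_refs,
    }


example_report = {
    "status": "PUBLISHED",
    "title": "Example report",
    "created_at": "2026-04-01T10:00:00.000Z",
    "modified_at": "2026-04-02T09:00:00.000Z",
    "published_at": "2026-04-02T09:00:00.000Z",
    "indicators": ["indicator--11111111-1111-4111-8111-111111111111"],
    "investigation_id": "x-gnat-investigation--22222222-2222-4222-8222-222222222222",
}
stix_report = report_to_stix(example_report)
print(stix_report["object_refs"])
```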
|
|
||
| **Custom relationship verb `investigates`:** | ||
| The standard STIX verbs do not capture the analyst action of | ||
| investigating an artifact. Add `investigates` as a custom relationship | ||
| type linking `x-gnat-investigation` → linked artifacts. The | ||
| `relationship_type` field accepts free-form strings per STIX 2.1 §7.4. | ||
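A minimal sketch of how the `investigates` verb could be emitted, assuming one relationship object per linked artifact. The function name `investigates_relationships` is hypothetical.

```python
import uuid
from datetime import datetime, timezone


def investigates_relationships(investigation_id: str,
                               target_refs: list[str]) -> list[dict]:
    """Emit one `investigates` relationship per linked artifact."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return [
        {
            "type": "relationship",
            "spec_version": "2.1",
            "id": f"relationship--{uuid.uuid4()}",
            "created": now,
            "modified": now,
            "relationship_type": "investigates",
            "source_ref": investigation_id,
            "target_ref": target,
        }
        for target in target_refs
    ]


relationships = investigates_relationships(
    "x-gnat-investigation--22222222-2222-4222-8222-222222222222",
    [
        "indicator--11111111-1111-4111-8111-111111111111",
        "campaign--33333333-3333-4333-8333-333333333333",
    ],
)
print(len(relationships))
```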
|
|
||
| **Why not reuse `report` for Investigation:** | ||
| A STIX `report` is a *finished intelligence product* with a `published` | ||
| timestamp. An in-progress Investigation has lifecycle states (OPEN, | ||
| IN_PROGRESS, REVIEW) that have no mapping to the report SDO. Forcing a | ||
| mapping would either lose state information or require awkward label | ||
| encoding. The custom SDO is semantically cleaner and unambiguous. | ||
|
|
||
| **Interoperability note:** | ||
| STIX consumers that do not recognise `x-gnat-investigation` will ignore | ||
| the custom objects (per STIX 2.1 §3.2 ignore-unknown-properties guidance) | ||
| but still process all standard SDOs and SROs in the same bundle. Export | ||
| bundles always include both the custom investigation object and all | ||
| standard STIX objects it references, so partial consumers still receive | ||
| full indicator/threat-actor data. |
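From the consumer side, the behaviour described above could be sketched as follows. The `KNOWN_TYPES` set and the `partition_bundle` helper are illustrative assumptions about a partial consumer, not part of GNAT.

```python
# Types this hypothetical consumer understands.
KNOWN_TYPES = {"indicator", "threat-actor", "campaign", "relationship", "report"}


def partition_bundle(bundle: dict, known_types=KNOWN_TYPES):
    """Split bundle objects into those the consumer processes and those it skips."""
    processed, skipped = [], []
    for obj in bundle.get("objects", []):
        (processed if obj.get("type") in known_types else skipped).append(obj)
    return processed, skipped


bundle = {
    "type": "bundle",
    "id": "bundle--44444444-4444-4444-8444-444444444444",
    "objects": [
        {"type": "x-gnat-investigation",
         "id": "x-gnat-investigation--22222222-2222-4222-8222-222222222222"},
        {"type": "indicator",
         "id": "indicator--11111111-1111-4111-8111-111111111111"},
    ],
}
processed, skipped = partition_bundle(bundle)
print([o["type"] for o in processed], [o["type"] for o in skipped])
```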
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,46 @@ | ||
| # ADR-0033: Confidence Scoring Model | ||
|
|
||
| **Decision:** Adopt the NATO Admiralty Scale (source reliability A–F, | ||
| information credibility 1–6) combined with the STIX 2.1 numeric | ||
| confidence field (0–100) as the unified confidence model. | ||
|
|
||
| **Why Admiralty Scale:** | ||
| The Admiralty Scale is the dominant confidence framework in professional | ||
| and government CTI. It explicitly separates source reliability from | ||
| information credibility — a distinction that is frequently collapsed in | ||
| ad-hoc approaches and is a common source of analytical error. It is | ||
| taught in analytic tradecraft training (e.g., UK CPNI, US IC standards) | ||
| and is immediately familiar to professional analysts. | ||
|
|
||
| **Why not structured analytic techniques (SATs) alone:** | ||
| SATs (ACH, red teaming, etc.) are processes, not data model fields. | ||
| They are analyst workflows, not attributes of an intelligence object. | ||
| A confidence *field* on a Finding or Hypothesis needs a fixed schema that | ||
| can be stored, queried, and compared. The Admiralty Scale provides this. | ||
|
|
||
| **STIX 2.1 numeric confidence — required for interoperability:** | ||
| The STIX 2.1 `confidence` property is an optional integer in the range 0–100. | ||
| Admiralty codes (e.g., "B2") have no direct STIX mapping. We store both: | ||
| the Admiralty pair for analytic rigour, the numeric value for STIX | ||
| compliance and programmatic filtering. The numeric value is set explicitly | ||
| by the analyst (not auto-derived from Admiralty codes) because the mapping | ||
| from Admiralty pair to numeric is not standardised and varies by | ||
| organisation. | ||
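A minimal sketch of a model that stores both values side by side. The class names match those exported by `gnat.analysis`, but the field names, enum member names, and the `admiralty_code` property are assumptions for illustration and may differ from the PR's implementation.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class SourceReliability(str, Enum):
    A = "A"  # completely reliable
    B = "B"  # usually reliable
    C = "C"  # fairly reliable
    D = "D"  # not usually reliable
    E = "E"  # unreliable
    F = "F"  # reliability cannot be judged


class InformationCredibility(IntEnum):
    CONFIRMED = 1
    PROBABLY_TRUE = 2
    POSSIBLY_TRUE = 3
    DOUBTFUL = 4
    IMPROBABLE = 5
    CANNOT_BE_JUDGED = 6


@dataclass(frozen=True)
class ConfidenceScore:
    reliability: SourceReliability
    credibility: InformationCredibility
    stix_confidence: int  # set explicitly by the analyst, never derived
    rationale: str = ""

    def __post_init__(self) -> None:
        if not 0 <= self.stix_confidence <= 100:
            raise ValueError("stix_confidence must be within 0-100")

    @property
    def admiralty_code(self) -> str:
        """Admiralty pair such as "B2"."""
        return f"{self.reliability.value}{int(self.credibility)}"


score = ConfidenceScore(
    SourceReliability.B, InformationCredibility.PROBABLY_TRUE, stix_confidence=80
)
print(score.admiralty_code, score.stix_confidence)
```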
|
|
||
| **Convenience bands (HIGH/MEDIUM/LOW):** | ||
| UI display and filtering benefit from three-level bands. Bands map to | ||
| STIX numeric ranges: HIGH ≥ 70, MEDIUM 40–69, LOW < 40. These align with | ||
| the MITRE ATT&CK confidence convention. | ||
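The band thresholds above translate directly into a small helper. The function name `confidence_band` is hypothetical.

```python
def confidence_band(stix_confidence: int) -> str:
    """Map a 0-100 STIX confidence value to HIGH, MEDIUM, or LOW."""
    if not 0 <= stix_confidence <= 100:
        raise ValueError("stix_confidence must be within 0-100")
    if stix_confidence >= 70:
        return "HIGH"
    if stix_confidence >= 40:
        return "MEDIUM"
    return "LOW"


for value in (85, 55, 10):
    print(value, confidence_band(value))
```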
|
|
||
| **Propagation rule:** | ||
| When the CorrelationEngine (Phase 3) assembles a Finding from multiple | ||
| EvidenceLinks, the composite confidence is anchored to the least | ||
| credible contributing source. Implementation: take the minimum | ||
| `stix_confidence` across all supporting EvidenceLinks, then apply a | ||
| small uplift for corroboration (+5 per additional independent source), | ||
| capped at the ceiling of the confidence band that minimum falls in. | ||
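One possible reading of the propagation rule, sketched below: the weakest source sets the baseline, each additional source adds 5 points, and the result never leaves the band the weakest source falls in. The interpretation of the cap, the assumption that every listed source is independent, and the function names are all illustrative rather than taken from the CorrelationEngine.

```python
def band_ceiling(stix_confidence: int) -> int:
    """Highest value in the band (LOW < 40, MEDIUM 40-69, HIGH >= 70)."""
    if stix_confidence >= 70:
        return 100
    if stix_confidence >= 40:
        return 69
    return 39


def composite_confidence(source_confidences: list[int],
                         uplift_per_source: int = 5) -> int:
    """Minimum confidence plus a corroboration uplift, capped at the band ceiling."""
    if not source_confidences:
        raise ValueError("at least one source is required")
    weakest = min(source_confidences)
    # One uplift step per source beyond the first (all assumed independent).
    uplift = uplift_per_source * (len(source_confidences) - 1)
    return min(weakest + uplift, band_ceiling(weakest))


print(composite_confidence([45, 80, 90]))  # 45 + 10 = 55, within MEDIUM
print(composite_confidence([65, 80, 90]))  # 65 + 10 = 75, capped at 69
```

Under this reading, corroboration can raise a score inside its band but can never promote a MEDIUM finding to HIGH.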
|
|
||
| **`ConfidenceScore` model location:** | ||
| `gnat.analysis.confidence` — shared dependency imported by | ||
| `gnat.analysis.investigations`, `gnat.reporting`, and | ||
| `gnat.investigations` (the existing EvidenceGraph module). |
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,64 @@ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # ADR-0034: Report Lifecycle State Machine | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| **Decision:** Five-state lifecycle: DRAFT → REVIEW → APPROVED → | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| PUBLISHED → ARCHIVED. Transitions are enforced by `ReportService`. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Direct jumps are not permitted except for explicit administrative | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| archive. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| **State definitions:** | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| | State | Meaning | Who can set | | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |-------|---------|-------------| | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| | DRAFT | Work in progress; content may be incomplete | Author | | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| | REVIEW | Submitted for peer or management review | Author | | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| | APPROVED | Review complete; approved for dissemination | Reviewer | | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| | PUBLISHED | Disseminated; STIX bundle generated; immutable content | Approver | | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| | ARCHIVED | Superseded or withdrawn; not for distribution | Any | | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| **Valid transitions:** | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ``` | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| DRAFT ──► REVIEW ──► APPROVED ──► PUBLISHED | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ │ │ │ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| └───────────┘ │ │ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| (reject back │ │ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| to DRAFT) ▼ ▼ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ARCHIVED ARCHIVED | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ``` | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| DRAFT ↔ REVIEW is the only bidirectional transition (review rejection | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| sends the report back to DRAFT for revision). | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
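The transition rules can be sketched as a lookup table plus a guard function. This assumes the diagram is read as allowing ARCHIVED from APPROVED and PUBLISHED, with the administrative archive permitted from any other state; `ReportStatus`, `ALLOWED`, and `transition` are hypothetical names, not the `ReportService` API.

```python
from enum import Enum


class ReportStatus(Enum):
    DRAFT = "draft"
    REVIEW = "review"
    APPROVED = "approved"
    PUBLISHED = "published"
    ARCHIVED = "archived"


# Allowed next states for each current state (one reading of the diagram).
ALLOWED = {
    ReportStatus.DRAFT: {ReportStatus.REVIEW},
    ReportStatus.REVIEW: {ReportStatus.DRAFT, ReportStatus.APPROVED},
    ReportStatus.APPROVED: {ReportStatus.PUBLISHED, ReportStatus.ARCHIVED},
    ReportStatus.PUBLISHED: {ReportStatus.ARCHIVED},
    ReportStatus.ARCHIVED: set(),
}


class InvalidTransition(ValueError):
    pass


def transition(current: ReportStatus, target: ReportStatus,
               *, administrative: bool = False) -> ReportStatus:
    """Return the new state, or raise if the move is not permitted."""
    # Administrative archive is the only permitted direct jump.
    if (administrative and target is ReportStatus.ARCHIVED
            and current is not ReportStatus.ARCHIVED):
        return target
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.name} -> {target.name} is not allowed")
    return target


state = transition(ReportStatus.DRAFT, ReportStatus.REVIEW)
state = transition(state, ReportStatus.DRAFT)  # review rejection
print(state.name)
```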
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| **Why APPROVED is separate from PUBLISHED:** | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| In most CTI teams, the analyst who writes the report is not the same | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| person who approves it for external distribution. Requiring explicit | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| approval before publish enforces a review gate. Teams without a formal | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| review process can configure `auto_approve = true` in the report template, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| which collapses REVIEW → APPROVED → PUBLISHED into a single step. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| **Why no CANCELLED state:** | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Cancelled reports should be ARCHIVED, not deleted. Maintaining the full | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| history (including withdrawn intelligence) is a compliance and audit | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| requirement in most organisations. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| **Immutability on PUBLISHED:** | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Once a report reaches PUBLISHED, its content fields (body_sections, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| key_findings, evidence_links) become read-only. Updates produce a new | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Report version with `parent_report_id` pointing to the previous | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| published version and `version` incremented. This mirrors the STIX 2.1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| versioning model where `modified` creates a logical new version rather | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| than mutating the original. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| **Versioning implementation:** | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| `ReportService.publish(report_id)` increments `version`, sets | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| `published_at`, generates the STIX bundle, and marks content as | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| immutable via an `is_published` flag in storage. A new draft is created | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| with `parent_report_id` set when an analyst wants to revise a published | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| report. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines +46 to +57
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| key_findings, evidence_links) become read-only. Updates produce a new | |
| Report version with `parent_report_id` pointing to the previous | |
| published version and `version` incremented. This mirrors the STIX 2.1 | |
| versioning model where `modified` creates a logical new version rather | |
| than mutating the original. | |
| **Versioning implementation:** | |
| `ReportService.publish(report_id)` increments `version`, sets | |
| `published_at`, generates the STIX bundle, and marks content as | |
| immutable via an `is_published` flag in storage. A new draft is created | |
| with `parent_report_id` set when an analyst wants to revise a published | |
| report. | |
| key_findings, evidence_links) become read-only. Immutability is | |
| enforced by `Report.status = PUBLISHED`, rather than by a separate | |
| storage flag. Updates to published content produce a new draft version | |
| with `parent_report_id` pointing to the previous published report and | |
| `version` incremented for that new draft. This mirrors the STIX 2.1 | |
| versioning model where `modified` creates a logical new version rather | |
| than mutating the original. | |
| **Versioning implementation:** | |
| `ReportService.publish(report_id)` transitions the report to | |
| PUBLISHED, sets `published_at`, and generates the STIX bundle. | |
| Content immutability is enforced by the PUBLISHED status. When an | |
| analyst wants to revise a published report, the system creates a new | |
| draft with `parent_report_id` set to the prior published report and | |
| an incremented `version`. |
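The suggested wording, where immutability follows from the PUBLISHED status and revisions become new drafts, could be sketched like this. The `Report` fields shown and the `update_body` and `revise` helpers are assumptions for illustration, not the `ReportService` API.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Report:
    id: str
    status: str
    version: int
    body_sections: tuple[str, ...] = ()
    parent_report_id: Optional[str] = None


def update_body(report: Report, body_sections: list[str]) -> Report:
    """Edit content; refused once the report is PUBLISHED."""
    if report.status == "PUBLISHED":
        raise PermissionError("published content is read-only; create a revision")
    return replace(report, body_sections=tuple(body_sections))


def revise(published: Report, new_id: str) -> Report:
    """Create a new draft that points back at the published report."""
    if published.status != "PUBLISHED":
        raise ValueError("only PUBLISHED reports can be revised")
    return replace(
        published,
        id=new_id,
        status="DRAFT",
        version=published.version + 1,
        parent_report_id=published.id,
    )


v1 = Report(id="report-1", status="PUBLISHED", version=1, body_sections=("Summary",))
v2 = revise(v1, new_id="report-2")
v2 = update_body(v2, ["Summary", "Update"])
print(v2.version, v2.parent_report_id)
```

No separate storage flag is needed in this variant because the status check alone decides whether content can change.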
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| """ | ||
| gnat.analysis | ||
| ============= | ||
|
|
||
| Analyst-facing layer transforming ingested CTI data into intelligence products. | ||
|
|
||
| Modules | ||
| ------- | ||
| confidence | ||
| :class:`~.confidence.ConfidenceScore` combining the NATO Admiralty Scale | ||
| (source reliability A–F, information credibility 1–6) with a STIX 2.1 | ||
| numeric confidence value (0–100). | ||
| tlp | ||
| :class:`~.tlp.TLPLevel` — TLP 2.0 classification levels shared across the | ||
| analysis, reporting, and dissemination layers. | ||
| investigations | ||
| First-class :class:`~.investigations.Investigation` objects with lifecycle | ||
| management, hypothesis tracking, analyst notes, task management, and | ||
| artifact linking. | ||
|
|
||
| Architecture | ||
| ------------ | ||
| The analysis layer sits above the existing storage layer (Postgres + Solr) and | ||
| does not replace or bypass the ingestion pipeline. See ADR-0031 for the full | ||
| rationale. | ||
|
|
||
| Quick start:: | ||
|
|
||
| from gnat.analysis.confidence import ConfidenceScore, SourceReliability, InformationCredibility | ||
| from gnat.analysis.tlp import TLPLevel | ||
| from gnat.analysis.investigations import Investigation, InvestigationService, InvestigationStore | ||
|
|
||
| score = ConfidenceScore.high(rationale="Cross-corroborated by two independent sources.") | ||
| print(score.label) # "B2 (HIGH)" | ||
|
|
||
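| import os | ||
| # SQLAlchemy SQLite URLs do not expand "~", so resolve the home directory first. | ||
| store = InvestigationStore("sqlite:///" + os.path.expanduser("~/.gnat/gnat.db")) | ||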
| store.create_all() | ||
| service = InvestigationService(store) | ||
|
|
||
| inv = service.create(title="APT28 Campaign Apr 2026", created_by="analyst@example.com") | ||
| """ | ||
|
|
||
| from gnat.analysis.confidence import ( | ||
| ConfidenceLevel, | ||
| ConfidenceScore, | ||
| InformationCredibility, | ||
| SourceReliability, | ||
| ) | ||
| from gnat.analysis.tlp import TLPLevel | ||
|
|
||
| __all__ = [ | ||
| # Confidence | ||
| "ConfidenceScore", | ||
| "ConfidenceLevel", | ||
| "SourceReliability", | ||
| "InformationCredibility", | ||
| # TLP | ||
| "TLPLevel", | ||
| ] |
The test counts in the release notes don’t match the actual new test files in this PR (e.g., the PR description mentions 19/24/38, but this section lists 16/24/30). Please update these numbers to reflect the current tests so the changelog remains accurate.