Implement analysis layer: Phase 0-2 (foundation, investigations, repo…#73

Merged
wrhalpin merged 1 commit into main from claude/add-claude-documentation-k8vvJ
Apr 7, 2026

wrhalpin commented Apr 7, 2026

…rting)

Phase 0 — Foundation

  • gnat/analysis/tlp.py: TLPLevel enum (TLP 2.0 WHITE/CLEAR/GREEN/AMBER/AMBER+STRICT/RED) with STIX marking IDs, hex colours, rank ordering
  • gnat/analysis/confidence.py: ConfidenceScore combining NATO Admiralty Scale (source reliability A–F, information credibility 1–6) with STIX numeric confidence 0–100; ConfidenceLevel bands (HIGH/MEDIUM/LOW); convenience factories high/medium/low()
  • ADR-0031: Analysis layer architecture — layered consumer model, no new storage backend, WorkspaceStore persistence pattern
  • ADR-0032: STIX custom objects — x-gnat-investigation SDO, investigates relationship verb, standard report SDO for finished intelligence
  • ADR-0033: Confidence scoring — rationale for Admiralty Scale + STIX numeric confidence; HIGH/MEDIUM/LOW bands aligned with ATT&CK
  • ADR-0034: Report lifecycle — five-state machine, REVIEW→DRAFT reject path, immutability on PUBLISHED, STIX bundle triggered on publish

Phase 1 — gnat.analysis.investigations

  • Investigation dataclass: state machine OPEN→IN_PROGRESS→REVIEW→CLOSED, TLP classification, scope, hypotheses, analyst notes, tasks, artifact refs
  • Hypothesis, AnalystNote, InvestigationTask, InvestigationScope dataclasses
  • InvestigationStore: SQLAlchemy-backed (sqlite:///:memory: for tests), zero-migration create_all(), JSON-serialization + indexed metadata columns
  • InvestigationService: enforces transitions, note/task/hypothesis/artifact mutation, deduplicating tag/indicator linking, summary

Phase 2 — gnat.reporting

  • Report dataclass: DRAFT→REVIEW→APPROVED→PUBLISHED→ARCHIVED lifecycle, versioning with parent_report_id, TLP, findings, evidence binding, attribution, STIX export
  • Finding, EvidenceLink, Attribution, ReportSection, ChangelogEntry
  • ReportStore: same SQLAlchemy pattern as InvestigationStore
  • ReportService: lifecycle transitions, immutability enforcement on PUBLISHED, create_revision() for updates to published reports
  • report_to_stix_bundle(): STIX 2.1 bundle (report SDO + identity + threat-actor + attributed-to rel if attribution set); TLP marking refs; x_gnat_* extension fields; stix_report_ref set on publish
  • Three YAML report templates: incident_report, threat_actor_profile, campaign_analysis (with section structure and analyst guidance)
  • [analysis] and [reporting] optional dependency extras

Tests (81 tests, all passing)

  • tests/unit/analysis/test_confidence.py: 19 tests
  • tests/unit/analysis/test_investigations.py: 24 tests
  • tests/unit/reporting/test_reports.py: 38 tests

https://claude.ai/code/session_01BDoue9HxB83ijLzFARAugq

Copilot AI review requested due to automatic review settings April 7, 2026 13:27
@wrhalpin wrhalpin merged commit 91f87cb into main Apr 7, 2026
10 of 22 checks passed

Copilot AI left a comment

Pull request overview

Introduces an initial “analysis layer” and “reporting layer” to GNAT, adding analyst-facing Investigation/Report domain models, lifecycle services, SQLAlchemy-backed persistence, and STIX 2.1 export, along with ADRs and unit tests.

Changes:

  • Add gnat.analysis foundation types (TLPLevel, ConfidenceScore) plus gnat.analysis.investigations (models/service/store).
  • Add gnat.reporting (models/service/store), YAML report templates, and STIX bundle export.
  • Add ADRs (0031–0034), update packaging (pyproject.toml), and add unit tests + changelog entry.

Reviewed changes

Copilot reviewed 25 out of 27 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| `pyproject.toml` | Adds [analysis] / [reporting] extras and includes reporting templates as package data. |
| `gnat/analysis/__init__.py` | Exposes analysis-layer public API (confidence + TLP). |
| `gnat/analysis/tlp.py` | Implements TLP 2.0 enum with labels/colours/ranking and STIX marking IDs. |
| `gnat/analysis/confidence.py` | Implements Admiralty Scale + STIX numeric confidence composite model. |
| `gnat/analysis/investigations/__init__.py` | Exposes investigations public API surface. |
| `gnat/analysis/investigations/models.py` | Adds Investigation domain dataclasses + enums + serialization/state machine. |
| `gnat/analysis/investigations/service.py` | Adds Investigation lifecycle/mutation service layer. |
| `gnat/analysis/investigations/storage.py` | Adds SQLAlchemy persistence for investigations (JSON blob + indexed fields). |
| `gnat/reporting/__init__.py` | Exposes reporting public API surface and STIX export entrypoint. |
| `gnat/reporting/models.py` | Adds Report domain dataclasses + enums + serialization/state machine. |
| `gnat/reporting/service.py` | Adds Report lifecycle/mutation service layer + publish/revision workflow. |
| `gnat/reporting/storage.py` | Adds SQLAlchemy persistence for reports (JSON blob + indexed fields). |
| `gnat/reporting/export/__init__.py` | Exposes STIX export helper. |
| `gnat/reporting/export/stix.py` | Implements Report → STIX 2.1 bundle serialization. |
| `gnat/reporting/templates/incident_report.yaml` | Adds incident report YAML template and guidance. |
| `gnat/reporting/templates/threat_actor_profile.yaml` | Adds threat actor profile YAML template and guidance. |
| `gnat/reporting/templates/campaign_analysis.yaml` | Adds campaign analysis YAML template and guidance. |
| `tests/unit/analysis/__init__.py` | Test package marker. |
| `tests/unit/reporting/__init__.py` | Test package marker. |
| `tests/unit/analysis/test_confidence.py` | Unit coverage for TLP and confidence scoring. |
| `tests/unit/analysis/test_investigations.py` | Unit coverage for investigations model/store/service. |
| `tests/unit/reporting/test_reports.py` | Unit coverage for reporting model/store/service and STIX export. |
| `docs/explanation/architecture/adrs/0031-analysis-layer-architecture.md` | Documents analysis-layer architecture decisions. |
| `docs/explanation/architecture/adrs/0032-stix-custom-objects.md` | Documents custom STIX object/relationship decisions. |
| `docs/explanation/architecture/adrs/0033-confidence-scoring.md` | Documents confidence scoring rationale and conventions. |
| `docs/explanation/architecture/adrs/0034-report-lifecycle.md` | Documents report lifecycle state machine + publish semantics. |
| `CHANGELOG.md` | Adds unreleased entry describing analysis/reporting features + tests. |

Comment thread gnat/reporting/service.py

report.status = ReportStatus.PUBLISHED
report.published_at = datetime.now(tz=timezone.utc)
report.updated_at = report.published_at

Copilot AI Apr 7, 2026

publish() sets report.updated_at = report.published_at, but ReportStore.save() unconditionally overwrites report.updated_at with datetime.now(...), so the persisted updated_at will not match published_at as intended. Consider letting the service control updated_at for publish (or have the store only set updated_at when not already set / always rely on DB onupdate).

Suggested change
report.updated_at = report.published_at
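
One possible reading of the first option in the comment above, as a minimal sketch: the store stamps `updated_at` by default but lets the service opt out when it has already chosen a value. `Report`, `InMemoryReportStore`, `publish` and the `touch` flag are hypothetical stand-ins for illustration, not the actual gnat.reporting classes or signatures.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Report:
    # Hypothetical, trimmed-down stand-in for the real Report model.
    id: str
    updated_at: Optional[datetime] = None
    published_at: Optional[datetime] = None


class InMemoryReportStore:
    """Toy store for illustration; the real ReportStore is SQLAlchemy-backed."""

    def __init__(self) -> None:
        self._rows: dict[str, Report] = {}

    def save(self, report: Report, *, touch: bool = True) -> Report:
        # Stamp updated_at unless the caller has explicitly taken control of it.
        if touch or report.updated_at is None:
            report.updated_at = datetime.now(tz=timezone.utc)
        self._rows[report.id] = report
        return report


def publish(store: InMemoryReportStore, report: Report) -> Report:
    report.published_at = datetime.now(tz=timezone.utc)
    report.updated_at = report.published_at
    # touch=False keeps the two timestamps identical after persistence.
    return store.save(report, touch=False)
```

Relying on a database-level `onupdate` default instead would remove the flag, at the cost of the service no longer being able to pin `updated_at` to `published_at`.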

Comment thread gnat/reporting/storage.py
Comment on lines +252 to +256
if linked_investigation is not None:
    q = q.filter(ReportModel.linked_investigation == linked_investigation)
if tag is not None:
    q = q.filter(ReportModel.tags_csv.contains(tag))
rows = (

Copilot AI Apr 7, 2026

list(tag=...) uses tags_csv.contains(tag), which will return false positives for substring matches (e.g., tag "ware" matches "ransomware", tag "a" matches nearly everything). Consider storing tags with a delimiter strategy that supports exact matches (e.g., wrapping with commas and searching for ,tag,) or normalizing tags into a separate table/JSON array and using an exact match query.
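
The delimiter-wrapping idea can be sketched in plain Python. `encode_tags` and `has_tag` are hypothetical helpers, and lowercasing tags is an assumption made here for illustration; the point is only that searching for `,tag,` in a string stored as `,a,b,` cannot match a substring of another tag.

```python
def encode_tags(tags: list[str]) -> str:
    """Serialize tags as ',a,b,' so every tag is bounded by commas."""
    cleaned = sorted({t.strip().lower() for t in tags if t.strip()})
    return ("," + ",".join(cleaned) + ",") if cleaned else ""


def has_tag(tags_csv: str, tag: str) -> bool:
    """Exact-match lookup; mirrors a SQL substring filter on ',tag,'."""
    return f",{tag.strip().lower()}," in tags_csv


stored = encode_tags(["ransomware", "APT29"])
print(stored)                         # ,apt29,ransomware,
print(has_tag(stored, "ransomware"))  # True
print(has_tag(stored, "ware"))        # False
```

In the store, the equivalent filter would presumably be `tags_csv.contains(f",{tag},")`. This scheme assumes tags never contain a comma, and a tag containing `%` or `_` would still need LIKE escaping, so a normalized tag table remains the more robust option.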

Comment thread CHANGELOG.md
Comment on lines +43 to +45
- `tests/unit/analysis/test_confidence.py`: 16 tests covering TLP ordering, STIX marking IDs, confidence bands, Admiralty Scale, serialization roundtrips, bounds validation
- `tests/unit/analysis/test_investigations.py`: 24 tests covering model roundtrips, state machine valid/invalid transitions, full service lifecycle (create/get/transition/note/task/hypothesis/link/delete/list/summary)
- `tests/unit/reporting/test_reports.py`: 30 tests covering report model, evidence links, attribution, full DRAFT→PUBLISHED lifecycle, immutability enforcement, STIX bundle structure and field correctness, revision creation

Copilot AI Apr 7, 2026

The test counts in the release notes don’t match the actual new test files in this PR (e.g., the PR description mentions 19/24/38, but this section lists 16/24/30). Please update these numbers to reflect the current tests so the changelog remains accurate.

Suggested change
- `tests/unit/analysis/test_confidence.py`: 16 tests covering TLP ordering, STIX marking IDs, confidence bands, Admiralty Scale, serialization roundtrips, bounds validation
- `tests/unit/analysis/test_investigations.py`: 24 tests covering model roundtrips, state machine valid/invalid transitions, full service lifecycle (create/get/transition/note/task/hypothesis/link/delete/list/summary)
- `tests/unit/reporting/test_reports.py`: 30 tests covering report model, evidence links, attribution, full DRAFT→PUBLISHED lifecycle, immutability enforcement, STIX bundle structure and field correctness, revision creation
- `tests/unit/analysis/test_confidence.py`: 19 tests covering TLP ordering, STIX marking IDs, confidence bands, Admiralty Scale, serialization roundtrips, bounds validation
- `tests/unit/analysis/test_investigations.py`: 24 tests covering model roundtrips, state machine valid/invalid transitions, full service lifecycle (create/get/transition/note/task/hypothesis/link/delete/list/summary)
- `tests/unit/reporting/test_reports.py`: 38 tests covering report model, evidence links, attribution, full DRAFT→PUBLISHED lifecycle, immutability enforcement, STIX bundle structure and field correctness, revision creation

Comment thread gnat/analysis/tlp.py
Comment on lines +70 to +74
These are the well-known IDs registered by FIRST at
``https://www.first.org/tlp/``.
"""
return _STIX_MARKING_IDS.get(self, f"marking-definition--{self.value}")

Copilot AI Apr 7, 2026

TLPLevel.stix_marking_id falls back to marking-definition--{self.value} for levels missing from _STIX_MARKING_IDS (notably AMBER_STRICT). That fallback is not a valid STIX marking-definition ID (must be marking-definition--<uuid>), so exporting objects classified as AMBER+STRICT will emit invalid object_marking_refs. Add the official marking-definition UUID for AMBER_STRICT (and any other missing levels) or raise a clear error rather than returning an invalid ID.
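
A minimal sketch of the "raise a clear error" option, under stated assumptions: the trimmed `TLPLevel` enum, its string values, and the helper names are hypothetical, and the UUID in the table is a placeholder, not a registered TLP marking-definition ID. The real identifiers should be taken from the STIX and TLP specifications.

```python
import uuid
from enum import Enum


class TLPLevel(Enum):
    # Hypothetical, trimmed-down enum for illustration only.
    AMBER = "amber"
    AMBER_STRICT = "amber+strict"


# Placeholder UUID, NOT a registered TLP marking-definition ID.
_STIX_MARKING_IDS = {
    TLPLevel.AMBER: "marking-definition--00000000-0000-4000-8000-000000000000",
}

_PREFIX = "marking-definition--"


def is_valid_marking_id(value: str) -> bool:
    """Check the 'marking-definition--<uuid>' shape required of STIX IDs."""
    if not value.startswith(_PREFIX):
        return False
    suffix = value[len(_PREFIX):]
    try:
        # uuid.UUID is lenient about formatting, so compare the canonical form.
        return str(uuid.UUID(suffix)) == suffix.lower()
    except ValueError:
        return False


def stix_marking_id(level: TLPLevel) -> str:
    """Return the mapped ID, or fail loudly instead of inventing one."""
    marking_id = _STIX_MARKING_IDS.get(level)
    if marking_id is None or not is_valid_marking_id(marking_id):
        raise ValueError(f"No valid STIX marking-definition ID for {level.name}")
    return marking_id
```

With this shape, exporting an AMBER+STRICT object fails at lookup time rather than emitting an `object_marking_refs` entry such as `marking-definition--amber+strict`.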

Comment thread gnat/reporting/models.py
Comment on lines +70 to +74
REPORT_TRANSITIONS: dict[ReportStatus, frozenset[ReportStatus]] = {
    ReportStatus.DRAFT: frozenset({ReportStatus.REVIEW, ReportStatus.ARCHIVED}),
    ReportStatus.REVIEW: frozenset({ReportStatus.DRAFT, ReportStatus.APPROVED, ReportStatus.ARCHIVED}),
    ReportStatus.APPROVED: frozenset({ReportStatus.PUBLISHED, ReportStatus.DRAFT, ReportStatus.ARCHIVED}),
    ReportStatus.PUBLISHED: frozenset({ReportStatus.ARCHIVED}),

Copilot AI Apr 7, 2026

REPORT_TRANSITIONS allows APPROVED → DRAFT, but ADR-0034 describes DRAFT ↔ REVIEW as the only bidirectional transition (approval is meant to be a review gate). Either remove ReportStatus.DRAFT from the allowed transitions out of APPROVED, or update ADR-0034 to match the intended lifecycle.
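
The first option in the comment above, dropping DRAFT from the transitions out of APPROVED, could look like the sketch below. The enum string values and the empty ARCHIVED row are assumptions (the excerpt is cut off before that entry), and `can_transition` and `bidirectional_pairs` are hypothetical helpers added to make the ADR's invariant checkable.

```python
from enum import Enum


class ReportStatus(Enum):
    DRAFT = "draft"
    REVIEW = "review"
    APPROVED = "approved"
    PUBLISHED = "published"
    ARCHIVED = "archived"


S = ReportStatus

# Variant of the table with APPROVED -> DRAFT removed.
REPORT_TRANSITIONS: dict[ReportStatus, frozenset[ReportStatus]] = {
    S.DRAFT: frozenset({S.REVIEW, S.ARCHIVED}),
    S.REVIEW: frozenset({S.DRAFT, S.APPROVED, S.ARCHIVED}),
    S.APPROVED: frozenset({S.PUBLISHED, S.ARCHIVED}),
    S.PUBLISHED: frozenset({S.ARCHIVED}),
    S.ARCHIVED: frozenset(),  # assumed terminal; not shown in the excerpt
}


def can_transition(current: ReportStatus, new: ReportStatus) -> bool:
    return new in REPORT_TRANSITIONS.get(current, frozenset())


def bidirectional_pairs() -> set[frozenset[ReportStatus]]:
    """Every pair of states that can move to each other in both directions."""
    return {
        frozenset({a, b})
        for a, targets in REPORT_TRANSITIONS.items()
        for b in targets
        if a in REPORT_TRANSITIONS.get(b, frozenset())
    }
```

A unit test asserting `bidirectional_pairs() == {frozenset({S.DRAFT, S.REVIEW})}` would keep the table and the ADR from drifting apart again.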

Comment on lines +219 to +223
if note and author:
    inv.notes.append(AnalystNote(
        content = f"**Status changed:** `{old_status.value}` → `{new_status.value}`\n\n{note}",
        author = author,
    ))

Copilot AI Apr 7, 2026

transition() says author is required if note is provided, but if note is set without author the implementation silently drops the note. Raise InvestigationError when note is provided without author (or update the docstring to reflect the actual behavior).
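
A sketch of the stricter behaviour, assuming simplified stand-ins for the real models: `Investigation`, `AnalystNote` and this `transition` signature are hypothetical, statuses are plain strings, and the state-machine check itself is omitted. The argument check runs before any mutation, so a rejected call leaves the investigation unchanged.

```python
from dataclasses import dataclass, field
from typing import Optional


class InvestigationError(Exception):
    """Raised when a service call violates an investigation invariant."""


@dataclass
class AnalystNote:
    content: str
    author: str


@dataclass
class Investigation:
    # Hypothetical, trimmed-down model; the real one carries far more state.
    status: str = "open"
    notes: list[AnalystNote] = field(default_factory=list)


def transition(
    inv: Investigation,
    new_status: str,
    note: Optional[str] = None,
    author: Optional[str] = None,
) -> Investigation:
    # Validate arguments first so a bad call cannot half-apply.
    if note and not author:
        raise InvestigationError("author is required when a note is provided")
    old_status = inv.status
    inv.status = new_status
    if note:
        inv.notes.append(AnalystNote(
            content=f"**Status changed:** `{old_status}` → `{new_status}`\n\n{note}",
            author=author,
        ))
    return inv
```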

Comment on lines +244 to +247
    q = q.filter(InvestigationModel.created_by == created_by)
if tag is not None:
    q = q.filter(InvestigationModel.tags_csv.contains(tag))
rows = q.order_by(InvestigationModel.updated_at.desc()).offset(offset).limit(limit).all()

Copilot AI Apr 7, 2026

list(tag=...) uses tags_csv.contains(tag), which can produce substring false positives (e.g., "ware" matches "ransomware"). Consider an exact-match strategy (delimiter wrapping) or a normalized tag representation to avoid incorrect filtering results.

Comment thread gnat/reporting/storage.py
if not _SA_AVAILABLE:
    raise ImportError(
        "sqlalchemy is required for report persistence. "
        "Install with: pip install 'gnat[persist]'"

Copilot AI Apr 7, 2026

_require_sqlalchemy() instructs users to install gnat[persist], but this PR also adds [reporting] extras that include SQLAlchemy. Consider updating the message to mention the relevant extras (e.g., gnat[reporting] / gnat[persist]) so installation guidance matches packaging options.

Suggested change
"Install with: pip install 'gnat[persist]'"
"Install with: pip install 'gnat[reporting]' or pip install 'gnat[persist]'"

if not _SA_AVAILABLE:
    raise ImportError(
        "sqlalchemy is required for investigation persistence. "
        "Install with: pip install 'gnat[persist]'"

Copilot AI Apr 7, 2026

_require_sqlalchemy() instructs users to install gnat[persist], but this PR adds an [analysis] extra that also includes SQLAlchemy. Consider updating the guidance string to reference the relevant extras so installation instructions remain accurate.

Suggested change
"Install with: pip install 'gnat[persist]'"
"Install with: pip install 'gnat[persist]' or pip install 'gnat[analysis]'"

Comment on lines +46 to +57
key_findings, evidence_links) become read-only. Updates produce a new
Report version with `parent_report_id` pointing to the previous
published version and `version` incremented. This mirrors the STIX 2.1
versioning model where `modified` creates a logical new version rather
than mutating the original.

**Versioning implementation:**
`ReportService.publish(report_id)` increments `version`, sets
`published_at`, generates the STIX bundle, and marks content as
immutable via a `is_published` flag in storage. A new draft is created
with `parent_report_id` set when an analyst wants to revise a published
report.

Copilot AI Apr 7, 2026

ADR-0034’s “Versioning implementation” section says publish() increments version and that immutability is marked via an is_published flag in storage, but the current implementation neither increments Report.version on publish nor stores an is_published flag (immutability is enforced via Report.status). Update the ADR or the implementation so they match.

Suggested change
key_findings, evidence_links) become read-only. Updates produce a new
Report version with `parent_report_id` pointing to the previous
published version and `version` incremented. This mirrors the STIX 2.1
versioning model where `modified` creates a logical new version rather
than mutating the original.
**Versioning implementation:**
`ReportService.publish(report_id)` increments `version`, sets
`published_at`, generates the STIX bundle, and marks content as
immutable via a `is_published` flag in storage. A new draft is created
with `parent_report_id` set when an analyst wants to revise a published
report.
key_findings, evidence_links) become read-only. Immutability is
enforced by `Report.status = PUBLISHED`, rather than by a separate
storage flag. Updates to published content produce a new draft version
with `parent_report_id` pointing to the previous published report and
`version` incremented for that new draft. This mirrors the STIX 2.1
versioning model where `modified` creates a logical new version rather
than mutating the original.
**Versioning implementation:**
`ReportService.publish(report_id)` transitions the report to
PUBLISHED, sets `published_at`, and generates the STIX bundle.
Content immutability is enforced by the PUBLISHED status. When an
analyst wants to revise a published report, the system creates a new
draft with `parent_report_id` set to the prior published report and
an incremented `version`.
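
The revision flow described in the suggested ADR wording can be illustrated with a small sketch. The `Report` fields, the string statuses, and this free-function `create_revision` are assumptions for illustration; the frozen dataclass is a simplification standing in for the status-based immutability the real service enforces.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Report:
    # Hypothetical, trimmed-down model used only for this sketch.
    id: str
    title: str
    status: str = "draft"
    version: int = 1
    parent_report_id: Optional[str] = None


def create_revision(published: Report) -> Report:
    """Return a new draft that points back at the published report."""
    if published.status != "published":
        raise ValueError("only published reports can be revised")
    return replace(
        published,
        id=str(uuid.uuid4()),           # the revision is a distinct record
        status="draft",
        version=published.version + 1,  # incremented on the new draft only
        parent_report_id=published.id,
    )
```

The published record is never touched: its `version` stays as it was, and the chain of `parent_report_id` values gives the revision history.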
