Add Pydantic schema exports and analyst service wrappers for GNAT-gui #159

Merged: wrhalpin merged 1 commit into main from claude/create-gnat-admin-guide-BOSrp on Apr 25, 2026

@wrhalpin (Owner)

Implements Streams 1 and 2 from the GNAT-gui core changes plan, providing the foundation for the GNAT-gui web app (separate repo) to import gnat as a library with typed contracts.

Stream 1 — Pydantic schema exports (gnat/schemas/): 28 Pydantic v2 BaseModel schemas mirroring every domain dataclass with ConfigDict(from_attributes=True) and from_domain() classmethods. Covers analysis (investigations, hypotheses, confidence, timeline, graph, copilot), investigations (seeds, evidence graph), reporting (reports, findings, attribution), rules (audit entries), and auth (APIKey, OIDCIdentity). pydantic>=2.0 added to base dependencies.
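
As a rough illustration of the Stream 1 pattern, the sketch below mirrors a dataclass with a Pydantic v2 model that sets `ConfigDict(from_attributes=True)` and exposes a `from_domain()` classmethod. The `Hypothesis` dataclass and its fields are hypothetical stand-ins, not the actual GNAT domain model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

from pydantic import BaseModel, ConfigDict


@dataclass
class Hypothesis:
    """Hypothetical stand-in for a GNAT domain dataclass."""

    statement: str
    created_by: str
    tags: list[str] = field(default_factory=list)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class HypothesisSchema(BaseModel):
    """Schema mirroring the dataclass above field for field."""

    # from_attributes lets model_validate() read plain object attributes.
    model_config = ConfigDict(from_attributes=True)

    statement: str
    created_by: str
    tags: list[str] = []
    created_at: datetime

    @classmethod
    def from_domain(cls, obj: Hypothesis) -> "HypothesisSchema":
        # Validate directly from the domain object's attributes.
        return cls.model_validate(obj)


hyp = Hypothesis(statement="Actor reuses infrastructure", created_by="alice", tags=["apt"])
schema = HypothesisSchema.from_domain(hyp)
payload = schema.model_dump(mode="json")  # JSON-safe dict; datetimes become strings
```

Keeping the conversion in a classmethod means the domain dataclass never needs to import Pydantic.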

Stream 2 — Analyst service wrappers (gnat/analyst_services/): Four thin orchestration services over existing domain code:

  • AnalysisService: investigations, hypotheses, timeline, graph, gaps
  • InvestigationsService: seed → build → graph summary
  • RulesService: list, evaluate, audit trail
  • ReportingService: create, transition, draft, STIX export

All accept AnalystContext (actor, tenant, request_id) as first arg for audit attribution and multi-tenant scoping.
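
The context-first calling convention could look roughly like the sketch below. Only the three `AnalystContext` field names come from the description above; `AuditLog`, `RulesServiceSketch`, and the rule names are hypothetical and exist only to show audit attribution.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AnalystContext:
    """Request-scoped identity; field names follow the PR description."""

    actor: str
    tenant: str
    request_id: str


@dataclass
class AuditLog:
    """Hypothetical in-memory audit sink, used only for this sketch."""

    entries: list[dict[str, str]] = field(default_factory=list)

    def record(self, ctx: AnalystContext, action: str) -> None:
        self.entries.append(
            {
                "actor": ctx.actor,
                "tenant": ctx.tenant,
                "request_id": ctx.request_id,
                "action": action,
            }
        )


class RulesServiceSketch:
    """Hypothetical thin wrapper: ctx comes first, then the call's own arguments."""

    def __init__(self, rule_names: list[str], audit: AuditLog) -> None:
        self._rule_names = list(rule_names)
        self._audit = audit

    def list_rules(self, ctx: AnalystContext) -> list[str]:
        # Attribute the call to the analyst before doing any work.
        self._audit.record(ctx, "rules.list")
        return list(self._rule_names)


audit = AuditLog()
service = RulesServiceSketch(["flag-new-domain", "score-beacon"], audit)
ctx = AnalystContext(actor="alice", tenant="acme", request_id="req-001")
rules = service.list_rules(ctx)
```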

ADR-0057 (schemas) and ADR-0058 (services) document the decisions. 78 new tests, all passing. Zero regressions.

https://claude.ai/code/session_01H5UbjsuiiGya5n1eUCxoaR

Copilot AI review requested due to automatic review settings April 25, 2026 14:43
@wrhalpin wrhalpin merged commit 456f610 into main Apr 25, 2026
16 of 23 checks passed

Copilot AI left a comment

Pull request overview

Adds a typed, library-friendly surface to GNAT intended to be consumed by the external GNAT-gui web app: Pydantic v2 schemas for domain models plus “analyst service” orchestration wrappers over existing domain services.

Changes:

  • Introduces gnat/schemas/ (Pydantic v2 BaseModels) and exports them for downstream typed contracts.
  • Introduces gnat/analyst_services/ wrappers plus AnalystContext and a small exception hierarchy.
  • Adds unit tests for schemas + analyst services, updates ADR index + adds ADR-0057/0058, and adds pydantic as a base dependency.

Reviewed changes

Copilot reviewed 42 out of 42 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| `tests/unit/schemas/__init__.py` | Adds test package marker for schema unit tests. |
| `tests/unit/schemas/test_rules.py` | Tests rule schema + audit entry schema serialization. |
| `tests/unit/schemas/test_reporting.py` | Tests reporting schemas round-tripping from domain dataclasses. |
| `tests/unit/schemas/test_investigations.py` | Tests investigations evidence-graph schemas from domain dataclasses. |
| `tests/unit/schemas/test_analysis.py` | Tests analysis schemas (investigations, timeline, copilot, graph). |
| `tests/unit/analyst_services/__init__.py` | Adds test package marker for analyst-services unit tests. |
| `tests/unit/analyst_services/test_rules.py` | Unit tests for RulesService orchestration behavior. |
| `tests/unit/analyst_services/test_reporting.py` | Unit tests for ReportingService orchestration behavior. |
| `tests/unit/analyst_services/test_investigations.py` | Unit tests for InvestigationsService build + summary behavior. |
| `tests/unit/analyst_services/test_analysis.py` | Unit tests for AnalysisService investigation/timeline/gaps/graph APIs. |
| `pyproject.toml` | Adds Pydantic v2 as a base dependency. |
| `gnat/schemas/rules/rule.py` | Defines RuleSchema and from_domain() constructor. |
| `gnat/schemas/rules/audit.py` | Defines RuleAuditEntrySchema for rule firing audit dicts/objects. |
| `gnat/schemas/rules/__init__.py` | Exports rule schemas. |
| `gnat/schemas/reporting/report.py` | Defines reporting schemas including ReportSchema. |
| `gnat/schemas/reporting/lifecycle.py` | Defines enum mirrors for report lifecycle types. |
| `gnat/schemas/reporting/__init__.py` | Exports reporting schemas + lifecycle enums. |
| `gnat/schemas/investigations/seed.py` | Defines SeedSchema. |
| `gnat/schemas/investigations/graph.py` | Defines evidence graph schemas (node/edge/graph). |
| `gnat/schemas/investigations/__init__.py` | Exports investigations schemas. |
| `gnat/schemas/auth/identity.py` | Defines APIKey/OIDC identity schemas. |
| `gnat/schemas/auth/__init__.py` | Exports auth schemas. |
| `gnat/schemas/analysis/tlp.py` | Defines TLP enum schema mirror. |
| `gnat/schemas/analysis/timeline.py` | Defines TimelineEventSchema. |
| `gnat/schemas/analysis/investigation.py` | Defines investigation/hypothesis/note/task schemas. |
| `gnat/schemas/analysis/graph.py` | Defines GraphContextSchema. |
| `gnat/schemas/analysis/correlation.py` | Placeholder module for future correlation schemas. |
| `gnat/schemas/analysis/copilot.py` | Defines gap recommendation + draft result schemas. |
| `gnat/schemas/analysis/confidence.py` | Defines ConfidenceScoreSchema. |
| `gnat/schemas/analysis/__init__.py` | Exports analysis schemas. |
| `gnat/schemas/__init__.py` | Top-level schema export surface for API consumers. |
| `gnat/analyst_services/context.py` | Adds AnalystContext request-scoped identity container. |
| `gnat/analyst_services/exceptions.py` | Adds analyst-services exception hierarchy. |
| `gnat/analyst_services/analysis.py` | Adds AnalysisService wrapper APIs for investigations/timeline/graph/gaps. |
| `gnat/analyst_services/investigations.py` | Adds InvestigationsService wrapper around InvestigationBuilder. |
| `gnat/analyst_services/rules.py` | Adds RulesService wrapper around rule loader/engine/audit writer. |
| `gnat/analyst_services/reporting.py` | Adds ReportingService wrapper around ReportService + drafting/STIX export. |
| `gnat/analyst_services/__init__.py` | Re-exports analyst-services public entry points. |
| `docs/explanation/architecture/adrs/README.md` | Adds ADR links for schema exports + analyst services. |
| `docs/explanation/architecture/adrs/0057-ADR-pydantic-schemas.md` | Documents schema-export decision and intended testing/round-trip guarantees. |
| `docs/explanation/architecture/adrs/0058-ADR-analyst-services.md` | Documents analyst-services layer design and responsibilities. |
| `CHANGELOG.md` | Documents new schema exports + analyst services additions. |

Comment on lines +206 to +227
### 8. Multi-tenant: AnalystContext.tenant flows through to all queries

Every analyst service method passes `ctx.tenant` to the underlying
domain service calls that support tenant filtering. This ensures
workspace isolation (ADR-0027) is applied consistently without
relying on each endpoint handler to remember to pass the tenant:

```python
def list(self, ctx: AnalystContext, filters: ListFilters) -> list[InvestigationSchema]:
    investigations = self._investigation_svc.list(
        tenant_id=ctx.tenant,
        status=filters.status,
        limit=filters.limit,
    )
    return [InvestigationSchema.from_domain(inv) for inv in investigations]
```

Domain services that do not yet accept `tenant_id` are updated to
accept and filter by it as part of this work. The domain service
changes are minimal (adding a `tenant_id: str | None = None`
parameter and a filter clause) and do not alter their public
contract for callers that do not pass a tenant.

Copilot AI Apr 25, 2026

The ADR claims every analyst service method passes ctx.tenant through to domain services for tenant scoping, but the current service implementations in this PR don’t use ctx.tenant in any domain calls (it’s only logged). Either thread tenant IDs through where supported, or update this ADR section to avoid overstating current multi-tenant enforcement.
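
One way to thread the tenant through, following the ADR's description of an optional `tenant_id: str | None = None` parameter, is sketched below. The store, its data, and the `list_investigations` method names are hypothetical; the point is only that callers who omit the tenant keep today's behaviour while the analyst service always passes `ctx.tenant`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Investigation:
    """Hypothetical domain object carrying the tenant it belongs to."""

    title: str
    tenant_id: str


@dataclass(frozen=True)
class AnalystContext:
    actor: str
    tenant: str
    request_id: str


class InvestigationStoreSketch:
    """Hypothetical domain-side store with an optional tenant filter."""

    def __init__(self, items: list[Investigation]) -> None:
        self._items = list(items)

    def list_investigations(self, tenant_id: str | None = None) -> list[Investigation]:
        # No tenant given: unchanged behaviour for existing callers.
        if tenant_id is None:
            return list(self._items)
        return [inv for inv in self._items if inv.tenant_id == tenant_id]


class AnalysisServiceSketch:
    """Analyst-facing wrapper that always scopes queries to ctx.tenant."""

    def __init__(self, store: InvestigationStoreSketch) -> None:
        self._store = store

    def list_investigations(self, ctx: AnalystContext) -> list[Investigation]:
        return self._store.list_investigations(tenant_id=ctx.tenant)


store = InvestigationStoreSketch(
    [Investigation("Phishing wave", "acme"), Investigation("Beaconing host", "globex")]
)
service = AnalysisServiceSketch(store)
acme_ctx = AnalystContext(actor="alice", tenant="acme", request_id="req-002")
visible = service.list_investigations(acme_ctx)
```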

Copilot uses AI. Check for mistakes.
Comment on lines +173 to +185
from gnat.analysis.investigations.models import Investigation
from gnat.analysis.tlp import TLPLevel

tlp = TLPLevel(classification) if classification else TLPLevel.AMBER
inv = Investigation(
    title=title,
    created_by=created_by or ctx.actor,
    description=description,
    classification=tlp,
    tags=list(tags or []),
)
self._store.save(inv)
return InvestigationSchema.from_domain(inv)

Copilot AI Apr 25, 2026

AnalysisService is re-implementing core investigation business logic (create/transition/add_note/add_hypothesis) by mutating Investigation objects directly and calling store.save(). GNAT already has gnat.analysis.investigations.service.InvestigationService that owns these behaviors (including note formatting and transition rules), so duplicating this logic here risks drift and inconsistent behavior. Consider injecting/wrapping InvestigationService instead of using the store directly.
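
The delegation suggested here might be shaped like the following sketch. `DomainInvestigationService` is a hypothetical stand-in for the existing domain service, and its note format is invented for illustration; the wrapper holds no store reference and no formatting logic of its own.

```python
from dataclasses import dataclass, field


@dataclass
class Investigation:
    title: str
    notes: list[str] = field(default_factory=list)


class DomainInvestigationService:
    """Hypothetical stand-in for the domain service that owns note formatting."""

    def __init__(self) -> None:
        self._items: dict[str, Investigation] = {}

    def create(self, investigation_id: str, title: str) -> Investigation:
        inv = Investigation(title=title)
        self._items[investigation_id] = inv
        return inv

    def add_note(self, investigation_id: str, author: str, text: str) -> Investigation:
        inv = self._items[investigation_id]
        # The formatting rule lives in exactly one place.
        inv.notes.append(f"[{author}] {text.strip()}")
        return inv


class AnalysisServiceSketch:
    """Wrapper that delegates instead of mutating and saving objects itself."""

    def __init__(self, domain_service: DomainInvestigationService) -> None:
        self._svc = domain_service

    def add_note(self, actor: str, investigation_id: str, text: str) -> list[str]:
        inv = self._svc.add_note(investigation_id, author=actor, text=text)
        return list(inv.notes)


domain = DomainInvestigationService()
domain.create("inv-1", "Suspicious logins")
wrapper = AnalysisServiceSketch(domain)
notes = wrapper.add_note("alice", "inv-1", "  Pivot on the C2 domain ")
```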

from gnat.analysis.investigations.models import InvestigationStatus

inv = self._get_investigation(investigation_id)
target = InvestigationStatus(new_status)

Copilot AI Apr 25, 2026

InvestigationStatus(new_status) will raise ValueError for an unknown status string, but this method only documents/raises TransitionError. To keep the analyst-services error contract consistent for callers, catch invalid enum values and raise TransitionError (or a dedicated validation error) with a clear message.

Suggested change
- target = InvestigationStatus(new_status)
+ try:
+     target = InvestigationStatus(new_status)
+ except ValueError as exc:
+     raise TransitionError(
+         f"Unknown investigation status {new_status!r}."
+     ) from exc

Comment on lines +132 to +155
### 4. Services accept and return Pydantic schemas

Analyst service methods accept Pydantic request schemas and return
Pydantic response schemas from `gnat/schemas/` (ADR-0057):

```python
class InvestigationsService:
    def create(
        self,
        ctx: AnalystContext,
        request: CreateInvestigationRequest,
    ) -> InvestigationSchema:
        ...
```

This provides:

- Input validation at the service boundary via Pydantic.
- Typed return values that FastAPI can serialize directly.
- A contract that is testable without HTTP.

Domain objects are converted to/from schemas inside the service
methods using `Schema.from_domain()` and `schema.to_domain()`.


Copilot AI Apr 25, 2026

This section says analyst services accept Pydantic request schemas and convert via schema.to_domain(), but the implemented services currently take primitives / dict inputs (e.g., seeds: list[dict[str, Any]], new_status: str) and the schemas in this PR don’t implement to_domain(). The ADR should be updated (or the code adjusted) so the documented contract matches the actual API surface.

if confidence is not None:
    from gnat.analysis.confidence import ConfidenceScore

    hyp.confidence = ConfidenceScore(value=confidence)

Copilot AI Apr 25, 2026

ConfidenceScore does not accept a value argument (it requires source_reliability, information_credibility, and stix_confidence). As written, ConfidenceScore(value=confidence) will raise TypeError at runtime when confidence is provided. Consider changing this API to accept a ConfidenceScoreSchema (or accept an int stix_confidence and construct a ConfidenceScore properly).

Suggested change
- hyp.confidence = ConfidenceScore(value=confidence)
+ hyp.confidence = ConfidenceScore(
+     source_reliability=None,
+     information_credibility=None,
+     stix_confidence=int(confidence),
+ )

Comment on lines +179 to +185
elif target == ReportStatus.PUBLISHED:
    report = self._report_service.publish(report_id, changed_by=ctx.actor)
elif target == ReportStatus.ARCHIVED:
    report = self._report_service.archive(report_id, changed_by=ctx.actor)
else:
    report = self._report_service._transition(
        report_id, target, ctx.actor, f"Transitioned to {new_status}."

Copilot AI Apr 25, 2026

This uses the internal ReportService._transition() for statuses other than REVIEW/PUBLISHED/ARCHIVED. That breaks domain semantics for at least approved and draft: ReportService.approve() also records the reviewer, and reject_to_draft() has dedicated behavior/messages. Routing approved/draft through _transition() will skip that logic and can leave report.reviewers incorrect.

Suggested change
- elif target == ReportStatus.PUBLISHED:
-     report = self._report_service.publish(report_id, changed_by=ctx.actor)
- elif target == ReportStatus.ARCHIVED:
-     report = self._report_service.archive(report_id, changed_by=ctx.actor)
- else:
-     report = self._report_service._transition(
-         report_id, target, ctx.actor, f"Transitioned to {new_status}."
+ elif target == ReportStatus.APPROVED:
+     report = self._report_service.approve(report_id, reviewer=ctx.actor)
+ elif target == ReportStatus.DRAFT:
+     report = self._report_service.reject_to_draft(report_id, reviewer=ctx.actor)
+ elif target == ReportStatus.PUBLISHED:
+     report = self._report_service.publish(report_id, changed_by=ctx.actor)
+ elif target == ReportStatus.ARCHIVED:
+     report = self._report_service.archive(report_id, changed_by=ctx.actor)
+ else:
+     raise TransitionError(
+         f"Unsupported transition target {new_status!r}."
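
An alternative to the `elif` chain is a dispatch table keyed by target status, sketched below under stated assumptions. `ReportServiceSketch` is a stand-in that only records which public method ran; `approve`, `reject_to_draft`, `publish`, and `archive` follow the names in this comment, while `submit_for_review` and the enum's string values are hypothetical.

```python
from enum import Enum


class ReportStatus(str, Enum):
    # Member names follow the review comment; the string values are assumed.
    DRAFT = "draft"
    REVIEW = "review"
    APPROVED = "approved"
    PUBLISHED = "published"
    ARCHIVED = "archived"


class TransitionError(Exception):
    """Raised when no public domain method handles the requested target."""


class ReportServiceSketch:
    """Hypothetical stand-in that records which public method was called."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str, str]] = []

    def _record(self, name: str, report_id: str, who: str) -> str:
        self.calls.append((name, report_id, who))
        return name

    def submit_for_review(self, report_id: str, changed_by: str) -> str:
        return self._record("submit_for_review", report_id, changed_by)

    def approve(self, report_id: str, reviewer: str) -> str:
        return self._record("approve", report_id, reviewer)

    def reject_to_draft(self, report_id: str, reviewer: str) -> str:
        return self._record("reject_to_draft", report_id, reviewer)

    def publish(self, report_id: str, changed_by: str) -> str:
        return self._record("publish", report_id, changed_by)

    def archive(self, report_id: str, changed_by: str) -> str:
        return self._record("archive", report_id, changed_by)


def transition(svc: ReportServiceSketch, report_id: str, target: ReportStatus, actor: str) -> str:
    # Each target maps to the public method that owns its side effects.
    handlers = {
        ReportStatus.REVIEW: lambda: svc.submit_for_review(report_id, changed_by=actor),
        ReportStatus.APPROVED: lambda: svc.approve(report_id, reviewer=actor),
        ReportStatus.DRAFT: lambda: svc.reject_to_draft(report_id, reviewer=actor),
        ReportStatus.PUBLISHED: lambda: svc.publish(report_id, changed_by=actor),
        ReportStatus.ARCHIVED: lambda: svc.archive(report_id, changed_by=actor),
    }
    handler = handlers.get(target)
    if handler is None:
        # Defensive: a status added later without a handler fails loudly.
        raise TransitionError(f"Unsupported transition target {target!r}.")
    return handler()


svc = ReportServiceSketch()
called = transition(svc, "r-1", ReportStatus.APPROVED, "alice")
```

With this shape, a status that gains a dedicated domain method needs one new table entry, and nothing reaches for the private `_transition()` helper.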

except ReportError as exc:
    raise ReportNotFound(str(exc)) from exc

target = ReportStatus(new_status)

Copilot AI Apr 25, 2026

ReportStatus(new_status) will raise ValueError if new_status is not a valid enum value, but that error is not caught/mapped into TransitionError. If this is part of the public GUI/API contract, it would be better to catch invalid values and raise TransitionError (or a validation error) with an explicit list of allowed statuses.

Suggested change
- target = ReportStatus(new_status)
+ try:
+     target = ReportStatus(new_status)
+ except ValueError as exc:
+     allowed_statuses = ", ".join(repr(status.value) for status in ReportStatus)
+     raise TransitionError(
+         f"Invalid report status {new_status!r}. Allowed statuses: {allowed_statuses}."
+     ) from exc
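
Since the same enum-parsing concern appears for both investigation and report statuses, a small shared helper could cover both. The sketch below is one possible shape: `parse_status` is a hypothetical name, and the enum members are illustrative rather than GNAT's actual values.

```python
from enum import Enum
from typing import Type, TypeVar

E = TypeVar("E", bound=Enum)


class TransitionError(Exception):
    """Stand-in for the analyst-services exception of the same name."""


def parse_status(enum_cls: Type[E], raw: str) -> E:
    """Convert a raw string into enum_cls, mapping ValueError to TransitionError."""
    try:
        return enum_cls(raw)
    except ValueError as exc:
        allowed = ", ".join(repr(member.value) for member in enum_cls)
        raise TransitionError(
            f"Invalid {enum_cls.__name__} value {raw!r}. Allowed values: {allowed}."
        ) from exc


class InvestigationStatus(Enum):
    # Hypothetical members for illustration only.
    OPEN = "open"
    CLOSED = "closed"


ok = parse_status(InvestigationStatus, "open")

try:
    parse_status(InvestigationStatus, "bogus")
except TransitionError as err:
    message = str(err)
```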

Comment on lines +58 to +75
A new `gnat/schemas/` package contains one module per domain area
(e.g. `investigations.py`, `indicators.py`, `reports.py`, `rules.py`).
Each module defines Pydantic v2 `BaseModel` subclasses that mirror the
corresponding domain dataclasses field-for-field.

```
gnat/schemas/
├── __init__.py
├── base.py # GNATSchema base class
├── indicators.py
├── investigations.py
├── reports.py
├── rules.py
├── campaigns.py
├── hypotheses.py
├── observables.py
└── common.py # Shared field types (TLPLevel, ConfidenceScore, etc.)
```

Copilot AI Apr 25, 2026

This ADR’s proposed gnat/schemas/ layout (single-file modules like investigations.py, reports.py, plus base.py/common.py) does not match the implementation in this PR (which uses subpackages like gnat/schemas/analysis/, gnat/schemas/reporting/, etc., and no base.py). The ADR should be updated to reflect the actual package/module structure so future contributors don’t follow an incorrect blueprint.

Comment on lines +118 to +173
A corresponding `to_domain()` instance method reconstructs the domain
object from the schema, enabling the full round trip.

### 4. Pydantic added to base dependencies

Pydantic v2 (`pydantic>=2.0,<3`) is promoted from an indirect
dependency (via FastAPI in `gnat[serve]`) to a direct base dependency
in `pyproject.toml`. This means all GNAT installations — including
CLI-only and library-only uses — can import `gnat.schemas`.

Rationale: schemas are the typed contract for all API consumers, not
just the HTTP layer. The CLI, TUI, addon tools, and agent layer all
benefit from validated input/output. Pydantic v2 is pure Python with
a Rust-accelerated core (`pydantic-core`), has minimal transitive
dependencies, and is already present in practice for most users.

### 5. Schemas are the typed contract for API consumers

FastAPI endpoint signatures use schema classes as request bodies and
response models:

```python
@router.post("/investigations", response_model=InvestigationSchema)
async def create_investigation(body: CreateInvestigationRequest, ...):
    ...
```

FastAPI auto-generates an OpenAPI 3.1 spec from these annotations.
The frontend build pipeline runs `openapi-typescript` against the spec
to produce TypeScript type definitions, closing the type safety chain
from database to browser.

### 6. Domain dataclasses remain the source of truth

The domain layer (`gnat/analysis/`, `gnat/orm/`, `gnat/research/`,
etc.) continues to use plain Python dataclasses and the property-bag
ORM. No domain code imports from `gnat.schemas`. The dependency
arrow is strictly one-directional:

```
gnat.schemas --> gnat.analysis / gnat.orm / gnat.research
```

If a domain dataclass gains a new field, the corresponding schema must
be updated. This is enforced by round-trip tests (see next decision).

### 7. Round-trip tests verify parity

A dedicated test module `tests/unit/schemas/test_round_trip.py`
verifies that every schema/domain pair survives the full round trip:

```
domain_obj --> Schema.from_domain(domain_obj) --> .model_dump(mode="json")
    --> Schema.model_validate_json(json_bytes) --> .to_domain()
    --> assert equal to original domain_obj
```

Copilot AI Apr 25, 2026

The ADR states that schemas implement to_domain() and that parity is enforced by tests/unit/schemas/test_round_trip.py, but the current implementation only provides from_domain() and the tests added are per-domain (no test_round_trip.py). Either implement the to_domain() + dedicated round-trip parity tests as described, or adjust the ADR to match what’s actually shipped.
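
If the first option is taken, a round-trip parity check could look roughly like this sketch. `Seed` and `SeedSchema` are hypothetical stand-ins with invented fields, and `to_domain()` is shown only as one possible implementation. The sketch serializes with `model_dump_json()` so that `model_validate_json()` receives JSON text rather than the dict returned by `model_dump(mode="json")`.

```python
from dataclasses import dataclass, field

from pydantic import BaseModel, ConfigDict


@dataclass
class Seed:
    """Hypothetical domain dataclass used only for this sketch."""

    kind: str
    value: str
    tags: list[str] = field(default_factory=list)


class SeedSchema(BaseModel):
    model_config = ConfigDict(from_attributes=True)

    kind: str
    value: str
    tags: list[str] = []

    @classmethod
    def from_domain(cls, obj: Seed) -> "SeedSchema":
        return cls.model_validate(obj)

    def to_domain(self) -> Seed:
        # Field names match one to one, so the dumped dict maps straight back.
        return Seed(**self.model_dump())


def round_trip(obj: Seed) -> Seed:
    """domain -> schema -> JSON text -> schema -> domain."""
    json_text = SeedSchema.from_domain(obj).model_dump_json()
    return SeedSchema.model_validate_json(json_text).to_domain()


original = Seed(kind="domain", value="example.org", tags=["phishing"])
restored = round_trip(original)
```

Dataclass equality compares field by field, so `restored == original` is the parity assertion such a test module would make for each schema/domain pair.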
