Add Pydantic schema exports and analyst service wrappers for GNAT-gui#159
Implements Streams 1 and 2 from the GNAT-gui core changes plan. Together they provide the foundation for the GNAT-gui web app (separate repo) to import gnat as a library with typed contracts.

Stream 1: Pydantic schema exports (`gnat/schemas/`). 28 Pydantic v2 `BaseModel` schemas mirror every domain dataclass. Each uses `ConfigDict(from_attributes=True)` and has a `from_domain()` classmethod. The schemas cover:
- analysis (investigations, hypotheses, confidence, timeline, graph, copilot)
- investigations (seeds, evidence graph)
- reporting (reports, findings, attribution)
- rules (audit entries)
- auth (`APIKey`, `OIDCIdentity`)

`pydantic>=2.0` is added to the base dependencies.

Stream 2: Analyst service wrappers (`gnat/analyst_services/`). Four thin orchestration services sit over existing domain code:
- `AnalysisService`: investigations, hypotheses, timeline, graph, gaps
- `InvestigationsService`: seed → build → graph summary
- `RulesService`: list, evaluate, audit trail
- `ReportingService`: create, transition, draft, STIX export

All of them accept `AnalystContext` (`actor`, `tenant`, `request_id`) as the first argument, for audit attribution and multi-tenant scoping.

ADR-0057 (schemas) and ADR-0058 (services) document the decisions. 78 new tests, all passing. Zero regressions.

https://claude.ai/code/session_01H5UbjsuiiGya5n1eUCxoaR
Pull request overview
Adds a typed, library-friendly surface to GNAT intended to be consumed by the external GNAT-gui web app: Pydantic v2 schemas for domain models plus “analyst service” orchestration wrappers over existing domain services.
Changes:
- Introduces `gnat/schemas/` (Pydantic v2 `BaseModel`s) and exports them for downstream typed contracts.
- Introduces `gnat/analyst_services/` wrappers, plus `AnalystContext` and a small exception hierarchy.
- Adds unit tests for schemas and analyst services, updates the ADR index, adds ADR-0057/0058, and adds `pydantic` as a base dependency.
Reviewed changes
Copilot reviewed 42 out of 42 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| `tests/unit/schemas/__init__.py` | Adds test package marker for schema unit tests. |
| tests/unit/schemas/test_rules.py | Tests rule schema + audit entry schema serialization. |
| tests/unit/schemas/test_reporting.py | Tests reporting schemas round-tripping from domain dataclasses. |
| tests/unit/schemas/test_investigations.py | Tests investigations evidence-graph schemas from domain dataclasses. |
| tests/unit/schemas/test_analysis.py | Tests analysis schemas (investigations, timeline, copilot, graph). |
| `tests/unit/analyst_services/__init__.py` | Adds test package marker for analyst-services unit tests. |
| tests/unit/analyst_services/test_rules.py | Unit tests for RulesService orchestration behavior. |
| tests/unit/analyst_services/test_reporting.py | Unit tests for ReportingService orchestration behavior. |
| tests/unit/analyst_services/test_investigations.py | Unit tests for InvestigationsService build + summary behavior. |
| tests/unit/analyst_services/test_analysis.py | Unit tests for AnalysisService investigation/timeline/gaps/graph APIs. |
| pyproject.toml | Adds Pydantic v2 as a base dependency. |
| gnat/schemas/rules/rule.py | Defines RuleSchema and from_domain() constructor. |
| gnat/schemas/rules/audit.py | Defines RuleAuditEntrySchema for rule firing audit dicts/objects. |
| `gnat/schemas/rules/__init__.py` | Exports rule schemas. |
| gnat/schemas/reporting/report.py | Defines reporting schemas including ReportSchema. |
| gnat/schemas/reporting/lifecycle.py | Defines enum mirrors for report lifecycle types. |
| `gnat/schemas/reporting/__init__.py` | Exports reporting schemas + lifecycle enums. |
| gnat/schemas/investigations/seed.py | Defines SeedSchema. |
| gnat/schemas/investigations/graph.py | Defines evidence graph schemas (node/edge/graph). |
| `gnat/schemas/investigations/__init__.py` | Exports investigations schemas. |
| gnat/schemas/auth/identity.py | Defines APIKey/OIDC identity schemas. |
| `gnat/schemas/auth/__init__.py` | Exports auth schemas. |
| gnat/schemas/analysis/tlp.py | Defines TLP enum schema mirror. |
| gnat/schemas/analysis/timeline.py | Defines TimelineEventSchema. |
| gnat/schemas/analysis/investigation.py | Defines investigation/hypothesis/note/task schemas. |
| gnat/schemas/analysis/graph.py | Defines GraphContextSchema. |
| gnat/schemas/analysis/correlation.py | Placeholder module for future correlation schemas. |
| gnat/schemas/analysis/copilot.py | Defines gap recommendation + draft result schemas. |
| gnat/schemas/analysis/confidence.py | Defines ConfidenceScoreSchema. |
| `gnat/schemas/analysis/__init__.py` | Exports analysis schemas. |
| `gnat/schemas/__init__.py` | Top-level schema export surface for API consumers. |
| gnat/analyst_services/context.py | Adds AnalystContext request-scoped identity container. |
| gnat/analyst_services/exceptions.py | Adds analyst-services exception hierarchy. |
| gnat/analyst_services/analysis.py | Adds AnalysisService wrapper APIs for investigations/timeline/graph/gaps. |
| gnat/analyst_services/investigations.py | Adds InvestigationsService wrapper around InvestigationBuilder. |
| gnat/analyst_services/rules.py | Adds RulesService wrapper around rule loader/engine/audit writer. |
| gnat/analyst_services/reporting.py | Adds ReportingService wrapper around ReportService + drafting/STIX export. |
| `gnat/analyst_services/__init__.py` | Re-exports analyst-services public entry points. |
| docs/explanation/architecture/adrs/README.md | Adds ADR links for schema exports + analyst services. |
| docs/explanation/architecture/adrs/0057-ADR-pydantic-schemas.md | Documents schema-export decision and intended testing/round-trip guarantees. |
| docs/explanation/architecture/adrs/0058-ADR-analyst-services.md | Documents analyst-services layer design and responsibilities. |
| CHANGELOG.md | Documents new schema exports + analyst services additions. |
````markdown
### 8. Multi-tenant: AnalystContext.tenant flows through to all queries

Every analyst service method passes `ctx.tenant` to the underlying
domain service calls that support tenant filtering. This ensures
workspace isolation (ADR-0027) is applied consistently without
relying on each endpoint handler to remember to pass the tenant:

```python
def list(self, ctx: AnalystContext, filters: ListFilters) -> list[InvestigationSchema]:
    investigations = self._investigation_svc.list(
        tenant_id=ctx.tenant,
        status=filters.status,
        limit=filters.limit,
    )
    return [InvestigationSchema.from_domain(inv) for inv in investigations]
```

Domain services that do not yet accept `tenant_id` are updated to
accept and filter by it as part of this work. The domain service
changes are minimal (adding a `tenant_id: str | None = None`
parameter and a filter clause) and do not alter their public
contract for callers that do not pass a tenant.
````
The ADR claims every analyst service method passes ctx.tenant through to domain services for tenant scoping, but the current service implementations in this PR don’t use ctx.tenant in any domain calls (it’s only logged). Either thread tenant IDs through where supported, or update this ADR section to avoid overstating current multi-tenant enforcement.
```python
from gnat.analysis.investigations.models import Investigation
from gnat.analysis.tlp import TLPLevel

tlp = TLPLevel(classification) if classification else TLPLevel.AMBER
inv = Investigation(
    title=title,
    created_by=created_by or ctx.actor,
    description=description,
    classification=tlp,
    tags=list(tags or []),
)
self._store.save(inv)
return InvestigationSchema.from_domain(inv)
```
AnalysisService is re-implementing core investigation business logic (create/transition/add_note/add_hypothesis) by mutating Investigation objects directly and calling store.save(). GNAT already has gnat.analysis.investigations.service.InvestigationService that owns these behaviors (including note formatting and transition rules), so duplicating this logic here risks drift and inconsistent behavior. Consider injecting/wrapping InvestigationService instead of using the store directly.
```python
from gnat.analysis.investigations.models import InvestigationStatus

inv = self._get_investigation(investigation_id)
target = InvestigationStatus(new_status)
```
InvestigationStatus(new_status) will raise ValueError for an unknown status string, but this method only documents/raises TransitionError. To keep the analyst-services error contract consistent for callers, catch invalid enum values and raise TransitionError (or a dedicated validation error) with a clear message.
```diff
-target = InvestigationStatus(new_status)
+try:
+    target = InvestigationStatus(new_status)
+except ValueError as exc:
+    raise TransitionError(
+        f"Unknown investigation status {new_status!r}."
+    ) from exc
```
````markdown
### 4. Services accept and return Pydantic schemas

Analyst service methods accept Pydantic request schemas and return
Pydantic response schemas from `gnat/schemas/` (ADR-0057):

```python
class InvestigationsService:
    def create(
        self,
        ctx: AnalystContext,
        request: CreateInvestigationRequest,
    ) -> InvestigationSchema:
        ...
```

This provides:

- Input validation at the service boundary via Pydantic.
- Typed return values that FastAPI can serialize directly.
- A contract that is testable without HTTP.

Domain objects are converted to/from schemas inside the service
methods using `Schema.from_domain()` and `schema.to_domain()`.
````
This section says analyst services accept Pydantic request schemas and convert via schema.to_domain(), but the implemented services currently take primitives / dict inputs (e.g., seeds: list[dict[str, Any]], new_status: str) and the schemas in this PR don’t implement to_domain(). The ADR should be updated (or the code adjusted) so the documented contract matches the actual API surface.
```python
if confidence is not None:
    from gnat.analysis.confidence import ConfidenceScore

    hyp.confidence = ConfidenceScore(value=confidence)
```
ConfidenceScore does not accept a value argument (it requires source_reliability, information_credibility, and stix_confidence). As written, ConfidenceScore(value=confidence) will raise TypeError at runtime when confidence is provided. Consider changing this API to accept a ConfidenceScoreSchema (or accept an int stix_confidence and construct a ConfidenceScore properly).
```diff
-    hyp.confidence = ConfidenceScore(value=confidence)
+    hyp.confidence = ConfidenceScore(
+        source_reliability=None,
+        information_credibility=None,
+        stix_confidence=int(confidence),
+    )
```
```python
elif target == ReportStatus.PUBLISHED:
    report = self._report_service.publish(report_id, changed_by=ctx.actor)
elif target == ReportStatus.ARCHIVED:
    report = self._report_service.archive(report_id, changed_by=ctx.actor)
else:
    report = self._report_service._transition(
        report_id, target, ctx.actor, f"Transitioned to {new_status}."
```
This uses the internal ReportService._transition() for statuses other than REVIEW/PUBLISHED/ARCHIVED. That breaks domain semantics for at least approved and draft: ReportService.approve() also records the reviewer, and reject_to_draft() has dedicated behavior/messages. Routing approved/draft through _transition() will skip that logic and can leave report.reviewers incorrect.
```diff
-elif target == ReportStatus.PUBLISHED:
-    report = self._report_service.publish(report_id, changed_by=ctx.actor)
-elif target == ReportStatus.ARCHIVED:
-    report = self._report_service.archive(report_id, changed_by=ctx.actor)
-else:
-    report = self._report_service._transition(
-        report_id, target, ctx.actor, f"Transitioned to {new_status}."
+elif target == ReportStatus.APPROVED:
+    report = self._report_service.approve(report_id, reviewer=ctx.actor)
+elif target == ReportStatus.DRAFT:
+    report = self._report_service.reject_to_draft(report_id, reviewer=ctx.actor)
+elif target == ReportStatus.PUBLISHED:
+    report = self._report_service.publish(report_id, changed_by=ctx.actor)
+elif target == ReportStatus.ARCHIVED:
+    report = self._report_service.archive(report_id, changed_by=ctx.actor)
+else:
+    raise TransitionError(
+        f"Unsupported transition target {new_status!r}."
```
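The reviewer's point generalizes. Every target status should route to a public `ReportService` method, and anything unmapped should fail loudly rather than fall back to the private `_transition()`. Below is a table-driven sketch of that idea, under several assumptions:
- `RecordingReportService` is a fake, and the enum values are assumed.
- `submit_for_review` is a hypothetical name, because the REVIEW branch is not visible in the hunk.
- `approve`, `reject_to_draft`, `publish`, and `archive` use the keyword arguments shown in the suggestion.
- The function takes an already-parsed `ReportStatus`.

```python
from enum import Enum
from typing import Any, Callable


class ReportStatus(str, Enum):
    # Assumed values; the real enum lives in the reporting domain package.
    DRAFT = "draft"
    REVIEW = "review"
    APPROVED = "approved"
    PUBLISHED = "published"
    ARCHIVED = "archived"


class TransitionError(Exception):
    """Stand-in for the analyst-services TransitionError."""


class RecordingReportService:
    """Fake that records which public method was called and by whom."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str, str]] = []

    def _record(self, method: str, report_id: str, who: str) -> str:
        self.calls.append((method, report_id, who))
        return report_id

    def submit_for_review(self, report_id: str, changed_by: str) -> str:  # hypothetical name
        return self._record("submit_for_review", report_id, changed_by)

    def approve(self, report_id: str, reviewer: str) -> str:
        return self._record("approve", report_id, reviewer)

    def reject_to_draft(self, report_id: str, reviewer: str) -> str:
        return self._record("reject_to_draft", report_id, reviewer)

    def publish(self, report_id: str, changed_by: str) -> str:
        return self._record("publish", report_id, changed_by)

    def archive(self, report_id: str, changed_by: str) -> str:
        return self._record("archive", report_id, changed_by)


def transition(svc: RecordingReportService, report_id: str, target: ReportStatus, actor: str) -> Any:
    # Each target maps to a public method, so domain side effects are not skipped.
    handlers: dict[ReportStatus, Callable[[], Any]] = {
        ReportStatus.REVIEW: lambda: svc.submit_for_review(report_id, changed_by=actor),
        ReportStatus.APPROVED: lambda: svc.approve(report_id, reviewer=actor),
        ReportStatus.DRAFT: lambda: svc.reject_to_draft(report_id, reviewer=actor),
        ReportStatus.PUBLISHED: lambda: svc.publish(report_id, changed_by=actor),
        ReportStatus.ARCHIVED: lambda: svc.archive(report_id, changed_by=actor),
    }
    handler = handlers.get(target)
    if handler is None:  # e.g. a status added to the enum later but not wired here
        raise TransitionError(f"Unsupported transition target {target.value!r}.")
    return handler()
```

A dict of callables keeps the mapping explicit. It is also easy to test exhaustively, for example by asserting that every `ReportStatus` member has a handler.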
```python
except ReportError as exc:
    raise ReportNotFound(str(exc)) from exc

target = ReportStatus(new_status)
```
ReportStatus(new_status) will raise ValueError if new_status is not a valid enum value, but that error is not caught/mapped into TransitionError. If this is part of the public GUI/API contract, it would be better to catch invalid values and raise TransitionError (or a validation error) with an explicit list of allowed statuses.
```diff
-target = ReportStatus(new_status)
+try:
+    target = ReportStatus(new_status)
+except ValueError as exc:
+    allowed_statuses = ", ".join(repr(status.value) for status in ReportStatus)
+    raise TransitionError(
+        f"Invalid report status {new_status!r}. Allowed statuses: {allowed_statuses}."
+    ) from exc
```
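Two enum conversions are flagged above: `InvestigationStatus(new_status)` and `ReportStatus(new_status)`. They could share one helper so the analyst-services error contract stays uniform. A minimal sketch follows. `parse_status` is a hypothetical helper name, and the `InvestigationStatus` values are placeholders, not the real enum.

```python
from enum import Enum
from typing import TypeVar

E = TypeVar("E", bound=Enum)


class TransitionError(Exception):
    """Stand-in for the analyst-services TransitionError."""


def parse_status(enum_cls: type[E], value: str) -> E:
    """Convert a raw status string, turning ValueError into TransitionError."""
    try:
        return enum_cls(value)
    except ValueError as exc:
        allowed = ", ".join(repr(member.value) for member in enum_cls)
        raise TransitionError(
            f"Invalid {enum_cls.__name__} {value!r}. Allowed: {allowed}."
        ) from exc


class InvestigationStatus(str, Enum):
    # Placeholder values, not the real enum.
    OPEN = "open"
    CLOSED = "closed"


print(parse_status(InvestigationStatus, "open") is InvestigationStatus.OPEN)  # True
```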
````markdown
A new `gnat/schemas/` package contains one module per domain area
(e.g. `investigations.py`, `indicators.py`, `reports.py`, `rules.py`).
Each module defines Pydantic v2 `BaseModel` subclasses that mirror the
corresponding domain dataclasses field-for-field.

```
gnat/schemas/
├── __init__.py
├── base.py            # GNATSchema base class
├── indicators.py
├── investigations.py
├── reports.py
├── rules.py
├── campaigns.py
├── hypotheses.py
├── observables.py
└── common.py          # Shared field types (TLPLevel, ConfidenceScore, etc.)
```
````
This ADR’s proposed gnat/schemas/ layout (single-file modules like investigations.py, reports.py, plus base.py/common.py) does not match the implementation in this PR (which uses subpackages like gnat/schemas/analysis/, gnat/schemas/reporting/, etc., and no base.py). The ADR should be updated to reflect the actual package/module structure so future contributors don’t follow an incorrect blueprint.
````markdown
A corresponding `to_domain()` instance method reconstructs the domain
object from the schema, enabling the full round trip.

### 4. Pydantic added to base dependencies

Pydantic v2 (`pydantic>=2.0,<3`) is promoted from an indirect
dependency (via FastAPI in `gnat[serve]`) to a direct base dependency
in `pyproject.toml`. This means all GNAT installations — including
CLI-only and library-only uses — can import `gnat.schemas`.

Rationale: schemas are the typed contract for all API consumers, not
just the HTTP layer. The CLI, TUI, addon tools, and agent layer all
benefit from validated input/output. Pydantic v2 is pure Python with
a Rust-accelerated core (`pydantic-core`), has minimal transitive
dependencies, and is already present in practice for most users.

### 5. Schemas are the typed contract for API consumers

FastAPI endpoint signatures use schema classes as request bodies and
response models:

```python
@router.post("/investigations", response_model=InvestigationSchema)
async def create_investigation(body: CreateInvestigationRequest, ...):
    ...
```

FastAPI auto-generates an OpenAPI 3.1 spec from these annotations.
The frontend build pipeline runs `openapi-typescript` against the spec
to produce TypeScript type definitions, closing the type safety chain
from database to browser.

### 6. Domain dataclasses remain the source of truth

The domain layer (`gnat/analysis/`, `gnat/orm/`, `gnat/research/`,
etc.) continues to use plain Python dataclasses and the property-bag
ORM. No domain code imports from `gnat.schemas`. The dependency
arrow is strictly one-directional:

```
gnat.schemas --> gnat.analysis / gnat.orm / gnat.research
```

If a domain dataclass gains a new field, the corresponding schema must
be updated. This is enforced by round-trip tests (see next decision).

### 7. Round-trip tests verify parity

A dedicated test module `tests/unit/schemas/test_round_trip.py`
verifies that every schema/domain pair survives the full round trip:

```
domain_obj --> Schema.from_domain(domain_obj) --> .model_dump(mode="json")
    --> Schema.model_validate_json(json_bytes) --> .to_domain()
    --> assert equal to original domain_obj
```
````
The ADR states that schemas implement to_domain() and that parity is enforced by tests/unit/schemas/test_round_trip.py, but the current implementation only provides from_domain() and the tests added are per-domain (no test_round_trip.py). Either implement the to_domain() + dedicated round-trip parity tests as described, or adjust the ADR to match what’s actually shipped.
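Until `to_domain()` exists, a cheaper parity check can still catch the drift the ADR worries about. It compares the fields of each domain dataclass with the fields its schema declares. The sketch below uses stdlib `dataclasses.fields()` on the domain side. With Pydantic v2, the schema side would be `set(SomeSchema.model_fields)`; here it is stubbed as a plain set so the example runs without GNAT. `Report` and `find_missing_fields` are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class Report:
    # Stand-in domain dataclass.
    id: str
    title: str
    status: str
    reviewers: list[str]


def find_missing_fields(pairs: list[tuple[type, set[str]]]) -> dict[str, set[str]]:
    """Map each domain class name to the dataclass fields its schema lacks."""
    missing: dict[str, set[str]] = {}
    for domain_cls, schema_field_names in pairs:
        gap = {f.name for f in fields(domain_cls)} - schema_field_names
        if gap:
            missing[domain_cls.__name__] = gap
    return missing


# In a real test the set would come from the schema, e.g. set(ReportSchema.model_fields).
schema_fields = {"id", "title", "status"}  # pretend the schema forgot "reviewers"
print(find_missing_fields([(Report, schema_fields)]))  # {'Report': {'reviewers'}}
```

This check compares field names only, not types or values. It complements, rather than replaces, the full `from_domain()` / `model_validate_json()` round trip described in the ADR.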