Fix Python from_dict() round-trip for optional fields with schema defaults #1313
stephentoub wants to merge 2 commits into …
Fixes #1139, #1140, #1141.

The Python codegen was embedding JSON-Schema `default` values into `obj.get(key, default)` for optional fields. But the generated dataclass field is always `T | None = None` and `to_dict()` omits the field when `None`, so `from_dict(to_dict(x))` silently mutated unset fields into the schema default:

- `SessionTaskCompleteData.summary`: `None` -> `""`
- `PermissionPromptRequest.action`: `None` -> `MemoryAction.STORE`
- `PermissionRequest.action`: `None` -> `MemoryAction.STORE`

Drop `defaultLiteral` from both `emitPyClass` and `emitPyFlatDiscriminatedUnion` so `from_dict()` always uses `obj.get(key)` (matching the dataclass default). Regenerate `session_events.py`.

Flip the two codegen-level test assertions that previously locked in the buggy output and add negative assertions. Replace `test_schema_defaults_are_applied_for_missing_optional_fields` (which asserted the bug as expected behavior) with regression tests covering missing-key parsing, explicit-null parsing, and full `from_dict(to_dict(x))` round-trips for all three affected classes.

Other languages (Go, .NET, Rust, TypeScript) are unaffected; their generators never read `propSchema.default` for deserialization fallbacks.

Co-authored-by: Copilot <[email protected]>
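The mismatch this commit message describes can be sketched with a hand-written stand-in for a generated class. `TaskComplete` below is hypothetical (it is not the generated `SessionTaskCompleteData`), and the two `from_dict` variants only approximate the shape of the code the generator emitted before and after the change.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class TaskComplete:
    """Hypothetical stand-in for a generated dataclass with one optional field."""

    summary: str | None = None

    def to_dict(self) -> dict[str, Any]:
        # Mirrors the behavior described above: a None field is omitted entirely.
        result: dict[str, Any] = {}
        if self.summary is not None:
            result["summary"] = self.summary
        return result

    @staticmethod
    def from_dict_buggy(obj: dict[str, Any]) -> TaskComplete:
        # Old shape: the schema default ("") is substituted when the key is missing.
        return TaskComplete(summary=obj.get("summary", ""))

    @staticmethod
    def from_dict_fixed(obj: dict[str, Any]) -> TaskComplete:
        # New shape: a missing key yields None, matching the dataclass default.
        return TaskComplete(summary=obj.get("summary"))


original = TaskComplete()  # summary is left unset (None)
buggy = TaskComplete.from_dict_buggy(original.to_dict())
fixed = TaskComplete.from_dict_fixed(original.to_dict())
```

In this sketch `buggy` no longer compares equal to `original`, because the unset field has been replaced by the default, while `fixed` round-trips unchanged.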
Pull request overview
Fixes Python generated event-model deserialization so that missing optional fields no longer silently substitute JSON-Schema default values, restoring `from_dict(to_dict(x))` identity when those fields are `None` (addressing #1139, #1140, #1141).
Changes:
- Updated Python codegen to stop emitting `obj.get(key, <schema default>)` fallbacks during `from_dict()` generation and always use `obj.get(key)`.
- Regenerated `python/copilot/generated/session_events.py` to remove the three problematic defaulted `obj.get(...)` call sites.
- Updated generator-level (Node) and Python regression tests to lock in correct missing-key / explicit-null / round-trip behavior.
| File | Description |
|---|---|
| scripts/codegen/python.ts | Removes schema-default fallback emission in from_dict() generation (both class and flat discriminated-union paths). |
| python/copilot/generated/session_events.py | Regenerated output removing defaulted obj.get(..., default) for the affected optional fields. |
| nodejs/test/python-codegen.test.ts | Adjusts assertions to expect obj.get(key) and adds negative assertions preventing regression. |
| python/test_event_forward_compatibility.py | Replaces the prior test that encoded the buggy behavior with regression tests for missing/null parsing and from_dict(to_dict(x)) round-trip. |
Copilot's findings
- Files reviewed: 3/4 changed files
- Comments generated: 0
@brettcannon would you mind looking at this? Is it the right fix?
The CI prettier check failed on `test/python-codegen.test.ts` after the assertion update. Apply `prettier --write` to bring the file back into compliance.

Co-authored-by: Copilot <[email protected]>
Cross-SDK Consistency Review ✅
This PR fixes a Python-specific codegen bug and maintains cross-SDK consistency.
Analysis: The changes touch only: …
Cross-language impact: None needed. The PR description correctly notes that Go, .NET, Rust, and TypeScript generators were never affected; they use real nullable types end-to-end and don't synthesize default fallbacks during deserialization. I verified that no other codegen script in …
Consistency result: The fix actually brings Python into better alignment with the other SDKs: missing optional fields now uniformly return …
No changes needed in other SDK implementations.
Fixes #1139.
Fixes #1140.
Fixes #1141.
Why
`from_dict(to_dict(x))` was not the identity for three generated Python dataclasses whenever the affected optional field was `None`. The dataclass declares the field as `T | None = None` and `to_dict()` omits it when `None`, but `from_dict()` was substituting the JSON-Schema `default` value when the key was missing, silently mutating unset fields:
- `SessionTaskCompleteData.summary`: `None` → `""`
- `PermissionPromptRequest.action`: `None` → `PermissionPromptRequestMemoryAction.STORE`
- `PermissionRequest.action`: `None` → `PermissionRequestMemoryAction.STORE`

Any code that round-trips events (caching, replay, audit logs, dedup) or pattern-matches on `None` to mean "absent" was getting the wrong answer end-to-end. The bug was reproducible on `main` as of today.

What changed
Root cause is in `scripts/codegen/python.ts`: for optional fields with a schema `default`, the codegen was emitting `obj.get("key", "<default>")`, then passing that into `from_union([from_none, parse_X], ...)`, which happily parsed the default and returned it. The dataclass default and the wire default were inconsistent, so the round-trip broke.
- Dropped the `defaultLiteral` mechanism from both `emitPyClass` and `emitPyFlatDiscriminatedUnion`. `from_dict()` now uniformly emits `obj.get(key)`, so missing/null keys flow through `from_none` and land at the dataclass-declared `None`. Removed the now-unused `toPythonLiteral` helper.
- Regenerated `python/copilot/generated/session_events.py` via `npm run generate:python`. Exactly three call sites changed, all matching the bug reports.
- Flipped the two assertions in `nodejs/test/python-codegen.test.ts` that previously locked in the buggy output, and added negative assertions to prevent regression at the generator level.
- Replaced `test_schema_defaults_are_applied_for_missing_optional_fields` in `python/test_event_forward_compatibility.py` (which asserted the bug as expected behavior) with two regression tests covering missing-key parsing, explicit-null parsing, and the full `from_dict(to_dict(x))` round-trip for all three affected classes.

Surface area / non-obvious notes
- Other languages (Go, .NET, Rust, TypeScript) are unaffected: their generators never read `propSchema.default` to construct deserialization fallbacks; they use real nullable types end-to-end.
- A search of `session_events.py` and `rpc.py` confirms zero remaining `obj.get("...", "...")` patterns, so no other class was silently relying on baked-in schema defaults.
- The `test_schema_defaults_are_applied_for_missing_optional_fields` test was the only consumer in the repo of the old behavior. Nothing in runtime code depended on `request.action == STORE` or `task_complete.summary == ""` for unset fields.
- JSON Schema `default` is an annotation, not validation behavior, so applying it only during deserialization while ignoring it in the dataclass field and in `to_dict()` was internally inconsistent. This change makes Python deserialization symmetric with serialization and with the in-memory default.

Validation
- `python -m pytest test_event_forward_compatibility.py` -> 10/10 pass (including new regression tests).
- `npx vitest run` on `python-codegen.test.ts` and other non-CLI test files: pass.
- `ruff check`, `ruff format --check`, `ty check copilot`, `tsc --noEmit`, `eslint`: clean.
- A reproduction on `main` (changes stashed) confirms the bug; with the fix applied, all three round-trip with `equal? True`.

Independent review by the `code-review` agent on `claude-opus-4.7-xhigh` found no significant issues and confirmed both codegen paths and the test coverage.
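To illustrate the root cause named under "What changed", here is a minimal sketch of how a default passed into `from_union([from_none, parse_X], ...)` ends up being returned. The helper bodies are assumptions written for this example (only the names `from_none` and `from_union` come from the description; `from_str` and the two `parse_summary_*` functions are hypothetical), so the generated helpers may differ in detail.

```python
from __future__ import annotations

from typing import Any, Callable


def from_none(x: Any) -> Any:
    # Accepts only None; any other value is rejected.
    if x is not None:
        raise TypeError("expected None")
    return x


def from_str(x: Any) -> str:
    # Accepts only strings.
    if not isinstance(x, str):
        raise TypeError("expected str")
    return x


def from_union(parsers: list[Callable[[Any], Any]], x: Any) -> Any:
    # Try each parser in order and return the first one that succeeds.
    for parse in parsers:
        try:
            return parse(x)
        except (TypeError, ValueError):
            continue
    raise TypeError(f"no parser accepted {x!r}")


def parse_summary_old(obj: dict[str, Any]) -> str | None:
    # Old shape: a missing key becomes "", which from_str then accepts.
    return from_union([from_none, from_str], obj.get("summary", ""))


def parse_summary_new(obj: dict[str, Any]) -> str | None:
    # New shape: a missing key becomes None, which from_none accepts.
    return from_union([from_none, from_str], obj.get("summary"))
```

Under these assumptions, an explicit `null` on the wire was already parsed as `None` by both shapes, since `obj.get` only falls back to the default when the key is absent; the missing-key case is the one that changes.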