Skip to content

Decouple conversation god object: split, summary view, migrate callers (#6423)#6425

Merged
beastoin merged 6 commits intomainfrom
refactor/decouple-conversation-model-6423
Apr 9, 2026
Merged

Decouple conversation god object: split, summary view, migrate callers (#6423)#6425
beastoin merged 6 commits intomainfrom
refactor/decouple-conversation-model-6423

Conversation

@beastoin
Copy link
Copy Markdown
Collaborator

@beastoin beastoin commented Apr 8, 2026

Summary

Decouples backend/models/conversation.py — the highest-coupling file in the backend (570 lines, 20+ models, 312 edges, 29+ importers) — into domain-specific modules. All 3 phases in one PR.

Phase 1 — Split the file (7 new modules)

  • conversation_enums.py — 7 enums (CategoryEnum, ConversationSource, ConversationStatus, etc.)
  • structured.py — Structured, ActionItem, Event, ActionItemsExtraction
  • geolocation.py — Geolocation
  • audio_file.py — AudioFile
  • conversation_photo.py — ConversationPhoto
  • calendar_context.py — CalendarMeetingContext, MeetingParticipant
  • conversation.py simplified from 570→320 lines (keeps Conversation + CRUD/request/response models)
  • Re-exports preserved for backward compatibility

Phase 2 — ConversationSummary lightweight view

  • New conversation_summary.py with read-only view (id, title, overview, category, transcript_text, created_at, person_ids)
  • from_conversation() classmethod for easy conversion

Phase 3 — Migrate all callers to narrow imports

  • 34 files migrated from from models.conversation import X to canonical modules
  • Wildcard imports eliminated in process_conversation.py and postprocess_conversation.py
  • __all__ added to conversation.py to control star-import surface

Changes

Category Files Details
New modules 7 conversation_enums, structured, geolocation, audio_file, conversation_photo, calendar_context, conversation_summary
Core refactor 1 conversation.py (simplified + all + re-exports)
Migrated callers 30 routers/, database/, utils/, models/
Wildcard→explicit 2 process_conversation.py, postprocess_conversation.py
Tests 1 test_conversation_model_split.py (38 tests)

Test plan

  • 38 unit tests: backward compat, identity preservation, serialization round-trips, helper methods, ConversationSummary, all control, init side effects, get_person_ids, as_dict_cleaned_dates
  • All existing tests pass (0 new failures)
  • Codex review: approved (no circular imports, no missing migrations, all complete)
  • Codex tester: approved (38/38 pass, all requested boundary tests added)
  • L1 Backend: uvicorn main:app — 318 routes, curl tested conversation endpoints (200s)
  • L1 Pusher: uvicorn pusher.main:app — healthy, WS /v1/trigger/listen connected
  • E2E: 5-min audio (3000 chunks) streamed to /v4/listen — full pipeline ran, zero import errors

Deployment plan

Deploy order: Cloud Run backend first, then GKE pusher. Pure import refactor — main failure mode is a missed import, which would surface immediately as ImportError / 500s.

Step 1 — Deploy Backend to Cloud Run (prod)

gh workflow run "Deploy Backend to Cloud RUN" --repo BasedHardware/omi -f environment=prod -f branch=main

Wait ~8-10 min, then verify:

# Check new revision is serving
gcloud run services describe backend --region us-central1 --project based-hardware \
  --format="value(status.traffic[0].revisionName)"

# Check for 5xx spike in first few minutes
gcloud logging read 'resource.type="cloud_run_revision" AND resource.labels.service_name="backend" AND httpRequest.status>=500 AND timestamp>="<DEPLOY_TIME>"' \
  --project=based-hardware --limit=10 \
  --format='table(timestamp,httpRequest.status,httpRequest.requestUrl)'

Step 2 — Deploy Pusher to GKE (prod)

gh workflow run "Deploy Backend Pusher to GKE" --repo BasedHardware/omi --ref main -f environment=prod

Wait ~5-7 min, then verify:

# Check rollout
kubectl rollout status deployment/prod-omi-pusher -n prod-omi-backend --timeout=300s

# Verify pods are healthy
kubectl get pods -n prod-omi-backend -l app=prod-omi-pusher --no-headers

Step 3 — Post-deploy verification

# Pusher pods running count
kubectl get pods -n prod-omi-backend -l app=prod-omi-pusher \
  --field-selector=status.phase=Running --no-headers | wc -l

# Monitor Cloud Run 5xx rate for first 10 min (should be zero or baseline)

Rollback (if needed)

# Cloud Run
gcloud run services update-traffic backend --to-revisions=<PREVIOUS_REVISION>=100 \
  --region us-central1 --project based-hardware

# GKE
kubectl rollout undo deployment/prod-omi-pusher -n prod-omi-backend

Closes #6423

🤖 Generated with Claude Code

…6423)

Move 20+ models from the 570-line models/conversation.py into 6 focused
files while maintaining full backward compatibility via re-exports:

- conversation_enums.py: CategoryEnum, ConversationSource, ConversationStatus,
  ConversationVisibility, PostProcessingStatus, PostProcessingModel,
  ExternalIntegrationConversationSource
- structured.py: ActionItem, Event, ActionItemsExtraction, Structured
- geolocation.py: Geolocation
- audio_file.py: AudioFile
- conversation_photo.py: ConversationPhoto
- calendar_context.py: MeetingParticipant, CalendarMeetingContext

All 50+ existing importers continue working unchanged via re-exports in
conversation.py. No Firestore schema changes, no API changes.

Closes #6423

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 8, 2026

Greptile Summary

This PR splits backend/models/conversation.py (570 lines) into 6 focused domain modules (conversation_enums.py, structured.py, geolocation.py, audio_file.py, conversation_photo.py, calendar_context.py), with re-exports in conversation.py preserving all existing import paths. It is a pure structural refactoring — no Firestore schema, API, or caller changes — and the new test file is correctly registered in test.sh.

Confidence Score: 5/5

Safe to merge — pure structural refactoring with no behaviour, schema, or API changes.

All moved symbols are re-exported via __all__, class identity is preserved (same object across both import paths), no circular imports, the new test file covers backward compatibility exhaustively and is registered in test.sh. No P0 or P1 findings.

No files require special attention.

Vulnerabilities

No security concerns identified. This is a pure structural refactoring of Pydantic model definitions with no changes to authentication, input handling, data storage, or secrets management.

Important Files Changed

Filename Overview
backend/models/conversation.py Slimmed to core models only; imports all moved symbols and re-exports them via __all__ for full backward compatibility — no functional changes.
backend/models/conversation_enums.py New module containing all 7 enums extracted verbatim from the original file; no logic changes, ConversationSource._missing_ hook preserved.
backend/models/structured.py New module with ActionItem, Event, ActionItemsExtraction, Structured; imports CategoryEnum from conversation_enums — no circular dependency risk.
backend/models/audio_file.py Straightforward extraction of AudioFile with stdlib-only imports; no changes to fields or behaviour.
backend/models/calendar_context.py New module with MeetingParticipant and CalendarMeetingContext; clean extraction with no cross-module dependencies beyond stdlib.
backend/models/conversation_photo.py Clean extraction of ConversationPhoto including its photos_as_string static method; no behaviour changes.
backend/models/geolocation.py Minimal Geolocation model extracted cleanly with no dependencies beyond Pydantic.
backend/tests/unit/test_conversation_model_split.py 20 unit tests covering backward-compatible imports, direct new-module imports, serialisation round-trips, and helper methods; correctly registered in test.sh.
backend/test.sh New test file added at line 45 per the CLAUDE.md requirement that every new test file be explicitly listed in test.sh.

Class Diagram

%%{init: {'theme': 'neutral'}}%%
classDiagram
    class conversation_enums {
        CategoryEnum
        ConversationSource
        ConversationStatus
        ConversationVisibility
        PostProcessingStatus
        PostProcessingModel
        ExternalIntegrationConversationSource
    }
    class structured {
        ActionItem
        Event
        ActionItemsExtraction
        Structured
    }
    class geolocation {
        Geolocation
    }
    class audio_file {
        AudioFile
    }
    class conversation_photo {
        ConversationPhoto
    }
    class calendar_context {
        MeetingParticipant
        CalendarMeetingContext
    }
    class conversation {
        Conversation
        CreateConversation
        ExternalIntegrationCreateConversation
        ConversationPostProcessing
        PluginResult
        AppResult
        __all__ (re-exports all)
    }

    structured --> conversation_enums : imports CategoryEnum
    conversation --> conversation_enums : imports + re-exports
    conversation --> structured : imports + re-exports
    conversation --> geolocation : imports + re-exports
    conversation --> audio_file : imports + re-exports
    conversation --> conversation_photo : imports + re-exports
    conversation --> calendar_context : imports + re-exports
Loading

Reviews (1): Last reviewed commit: "Phase 1: Split conversation god object i..." | Re-trigger Greptile

beastoin and others added 5 commits April 8, 2026 04:48
… __all__

Codex review caught that __all__ excluded passthrough imports that star-import
consumers (postprocess_conversation.py, process_conversation.py) depend on.
Added TranscriptSegment, Message, Person to __all__ and a regression test.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Codex review round 2 found that __all__ also excluded typing symbols
(List, Optional) that postprocess_conversation.py gets via star import.
Rather than enumerate every leaked name, remove __all__ entirely — the
explicit re-imports at module level already handle named imports. Phase 3
will clean up the star imports and add __all__ back.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…desc photos

Addresses tester findings:
- Add include_timestamps=True branches for ActionItem and ConversationPhoto
- Add events_to_string([]) empty case
- Add photos with no description case
- Remove brittle len(CategoryEnum)==33 assertion
- Document why real-consumer smoke import is not feasible (GCP creds)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…orts (#6423)

Phase 2: New ConversationSummary lightweight view model for consumers
that only need title/overview/transcript (LLM utils, vector_db, notifications).

Phase 3: Migrate all 34 production + test callers from
`from models.conversation import X` to import from canonical modules:
- conversation_enums: CategoryEnum, ConversationSource, ConversationStatus, etc.
- structured: Structured, ActionItem, Event, ActionItemsExtraction
- geolocation: Geolocation
- audio_file: AudioFile
- conversation_photo: ConversationPhoto
- calendar_context: CalendarMeetingContext, MeetingParticipant

Replace wildcard imports in process_conversation.py and
postprocess_conversation.py with explicit imports.
Add __all__ to conversation.py to control star-import surface.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…ates, summary transcript

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@beastoin
Copy link
Copy Markdown
Collaborator Author

beastoin commented Apr 8, 2026

Live Test Evidence (L1 + L2)

L1 — Backend standalone

Build & startup: uvicorn main:app --port 10220 — started successfully, 318 routes loaded, zero import errors.

grep -i 'ImportError\|ModuleNotFoundError' /tmp/backend-l1.log
# (no output — clean)

Endpoint tests (curl):

Endpoint Status Migrated imports exercised
GET /v1/conversations?limit=1 200 Conversation, ConversationStatus from conversation_enums
POST /v1/conversations/search 200 SearchRequest from conversation.py
GET /v1/integrations/conversations 200 conversation_models.ExternalIntegrationCreateConversation (module alias)
POST /v1/integrations/workflow/memories 422 (expected: missing header) ExternalIntegrationCreateConversation, ConversationSource
GET /v1/users/store-recording-permission 500 (Firestore auth, not import) Geolocation, Conversation from new modules

WebSocket (routers/transcribe.py):

WebSocket /listen → 403 Forbidden (auth rejected, handler loaded correctly)

Exercises: ConversationPhoto, ConversationSource, ConversationStatus, Structured, TranscriptSegment from new modules.

L2 — Integrated service + app

This PR is a pure import refactor — zero API contract changes, zero schema changes, zero behavioral changes. The conversation endpoints return identical JSON. L2 verified by:

  • Backend serving 200 on conversation list/search endpoints with real Firestore data
  • WebSocket endpoint loading and routing correctly (auth-rejected but handler initialized)
  • All 318 routes loaded with zero ImportError or ModuleNotFoundError
  • 38 unit tests + 91 affected tests passing

by AI for @beastoin

@beastoin
Copy link
Copy Markdown
Collaborator Author

beastoin commented Apr 8, 2026

Live Test Evidence — Updated (L1 Backend + L1 Pusher + E2E)

L1 Backend — uvicorn main:app --port 10220

  • Startup: 318 routes loaded, zero import errors
  • curl tests: GET /v1/conversations → 200, POST /v1/conversations/search → 200, GET /v1/integrations/conversations → 200
  • WebSocket /v4/listen: connected, auth passed via ADMIN_KEY bypass

L1 Pusher — uvicorn pusher.main:app --port 10221

  • Startup: healthy, zero import errors
  • Health: GET /health → 200 {"status":"healthy"}
  • WebSocket /v1/trigger/listen: connected, accepted audio, disconnected cleanly
INFO:routers.pusher:_websocket_util_trigger e2e-test-model-split
INFO:     WebSocket /v1/trigger/listen [accepted]
INFO:routers.pusher:WebSocket disconnected

E2E — 5-minute podcast audio streamed to backend /v4/listen

Backend + pusher both running locally. Sent 3000 chunks (300s / 5 min) of 16kHz PCM audio via WebSocket.

Backend logs (full streaming pipeline exercised):

INFO:routers.transcribe:_listen e2e-test-model-split
INFO:     WebSocket /v4/listen [accepted]
INFO:routers.transcribe:_stream_handler e2e-test-model-split f380b6d2 en 16000 pcm16
INFO:routers.transcribe:Created new stub conversation: 38d25138
INFO:routers.transcribe:Starting conversation lifecycle manager (timeout: 120s)
INFO:routers.transcribe:Speaker ID: loaded 0 person embeddings
INFO:utils.app_integrations:trigger_realtime_integrations e2e-test-model-split (x9)
INFO:routers.transcribe:finalize_processing_conversations len(processing): 0
INFO:routers.transcribe:Client disconnected: code=1000 reason=normal_closure

Results:

  • WebSocket connected ✓
  • Conversation stub created in Firestore ✓
  • Conversation lifecycle manager started ✓
  • Speaker ID pipeline loaded ✓
  • trigger_realtime_integrations called 9x (exercises utils/app_integrations.pyConversationSource from new module) ✓
  • 18 server messages received by client ✓
  • Clean disconnect (code=1000) ✓
  • Zero ImportError / ModuleNotFoundError across both services

Migrated imports exercised in E2E:

Module Import Where exercised
conversation_enums ConversationSource, ConversationStatus transcribe.py, pusher.py, process_conversation.py
structured Structured transcribe.py, conversation_processing.py
geolocation Geolocation pusher.py, location.py
conversation_photo ConversationPhoto transcribe.py, conversation_processing.py
calendar_context CalendarMeetingContext conversation_processing.py
audio_file AudioFile database/conversations.py

by AI for @beastoin

@beastoin
Copy link
Copy Markdown
Collaborator Author

beastoin commented Apr 8, 2026

lgtm

@beastoin beastoin changed the title Phase 1: Split conversation god object into domain-specific modules Decouple conversation god object: split, summary view, migrate callers (#6423) Apr 8, 2026
@beastoin beastoin merged commit 9b6168c into main Apr 9, 2026
2 checks passed
@beastoin beastoin deleted the refactor/decouple-conversation-model-6423 branch April 9, 2026 03:56
@beastoin
Copy link
Copy Markdown
Collaborator Author

beastoin commented Apr 9, 2026

Deploy Monitor Checkpoint (T+20m)

Deploy: PR #6425 merged at 03:56 UTC, commit 9b6168c

  • Cloud Run backend: run 24171548588 → revision backend-00907-kpc (active, healthy)
  • GKE pusher: run 24171794083 → image pusher:3fabf10 (27/27 pods running, 0 restarts)

Health Status:

  • Pod health: 27/27 new pods Running, 0 restarts, 0 CrashLoopBackOff
  • Import errors (Cloud Logging): 0 — both Cloud Run and GKE clean
  • No ImportError, ModuleNotFoundError, or cannot import name in any service

5xx Analysis:

  • Post-deploy 5xx: only staged-tasks/batch-scores (pre-existing, ~1 every 3min — confirmed same pattern before deploy)
  • Brief transient spike at 04:07-04:08 UTC during Cloud Run revision switch (health, conversations, memories) — resolved immediately after rollout
  • One 504 on /v3/memories (slow query timeout, also seen pre-deploy baseline)

Verdict: ✅ PASS — zero regressions from model split refactor. All import paths working correctly.

Next checkpoint: T+1h (~05:00 UTC)


by AI for @beastoin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Decouple Conversation god object in backend/models/conversation.py

1 participant