Background (#154)
The product originally executed these flows entirely in-process. When we needed horizontal scaling in production, we introduced BullMQ workers to preserve the existing behavior without doing a full refactor first.
That was the right short-term tradeoff, but it left several queue workers owning forked copies of orchestration that also exists in routes/services for the in-process path.
Problem
Today, some of the worker files are not thin transport adapters. They also contain workflow/business orchestration that overlaps with the original in-process implementations.
This means:
- fixes often need to be applied in more than one place
- queue mode and in-process mode can drift behaviorally
- testing and reasoning about parity is harder than it should be
- worker-only changes can silently become the new source of truth for business rules
The duplication is not equally severe across all four worker files. deep-research.worker.ts is the main offender, chat.worker.ts is next, and file-process.worker.ts / paper-generation.worker.ts are comparatively smaller because they already delegate more work to shared services.
Evidence
1. Deep research has the largest duplicated surface area
The queue worker is effectively a fork of the in-process deep-research executor, with the main architectural change being how continuation is scheduled.
- src/routes/deep-research/start.ts:998-1092 and src/services/queue/workers/deep-research.worker.ts:89-177 both define conversation-state persistence helpers, objective-trace syncing, and activity clearing.
- src/routes/deep-research/start.ts:1147-1352 and src/services/queue/workers/deep-research.worker.ts:430-540 both promote clarification tasks / planning output into the persisted plan.
- src/routes/deep-research/start.ts:1354-1654 and src/services/queue/workers/deep-research.worker.ts:551-745 both execute literature + analysis tasks, serialize state writes, and emit state updates while tasks run.
- src/routes/deep-research/start.ts:1657-2037 and src/services/queue/workers/deep-research.worker.ts:751-1161 both run hypothesis, reflection/discovery, next-step planning, continue-research decisioning, reply generation, and continuation setup.
- src/routes/deep-research/start.ts:2058-2087 and src/services/queue/workers/deep-research.worker.ts:1181-1310 both own end-of-run cleanup/failure behavior, but the worker also adds credits completion/refund logic, which makes the worker path an independent business-logic owner.
The important point is that the worker is not just adapting transport concerns. It is re-implementing most of the research iteration logic.
The only material architectural difference is:
- in-process mode loops in memory inside runDeepResearch(...)
- worker mode executes one iteration and enqueues the next job
That scheduling difference is valid, but it should sit on top of a shared iteration executor rather than a duplicated workflow body.
2. Chat still duplicates the active agent-loop flow, and also carries a stale legacy fork
- src/routes/chat.ts:546-718 and src/services/queue/workers/chat.worker.ts:456-568 both initialize state, call runChatAgent(...), persist agentProgress, handle truncation, save the final reply, and record response time.
- src/services/queue/workers/chat.worker.ts:189-449 still contains a full legacy planning -> literature -> hypothesis -> reflection -> reply pipeline that is no longer mirrored by the current in-process route.
- The file itself documents the duplication explicitly at src/services/queue/workers/chat.worker.ts:1-6.
So there are really two issues here:
- the current agent-loop path exists in both the route and the worker
- the worker still owns a second, older pipeline that can diverge further from the production path
3. File processing is partly refactored already, but the async lifecycle is still split
This area is in better shape than chat/deep research because the core processing logic is already shared:
- src/services/files/index.ts:312-389 contains processFile(...), and both paths call it
The remaining duplication is around the lifecycle that wraps that core logic:
- src/services/files/index.ts:277-305 branches between enqueueing and in-process execution
- src/services/queue/workers/file-process.worker.ts:25-105 re-implements the async execution lifecycle around the same core function
What is still duplicated / fragmented here:
- reconstructing status when Redis TTL has expired
- publishing file:ready / file:error
- error-to-status transitions
This is a smaller refactor than the others, but it is still a maintenance hotspot.
4. Paper generation already shares the heavy lifting, but lifecycle ownership is fragmented
This area is also less severe than deep research because the core generation pipeline is already centralized:
- src/services/paper/generatePaper.ts:74 is the shared paper-generation service entrypoint
- both paperGenerationHandler(...) and the worker call generatePaperFromConversation(...)
However, the surrounding lifecycle is still split across multiple places:
- src/routes/deep-research/paper.ts:71-171 owns the sync execution path
- src/routes/deep-research/paper.ts:416-552 owns async job creation + initial paper record setup
- src/services/queue/workers/paper-generation.worker.ts:42-144 owns processing-state transitions, progress updates, completion/failure transitions, and queue notifications
generatePaperFromConversation(...) already exposes existingPaperId and onProgress, which is a good foundation. The remaining issue is that the paper job lifecycle is still fragmented across the route and worker rather than expressed once as a shared executor.
Why this matters
- Behavior parity is fragile. USE_JOB_QUEUE=true and USE_JOB_QUEUE=false are not just different transport modes; in some cases they run different orchestration code.
- Refactors are riskier because there is no single source of truth for the workflow.
- Tests become less meaningful if only one execution path is covered.
- Worker files become harder to reason about because queue concerns and business workflow concerns are mixed together.
- The deep-research worker is now carrying production-only logic such as completion/refund behavior, which increases the chance that future changes land only in the worker path.
Proposed direction
The goal should not be to remove every line of duplication mechanically. The goal should be to make workers thin and make workflow logic live in shared executors.
Refactor principles
- keep HTTP-specific concerns in routes
- keep BullMQ-specific concerns in workers
- move workflow/business orchestration into shared services/executors
- keep notification/progress emission pluggable via callbacks or a small runtime interface
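The last principle can be read as a small runtime interface that the shared executor receives from whichever host calls it. The sketch below is illustrative only: WorkflowRuntime, runWorkflow, queueRuntime, and inProcessRuntime are hypothetical names that do not exist in the codebase, and JobLike is a structural stand-in for the one queue-job method the adapter would need (modelled on BullMQ's job.updateProgress), not the real Job class.

```typescript
// Hypothetical names throughout: a sketch of the "pluggable runtime" principle.

/** What a shared executor may ask of its host (route or worker). */
interface WorkflowRuntime {
  emitProgress(stage: string, percent: number): Promise<void>;
  notify(event: string, payload: Record<string, unknown>): Promise<void>;
}

/** Shared business workflow: knows nothing about HTTP or the queue. */
async function runWorkflow(input: string, runtime: WorkflowRuntime): Promise<string> {
  await runtime.emitProgress("started", 0);
  const result = input.trim().toUpperCase(); // placeholder for real orchestration
  await runtime.emitProgress("finished", 100);
  await runtime.notify("workflow:done", { length: result.length });
  return result;
}

/** Structural stand-in for the part of a queue job the adapter needs (assumption). */
interface JobLike {
  updateProgress(progress: number | object): Promise<void>;
}

/** Worker-side adapter: maps runtime calls onto queue primitives. */
function queueRuntime(
  job: JobLike,
  publish: (event: string, payload: object) => Promise<void>,
): WorkflowRuntime {
  return {
    emitProgress: (stage, percent) => job.updateProgress({ stage, percent }),
    notify: (event, payload) => publish(event, payload),
  };
}

/** In-process adapter: collects events in memory, e.g. for a sync response or a test. */
function inProcessRuntime(log: string[]): WorkflowRuntime {
  return {
    emitProgress: async (stage, percent) => {
      log.push(`${stage}:${percent}`);
    },
    notify: async (event) => {
      log.push(event);
    },
  };
}
```

With a shape like this, the executor can be unit-tested with the in-memory adapter, while the worker and route each contribute only a few lines of wiring.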
Concrete direction by area
Deep research
Extract a shared iteration executor that contains the duplicated core currently spread across:
- planning / clarification task promotion
- task execution
- state persistence + objective trace management
- hypothesis / reflection / discovery
- next-step planning
- continue decisioning
- reply generation
Then let each mode provide only the scheduling strategy:
- in-process mode: call shared executor in a loop
- queue mode: call shared executor once, then enqueue the next job if needed
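The split between a shared iteration executor and two scheduling strategies could look roughly like the following. This is a minimal sketch under stated assumptions: ResearchState, IterationResult, runInProcess, runAsJob, and demoExecutor are hypothetical names, the state shape is invented for illustration, and the maxIterations bound is an assumption of the sketch rather than existing behavior.

```typescript
// Sketch only: every name here is hypothetical, not the repository's API.

interface ResearchState {
  conversationId: string;
  iteration: number;
  findings: string[];
}

interface IterationResult {
  state: ResearchState;
  shouldContinue: boolean;
}

/** The single owner of iteration logic (planning, tasks, hypothesis, reply, ...). */
type IterationExecutor = (state: ResearchState) => Promise<IterationResult>;

/** In-process scheduling: loop in memory until the executor says stop. */
async function runInProcess(
  execute: IterationExecutor,
  initial: ResearchState,
  maxIterations = 10, // safety bound; an assumption of this sketch
): Promise<ResearchState> {
  let state = initial;
  for (let i = 0; i < maxIterations; i++) {
    const result = await execute(state);
    state = result.state;
    if (!result.shouldContinue) break;
  }
  return state;
}

/** Queue scheduling: run exactly one iteration, then enqueue the next job if needed. */
async function runAsJob(
  execute: IterationExecutor,
  state: ResearchState,
  enqueueNext: (next: ResearchState) => Promise<void>,
): Promise<ResearchState> {
  const result = await execute(state);
  if (result.shouldContinue) await enqueueNext(result.state);
  return result.state;
}

/** Toy executor standing in for the real workflow: stops after three iterations. */
const demoExecutor: IterationExecutor = async (state) => {
  const iteration = state.iteration + 1;
  return {
    state: { ...state, iteration, findings: [...state.findings, `finding-${iteration}`] },
    shouldContinue: iteration < 3,
  };
};
```

Because both strategies call the same executor, a parity test can drive each one from the same initial state and compare the final states directly.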
Chat
Extract a single shared chat executor for the agent-loop path and call it from both:
- src/routes/chat.ts
- src/services/queue/workers/chat.worker.ts
Also decide explicitly what to do with the worker-only legacy path:
- remove it if it is no longer supported
- or isolate it behind a clearly named legacy executor so it is not mixed into the primary worker implementation
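For the primary agent-loop path, a shared executor might take its side effects as injected dependencies so that the route and the worker differ only in what they pass in. The sketch below is hypothetical: ChatDeps, ChatRequest, ChatOutcome, and executeChat are invented names, the runAgent signature is an assumption and not the real runChatAgent(...) signature, and truncation is modelled as a simple character cap purely for illustration.

```typescript
// Hypothetical sketch: dependency shapes are assumptions, not the real signatures.

interface ChatDeps {
  runAgent(message: string, onProgress: (step: string) => Promise<void>): Promise<string>;
  saveProgress(conversationId: string, step: string): Promise<void>;
  saveReply(conversationId: string, reply: string, responseTimeMs: number): Promise<void>;
  now(): number; // injectable clock so response time is testable
}

interface ChatRequest {
  conversationId: string;
  message: string;
  maxReplyChars: number; // illustrative truncation rule only
}

interface ChatOutcome {
  reply: string;
  truncated: boolean;
  responseTimeMs: number;
}

/** One execution path for the agent loop; route and worker would both call this. */
async function executeChat(req: ChatRequest, deps: ChatDeps): Promise<ChatOutcome> {
  const startedAt = deps.now();

  // Run the agent and persist each progress step as it is reported.
  const raw = await deps.runAgent(req.message, (step) =>
    deps.saveProgress(req.conversationId, step),
  );

  // Apply the truncation rule once, in the shared path.
  const truncated = raw.length > req.maxReplyChars;
  const reply = truncated ? raw.slice(0, req.maxReplyChars) : raw;

  // Save the final reply together with the measured response time.
  const responseTimeMs = deps.now() - startedAt;
  await deps.saveReply(req.conversationId, reply, responseTimeMs);
  return { reply, truncated, responseTimeMs };
}
```

Under this shape, the legacy pipeline would either be deleted or live in its own separately named executor, leaving the worker with a single call site.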
File processing
Keep processFile(...) as the core, but extract the remaining async lifecycle into a shared helper so that status reconstruction, notifications, and error transitions are not worker-only behavior.
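A shared lifecycle helper of this kind might wrap the core function as sketched below. All names are hypothetical (FileLifecycleDeps, runFileLifecycle, the status values), the processFile dependency merely stands in for the existing core function with an assumed signature, and "status expired" is modelled as the store returning null.

```typescript
// Hypothetical sketch: store and publish shapes are assumptions.

type FileStatus = "processing" | "ready" | "error";

interface FileLifecycleDeps {
  getStatus(fileId: string): Promise<FileStatus | null>; // null: cached status has expired
  setStatus(fileId: string, status: FileStatus, error?: string): Promise<void>;
  publish(event: "file:ready" | "file:error", fileId: string): Promise<void>;
  processFile(fileId: string): Promise<void>; // stand-in for the existing core function
}

/** Lifecycle around the core: status reconstruction, notifications, error transitions. */
async function runFileLifecycle(fileId: string, deps: FileLifecycleDeps): Promise<FileStatus> {
  // Reconstruct the status record if the cached one has expired.
  if ((await deps.getStatus(fileId)) === null) {
    await deps.setStatus(fileId, "processing");
  }
  try {
    await deps.processFile(fileId);
    await deps.setStatus(fileId, "ready");
    await deps.publish("file:ready", fileId);
    return "ready";
  } catch (err) {
    // Error-to-status transition lives here once, not separately per execution mode.
    const message = err instanceof Error ? err.message : String(err);
    await deps.setStatus(fileId, "error", message);
    await deps.publish("file:error", fileId);
    return "error";
  }
}
```

The worker would then only deserialize the job and call the helper, and the in-process branch would call the same helper directly.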
Paper generation
Introduce a shared paper-job executor around generatePaperFromConversation(...) so the route/worker split is mostly:
- route: auth + validation + enqueue / sync response handling
- worker: BullMQ bootstrap + event hooks
- shared executor: status transitions, progress mapping, completion/failure handling
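The shared-executor portion of that split could be sketched as follows. Only the existingPaperId and onProgress option names come from the description above; everything else is an assumption for illustration: PaperJobDeps, executePaperJob, the generate dependency's exact signature, the status values, and the stage names and percentages in STAGE_PROGRESS.

```typescript
// Hypothetical sketch: only existingPaperId / onProgress are taken from the issue text.

type PaperStatus = "processing" | "completed" | "failed";

interface PaperPatch {
  status: PaperStatus;
  progress?: number;
  error?: string;
}

interface PaperJobDeps {
  // Stand-in for generatePaperFromConversation(...); the signature is assumed.
  generate(opts: {
    existingPaperId: string;
    onProgress: (stage: string) => Promise<void>;
  }): Promise<void>;
  updatePaper(paperId: string, patch: PaperPatch): Promise<void>;
}

// Assumed stage names and percentages, for illustration only.
const STAGE_PROGRESS = new Map<string, number>([
  ["outline", 25],
  ["draft", 60],
  ["render", 90],
]);

/** Owns status transitions, progress mapping, and completion/failure handling. */
async function executePaperJob(paperId: string, deps: PaperJobDeps): Promise<PaperStatus> {
  await deps.updatePaper(paperId, { status: "processing", progress: 0 });
  try {
    await deps.generate({
      existingPaperId: paperId,
      onProgress: async (stage) => {
        const progress = STAGE_PROGRESS.get(stage);
        // Unknown stages are ignored rather than resetting progress.
        if (progress !== undefined) {
          await deps.updatePaper(paperId, { status: "processing", progress });
        }
      },
    });
    await deps.updatePaper(paperId, { status: "completed", progress: 100 });
    return "completed";
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    await deps.updatePaper(paperId, { status: "failed", error });
    return "failed";
  }
}
```

Both the sync route path and the worker could call this one function, with queue notifications attached through whatever event hooks the worker already has.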
Priority
Recommended refactor order:
1. Deep research
2. Chat
3. Paper generation
4. File processing
The expected payoff is highest for deep research because it currently duplicates the most workflow logic and has the highest drift risk.
Proposed fixes
- each feature has one shared workflow executor for the core business logic
- worker files are reduced to job deserialization, queue-specific progress/notification wiring, and result return values
- route files are reduced to auth/validation/request shaping/response shaping
- deep research iteration logic exists once, with separate scheduling strategies for in-process vs queue mode
- chat has one shared primary execution path for the agent-loop flow