fix(backend): stream DeerFlowClient AI text as token deltas (#1969)
DeerFlowClient.stream() subscribed to LangGraph stream_mode=["values",
"custom"], which only delivers full-state snapshots at graph-node
boundaries, so AI replies were dumped as a single messages-tuple event
per node instead of streaming token by token: `client.stream("hello")`
looked identical to `client.chat("hello")` — the bug reported in #1969.
Subscribe to "messages" mode as well, forward AIMessageChunk deltas as
messages-tuple events with delta semantics (consumers accumulate by id),
and dedup the values-snapshot path so it does not re-synthesize AI
text that was already streamed. Introduce a per-id usage_metadata
counter so the final AIMessage in the values snapshot and the final
"messages" chunk — which carry the same cumulative usage — are not
double-counted.
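
For orientation, a minimal sketch of what that loop can look like,
assuming a compiled LangGraph agent on self._agent; the event dict
shapes and the seen_ai_ids / usage_by_id names are illustrative, not
the actual DeerFlowClient internals:

    # Sketch only: with a list-valued stream_mode, LangGraph yields
    # (mode, payload) tuples; for "messages", payload is (chunk, metadata).
    from typing import Iterator
    from langchain_core.messages import AIMessageChunk

    def stream(self, message: str, thread_id: str) -> Iterator[dict]:
        seen_ai_ids: set[str] = set()      # AI ids already streamed as deltas
        usage_by_id: dict[str, dict] = {}  # cumulative usage, counted once per id
        inputs = {"messages": [("user", message)]}
        config = {"configurable": {"thread_id": thread_id}}
        for mode, payload in self._agent.stream(
            inputs, config, stream_mode=["values", "messages", "custom"]
        ):
            if mode == "messages":
                chunk, _meta = payload
                if isinstance(chunk, AIMessageChunk) and chunk.content:
                    seen_ai_ids.add(chunk.id)
                    if chunk.usage_metadata:  # final chunk repeats cumulative usage
                        usage_by_id[chunk.id] = chunk.usage_metadata
                    yield {"event": "messages-tuple", "id": chunk.id,
                           "delta": chunk.content}  # delta: concat per id
            elif mode == "values":
                # Snapshot path: skip ids in seen_ai_ids so streamed AI text is
                # not re-synthesized; usage_by_id keeps the final AIMessage's
                # repeated cumulative usage from being counted a second time.
                pass
            elif mode == "custom":
                yield {"event": "custom", "data": payload}
        yield {"event": "end", "usage": usage_by_id}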
chat() now accumulates per-id deltas and returns the last message's
full accumulated text. Non-streaming mock sources (single event per id)
are a degenerate case of the same logic, keeping existing callers and
tests backward compatible.
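
A sketch of that accumulation, reusing the assumed event shape from
the sketch above (field names are assumptions, not the repo's actual
schema):

    def chat(self, message: str, thread_id: str) -> str:
        # Rebuild full AI text from per-id deltas; a non-streaming source
        # that emits one event per id degenerates to a one-element concat.
        acc: dict[str, str] = {}
        last_id: str | None = None
        for event in self.stream(message, thread_id):
            if event["event"] == "messages-tuple" and "delta" in event:
                acc[event["id"]] = acc.get(event["id"], "") + event["delta"]
                last_id = event["id"]
        return acc[last_id] if last_id else ""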
Verified end-to-end against a real LLM: asking it to count to fifteen
emits 35 messages-tuple events with BPE subword boundaries clearly
visible ("eleven" -> "ele" / "ven", "twelve" -> "tw" / "elve"), spread
over a 476 ms window, and the end-event usage matches the
values-snapshot usage exactly (not doubled).
tests/test_client_live.py::TestLiveStreaming passes.
New unit tests (a hedged sketch of the first follows this list):
- test_messages_mode_emits_token_deltas: three AIMessageChunks produce
three delta events with the correct content/id/usage; the values
snapshot does not duplicate them, and usage is counted once.
- test_chat_accumulates_streamed_deltas: chat() rebuilds the full text
from accumulated deltas.
- test_messages_mode_tool_message: a ToolMessage delivered via messages
mode is not duplicated by the values-snapshot synthesis path.
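
For the first test, a hedged sketch of the idea, assuming a bare
DeerFlowClient() is constructible and that the event shape matches the
sketches above (the repo's actual fixtures will differ):

    from unittest.mock import MagicMock
    from langchain_core.messages import AIMessageChunk

    def test_messages_mode_emits_token_deltas():
        # Fake agent: three chunks of one AI message, in LangGraph's
        # (mode, payload) tuple shape for the "messages" stream mode.
        chunks = [AIMessageChunk(content=t, id="m1") for t in ("ele", "ven", "!")]
        client = DeerFlowClient()
        client._agent = MagicMock()
        client._agent.stream.return_value = iter(
            [("messages", (c, {})) for c in chunks]
        )
        deltas = [e for e in client.stream("hi", thread_id="t")
                  if e["event"] == "messages-tuple"]
        assert [d["delta"] for d in deltas] == ["ele", "ven", "!"]
        assert {d["id"] for d in deltas} == {"m1"}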
The stream() docstring now documents why this client does not reuse
Gateway's run_agent() / StreamBridge pipeline (sync vs async, raw
LangChain objects vs serialized dicts, single caller vs HTTP fan-out).
Fixes #1969
Changed file: backend/CLAUDE.md (6 additions, 5 deletions)
@@ -395,11 +395,12 @@ Both can be modified at runtime via Gateway API endpoints or `DeerFlowClient` me
 **Architecture**: Imports the same `deerflow` modules that LangGraph Server and Gateway API use. Shares the same config files and data directories. No FastAPI dependency.
+- `chat(message, thread_id)` — synchronous, accumulates streaming deltas per message-id and returns the final AI text
+- `stream(message, thread_id)` — subscribes to LangGraph `stream_mode=["values", "messages", "custom"]` and yields `StreamEvent`:
+  - `"values"` — full state snapshot (title, messages, artifacts); AI text already delivered via `messages` mode is **not** re-synthesized here to avoid duplicate deliveries
+  - `"messages-tuple"` — per-chunk update: for AI text this is a **delta** (concat per `id` to rebuild the full message); tool calls and tool results are emitted once each
+  - `"custom"` — forwarded from `StreamWriter`
+  - `"end"` — stream finished (carries cumulative `usage` counted once per message id)
 - Agent created lazily via `create_agent()` + `_build_middlewares()`, same as `make_lead_agent`
 - Supports `checkpointer` parameter for state persistence across turns
 - `reset_agent()` forces agent recreation (e.g. after memory or skill changes)
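
To show the documented delta semantics from the consumer side, a
hypothetical caller rebuilding text per message id (event field names
as assumed in the sketches above):

    texts: dict[str, str] = {}
    for ev in client.stream("count to fifteen", thread_id="t1"):
        if ev["event"] == "messages-tuple" and "delta" in ev:
            texts[ev["id"]] = texts.get(ev["id"], "") + ev["delta"]
        elif ev["event"] == "end":
            print(ev["usage"])  # cumulative usage, counted once per message id
    print(texts)                # full AI texts, rebuilt from deltas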