fix: prepend text message to content blocks in multimodal agent loop #1044
Open
LupoGrigi0 wants to merge 1 commit into RightNow-AI:main from
When a user sends a message with image attachments via the upload API, the agent loop receives both `user_message` (text) and `user_content_blocks` (images). Previously, when content blocks were present, only the blocks were pushed to the session and the text message was silently dropped. The LLM received the images but not the user's question or context.

This fix prepends the text message as a `ContentBlock::Text` into the blocks vector before pushing to the session, so the LLM sees both the user's text and any attached images in a single turn. Both the non-streaming and streaming agent loop paths are fixed.

Before:
  User: "What color is this?" + [image of blue square]
  LLM receives: [image only, no text]
  Response: "I can't see the image directly"

After:
  User: "What color is this?" + [image of blue square]
  LLM receives: [text: "What color is this?", image: blue square]
  Response: "Blue"

Tested with Qwen 3.5 Plus and Gemini 2.5 Flash via OpenRouter. Images up to 1.3MB confirmed working through the full pipeline.

Signed-off-by: Cairn-2001 <Cairn-2001@smoothcurves.nexus>
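The prepend step described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in `agent_loop.rs`: the shape of `ContentBlock` (its fields and the `Image` variant) and the helper name `build_user_turn` are assumptions made for the example, and skipping an empty text message is a design choice of this sketch rather than something the PR states.

```rust
// Hypothetical stand-in for the runtime's content block type.
// The real `ContentBlock` in openfang-runtime may have different fields and variants.
#[derive(Debug, Clone, PartialEq)]
enum ContentBlock {
    Text { text: String },
    Image { media_type: String, data: String },
}

/// Hypothetical helper: build the blocks for one user turn so that the
/// user's text comes first, followed by any attached image blocks.
fn build_user_turn(
    user_message: &str,
    user_content_blocks: Option<Vec<ContentBlock>>,
) -> Vec<ContentBlock> {
    // Start from the attachment blocks, or an empty vector if there are none.
    let mut blocks = user_content_blocks.unwrap_or_default();

    // Assumption of this sketch: an empty or whitespace-only message adds no text block.
    if !user_message.trim().is_empty() {
        // Insert at index 0 so the text precedes the images in the same turn.
        blocks.insert(
            0,
            ContentBlock::Text {
                text: user_message.to_string(),
            },
        );
    }
    blocks
}
```

Under these assumptions, calling `build_user_turn("What color is this?", Some(vec![image]))` yields a two-element vector with the text block first and the image block second, which is the ordering the "After" case above describes.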
Member
Clean, targeted fix for #1043. Inserting the text block at index 0 with the Same rebase-needed note: CI isn't registered on this branch. Rebase on latest
Summary
Fixes #1043. When image attachments are present, the agent loop drops the user's text message, so the LLM receives the images without any context about what the user asked.
Changes
File: crates/openfang-runtime/src/agent_loop.rs (both streaming and non-streaming paths)
The fix prepends the text message as a ContentBlock::Text into the image blocks vector, so the LLM receives both text and images in a single multimodal turn.
Before (broken)
After (fixed)
Testing
Non-streaming (run_agent_loop) and streaming (run_agent_loop_streaming) paths
Submitted by Cairn-2001 (Cairn-2001@smoothcurves.nexus), OpenFang maintainer for HACS at smoothcurves.nexus