fix: prepend text message to content blocks in multimodal agent loop by LupoGrigi0 · Pull Request #1044 · RightNow-AI/openfang

LupoGrigi0 · 2026-04-12T11:52:15Z

Summary

Fixes #1043 — When image attachments are present, the agent loop drops the user's text message. The LLM receives images without any context about what the user asked.

Changes

File: crates/openfang-runtime/src/agent_loop.rs (both streaming and non-streaming paths)

The fix prepends the text message as a ContentBlock::Text into the image blocks vector, so the LLM receives both text and images in a single multimodal turn.

Before (broken)

if let Some(blocks) = user_content_blocks {
    // blocks = images ONLY — text message silently dropped
    session.messages.push(Message::user_with_blocks(blocks));
} else {
    session.messages.push(Message::user(user_message));
}

After (fixed)

if let Some(mut blocks) = user_content_blocks {
    if !user_message.is_empty() {
        blocks.insert(0, ContentBlock::Text {
            text: user_message.to_string(),
            provider_metadata: None,
        });
    }
    session.messages.push(Message::user_with_blocks(blocks));
} else {
    session.messages.push(Message::user(user_message));
}
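
The fixed branch can also be exercised in isolation. The sketch below is a minimal, self-contained version under stated assumptions: `ContentBlock` and `Message` here are simplified stand-ins (the real types in openfang-runtime carry more variants and fields, such as `provider_metadata`), and `build_user_turn` is a hypothetical helper name, not a function in the crate.

```rust
// Stand-in types for illustration only; the real ContentBlock and Message in
// openfang-runtime have more variants and fields (e.g. provider_metadata).
#[derive(Debug, Clone, PartialEq)]
pub enum ContentBlock {
    Text { text: String },
    Image { media_type: String, data: String },
}

#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    UserText(String),
    UserBlocks(Vec<ContentBlock>),
}

/// Hypothetical helper mirroring the fixed branch: builds the user turn from
/// the text and the optional attachment blocks.
pub fn build_user_turn(
    user_message: &str,
    user_content_blocks: Option<Vec<ContentBlock>>,
) -> Message {
    if let Some(mut blocks) = user_content_blocks {
        // Put the user's text first so the model reads the question
        // before the images; skip it when there is no caption.
        if !user_message.is_empty() {
            blocks.insert(
                0,
                ContentBlock::Text {
                    text: user_message.to_string(),
                },
            );
        }
        Message::UserBlocks(blocks)
    } else {
        // No attachments: plain text turn, unchanged by the fix.
        Message::UserText(user_message.to_string())
    }
}
```

Calling `build_user_turn("What color is this?", Some(vec![image]))` yields a blocks message whose first element is the text and whose second is the image; with `None` it falls back to a plain text message.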

Testing

Test	Before	After
100x100 blue square + "What color?"	"I can't see the image"	"Blue"
388KB screenshot + "Describe this"	Hallucinated response	Accurate description
1.3MB bird illustration	81K tokens consumed, hallucinated inability	"Stippled illustration of a bird"

Tested with Qwen 3.5 Plus and Gemini 2.5 Flash via OpenRouter
Images up to 1.3MB (1.8MB base64) confirmed working
Direct OpenRouter API calls verified that both models support vision — the issue was purely in the agent loop's message construction
Fix applied to both non-streaming (run_agent_loop) and streaming (run_agent_loop_streaming) paths
Running in production across 3 OpenFang instances for the HACS coordination system

Submitted by Cairn-2001 (Cairn-2001@smoothcurves.nexus), OpenFang maintainer for HACS at smoothcurves.nexus

When a user sends a message with image attachments via the upload API, the agent loop receives both `user_message` (text) and `user_content_blocks` (images). Previously, when content blocks were present, only the blocks were pushed to the session; the text message was silently dropped. The LLM received the images but not the user's question or context.

This fix prepends the text message as a ContentBlock::Text into the blocks vector before pushing to the session, so the LLM sees both the user's text AND any attached images in a single turn. Both the non-streaming and streaming agent loop paths are fixed.

Before:
  User: "What color is this?" + [image of blue square]
  LLM receives: [image only, no text]
  Response: "I can't see the image directly"

After:
  User: "What color is this?" + [image of blue square]
  LLM receives: [text: "What color is this?", image: blue square]
  Response: "Blue"

Tested with Qwen 3.5 Plus and Gemini 2.5 Flash via OpenRouter. Images up to 1.3MB confirmed working through the full pipeline.

Signed-off-by: Cairn-2001 <Cairn-2001@smoothcurves.nexus>

jaberjaber23 · 2026-04-17T18:22:47Z

Clean, targeted fix for #1043. Inserting the text block at index 0 with the !user_message.is_empty() guard is the right call (avoids a stray empty Text block when the channel bridge sends images without caption).
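
To make the point about the guard concrete, here is a small sketch contrasting the two behaviors for a caption-less upload. It assumes a simplified `Block` type and hypothetical function names chosen for illustration; none of these are the crate's own definitions.

```rust
// Simplified stand-in for a content block (assumption, not the real type).
#[derive(Debug, Clone, PartialEq)]
pub enum Block {
    Text(String),
    Image(String),
}

/// Prepends the caption unconditionally: the behavior the guard prevents.
pub fn prepend_unguarded(caption: &str, mut blocks: Vec<Block>) -> Vec<Block> {
    blocks.insert(0, Block::Text(caption.to_string()));
    blocks
}

/// Prepends the caption only when it is non-empty, as the PR does.
pub fn prepend_guarded(caption: &str, mut blocks: Vec<Block>) -> Vec<Block> {
    if !caption.is_empty() {
        blocks.insert(0, Block::Text(caption.to_string()));
    }
    blocks
}

/// Counts text blocks that carry no text at all.
pub fn empty_text_blocks(blocks: &[Block]) -> usize {
    blocks
        .iter()
        .filter(|b| matches!(*b, Block::Text(t) if t.is_empty()))
        .count()
}
```

With an empty caption and one image, `prepend_unguarded` produces a leading empty text block, while `prepend_guarded` returns the image blocks untouched.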

Same rebase-needed note: CI isn't registered on this branch. Rebase on latest main to trigger checks and we'll merge once green.

LupoGrigi0 wants to merge 1 commit into RightNow-AI:main from LupoGrigi0:fix/multimodal-message-text-dropped
