Visual Mode Display Fix - Complete Solution

Problem Summary

TTS text responses were blocking the display for video recording, live detection, and video playback. The image generation path worked correctly, but the video paths did not.

Root Cause Analysis

Why Image Generation Works

1. Tool generates image → setLatestGenImg(path) → Returns success
2. LLM generates TTS response
3. TTS plays (display shows text)
4. AFTER TTS completes, ChatFlow.getPlayEndPromise() triggers
5. Checks getLatestGenImg() → displays image → switches to "image" flow
6. Image stays on screen, no TTS can interrupt

Why Video/Detection Was Broken

1. Tool starts process → Starts display interval immediately → Returns success
2. LLM generates TTS response
3. TTS display updates OVERRIDE video frames ← CONFLICT!
4. Video frames keep trying to update but get blocked by TTS

Solution Architecture

New Pattern: Defer Display Until After TTS

Video/detection tools should follow the same pattern as image generation:

1. Tool prepares operation (starts process if needed)
2. Tool sets "pending" marker (like setPendingDetection("marker"))
3. Tool returns success WITHOUT starting display
4. LLM generates (brief!) TTS response
5. TTS plays
6. AFTER TTS, ChatFlow checks pending markers
7. ChatFlow switches to visual flow and starts display updates
8. Visual display runs uninterrupted until stopped
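
The ordering above can be sketched in isolation. This is a minimal illustration, not the project's code: `DeferredDisplayDemo`, `runTurn`, and the in-memory `pending` field are hypothetical stand-ins for the real tool, TTS, and ChatFlow layers. The point it shows is that the tool only records a marker, and drawing begins in the TTS-end handler.

```typescript
// Hypothetical stand-ins for illustration only; the real project has its own
// display, TTS, and ChatFlow implementations.
type Flow = "listening" | "detection";

class DeferredDisplayDemo {
  readonly log: string[] = [];
  flow: Flow = "listening";
  private pending = "";

  // The tool prepares its work, records a marker, and returns without drawing.
  startDetectionTool(): string {
    this.log.push("tool: process started");
    this.pending = "active";
    return "[success]Starting detection";
  }

  // TTS plays; nothing else is writing to the display at this point.
  playTts(text: string): void {
    this.log.push(`tts: ${text}`);
  }

  // Only after TTS ends is the marker consumed and the visual flow entered.
  onTtsEnd(): void {
    if (this.pending !== "") {
      this.pending = "";
      this.flow = "detection";
      this.log.push("display: detection frames started");
    }
  }
}

// Runs one conversational turn in the deferred order and returns the event log.
function runTurn(demo: DeferredDisplayDemo): string[] {
  demo.startDetectionTool();
  demo.playTts("Starting detection.");
  demo.onTtsEnd();
  return demo.log;
}
```

In this sketch the display entry is always the last one in the log, which is the property the new pattern relies on.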

Implementation

Files Created and Updated

1. `src/utils/image.ts` (Updated)

Added pending marker functions:

setPendingDetection(marker) / getPendingDetection()
setPendingRecording(marker) / getPendingRecording()
getVideoPlaybackMarker() (modified to be retrieval-only)
hasVisualPending() - checks if any visual mode is pending
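
A minimal sketch of what these helpers could look like, assuming markers are plain module-level strings and that an empty string means "nothing pending". The storage, the non-clearing getters, and the `clearVisualPending` helper are assumptions made for illustration; the actual `src/utils/image.ts` may differ.

```typescript
// Assumed storage: one string per visual mode, "" meaning nothing is pending.
let pendingDetection = "";
let pendingRecording = "";
let videoPlaybackMarker = "";

function setPendingDetection(marker: string): void {
  pendingDetection = marker;
}

function getPendingDetection(): string {
  return pendingDetection;
}

function setPendingRecording(marker: string): void {
  pendingRecording = marker;
}

function getPendingRecording(): string {
  return pendingRecording;
}

function setVideoPlaybackMarker(marker: string): void {
  videoPlaybackMarker = marker;
}

// Retrieval-only: reading the marker does not reset it.
function getVideoPlaybackMarker(): string {
  return videoPlaybackMarker;
}

// True if any visual mode is waiting for TTS to finish.
function hasVisualPending(): boolean {
  return (
    pendingDetection !== "" ||
    pendingRecording !== "" ||
    videoPlaybackMarker !== ""
  );
}

// Hypothetical helper (not named in this document): reset all markers
// once the caller has acted on them.
function clearVisualPending(): void {
  pendingDetection = "";
  pendingRecording = "";
  videoPlaybackMarker = "";
}
```

Because the getters here do not clear state, whoever consumes a marker has to reset it explicitly; otherwise the same visual flow would be re-entered after the next TTS response.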

2. `src/utils/visualCoordinator.ts` (NEW)

Central coordinator for all visual display intervals:

startDetectionDisplay() - Starts detection frame updates
startRecordingDisplay() - Starts recording preview updates
startPlaybackDisplay() - Starts video playback updates
stopDetectionDisplay() / stopRecordingDisplay() / stopPlaybackDisplay()
stopAllDisplays() - Cleanup helper
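
One possible shape for such a coordinator is sketched below. It is an illustration under assumptions, not the contents of `visualCoordinator.ts`: the `VisualCoordinator` class, the `FrameSink` type, the 100 ms default period, and the idea of keying intervals by name are all hypothetical. Functions such as startDetectionDisplay() could then be thin wrappers around `start("detection", framePath)`.

```typescript
// Hypothetical display callback; the real project passes frames to its own display layer.
type FrameSink = (frame: { image: string }) => void;
type Timer = ReturnType<typeof setInterval>;

class VisualCoordinator {
  private readonly timers = new Map<string, Timer>();
  private readonly display: FrameSink;

  constructor(display: FrameSink) {
    this.display = display;
  }

  // Start a named display loop. Calling it again while active is a no-op,
  // so repeated starts never stack intervals.
  start(name: string, framePath: string, periodMs = 100): void {
    if (this.timers.has(name)) return;
    const timer = setInterval(() => {
      this.display({ image: framePath });
    }, periodMs);
    this.timers.set(name, timer);
  }

  // Stop a named loop if it is running.
  stop(name: string): void {
    const timer = this.timers.get(name);
    if (timer !== undefined) {
      clearInterval(timer);
      this.timers.delete(name);
    }
  }

  // Cleanup helper: stop every running loop.
  stopAll(): void {
    Array.from(this.timers.keys()).forEach((name) => this.stop(name));
  }

  isActive(name: string): boolean {
    return this.timers.has(name);
  }
}
```

Keeping every interval handle in one map means a forgotten `clearInterval` in an individual tool can no longer leave a stray loop writing to the display.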

3. `src/core/ChatFlow.ts` (Updated)

Checks hasVisualPending() in TTS callbacks to prevent text display
After TTS completes, checks for pending visual markers
New flows: "detection", "recording", "videoPlayback"
Each flow calls coordinator to start display, sets up button handlers
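
The decision ChatFlow makes after TTS can be sketched as a pure function. The `PendingMarkers` shape, the function names, and the priority order (detection, then recording, then playback) are assumptions for illustration; the document does not state how ChatFlow resolves more than one pending marker.

```typescript
type VisualFlow = "detection" | "recording" | "videoPlayback";

// Assumed snapshot of the marker getters; "" means nothing is pending.
interface PendingMarkers {
  detection: string;
  recording: string;
  videoPlayback: string;
}

// Pick the visual flow to enter once TTS has finished, or null to stay in
// the normal conversation flow. The priority order is an assumption.
function nextVisualFlow(markers: PendingMarkers): VisualFlow | null {
  if (markers.detection !== "") return "detection";
  if (markers.recording !== "") return "recording";
  if (markers.videoPlayback !== "") return "videoPlayback";
  return null;
}

// TTS callbacks can use this to skip text rendering while a visual mode is pending.
function shouldShowTtsText(markers: PendingMarkers): boolean {
  return nextVisualFlow(markers) === null;
}
```

For example, `nextVisualFlow({ detection: "active", recording: "", videoPlayback: "" })` selects the "detection" flow, and `shouldShowTtsText` returns false for the same input.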

How to Update Visual Tools

Tools need to follow this pattern:

// OLD PATTERN (DON'T DO THIS)
export const startDetectionTool = {
  func: async (params) => {
    startDetectionProcess();
    
    // ❌ DON'T start interval immediately
    detectionInterval = setInterval(() => {
      display({ image: FRAME_PATH });
    }, 100);
    
    return "[success]Detection started";
  }
};

// NEW PATTERN (DO THIS)
import { setPendingDetection } from "../../utils/image";

export const startDetectionTool = {
  func: async (params) => {
    // Start the process (it will write frames to temp files)
    startDetectionProcess();
    
    // ✅ Set pending marker - DON'T start display
    setPendingDetection("active");
    
    // Return success - LLM will generate brief TTS
    // After TTS, ChatFlow will check marker and start display
    return "[success]Starting detection";
  }
};

Required Changes Per Tool

Live Detection Tool (`src/config/custom-tools/live-detection.ts`)

Changes needed:

Import setPendingDetection from ../../utils/image
In startLiveDetection function:
- Keep process spawning logic
- REMOVE the detectionUpdateInterval = setInterval(...) block
- REMOVE the setLiveDetectionActive(true) call (coordinator handles this)
- ADD setPendingDetection("active") before returning
In stopLiveDetection function:
- Import and call stopDetectionDisplay() from visualCoordinator
- Keep process cleanup logic
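
The stop side of the live detection tool might look like the sketch below. It is written as a factory with injected dependencies so it runs on its own; `DetectionProcess`, `makeStopLiveDetection`, and the return messages are hypothetical, and in the real tool `stopDetectionDisplay` would be imported from visualCoordinator rather than passed in.

```typescript
// Hypothetical minimal view of the spawned detection process.
interface DetectionProcess {
  kill(): void;
}

// Builds a stop function from injected dependencies (an assumption made so the
// sketch is self-contained; the real tool would import these directly).
function makeStopLiveDetection(
  getProcess: () => DetectionProcess | null,
  clearProcess: () => void,
  stopDetectionDisplay: () => void,
): () => string {
  return function stopLiveDetection(): string {
    // Stop drawing first so no stale frame is shown after the process exits.
    stopDetectionDisplay();

    const proc = getProcess();
    if (proc === null) {
      return "[success]Detection was not running";
    }

    // Existing process cleanup logic stays in the tool.
    proc.kill();
    clearProcess();
    return "[success]Detection stopped";
  };
}
```

Stopping the display before killing the process is a design choice in this sketch, not something the document prescribes; the reverse order would also work if a final stale frame is acceptable.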

Video Recording Tool (`src/config/custom-tools/video.ts`)

For recordVideoForDuration:

Import setPendingRecording from ../../utils/image
Keep recording process/command execution
REMOVE the previewUpdateInterval = setInterval(...) block
REMOVE the setVideoRecordingActive(true) call
ADD setPendingRecording("recording") before returning

For startVideoRecording:

Same pattern as above

For stopVideoRecording:

Import and call stopRecordingDisplay() from visualCoordinator

For playVideo:

Import setVideoPlaybackMarker (already exists)
Keep video player process spawn logic
REMOVE the playbackUpdateInterval = setInterval(...) block
Call setVideoPlaybackMarker(VIDEO_FRAME_PATH)
Return success

For stopVideo:

Import and call stopPlaybackDisplay() from visualCoordinator

How It Works (Flow Diagram)

User says: "Start detecting person"
    ↓
[startLiveDetection tool called]
    ↓
1. Spawn Python detection process (writes frames to /tmp/)
2. setPendingDetection("active")
3. Return "[success]Starting detection"
    ↓
[LLM receives success, generates response]
    ↓
4. LLM: "I'll start detecting people for you"
5. TTS plays (brief, text might show on display)
    ↓
[TTS completes, getPlayEndPromise() triggers]
    ↓
6. ChatFlow checks getPendingDetection() → "active"
7. ChatFlow.setCurrentFlow("detection")
    ↓
[Detection flow activated]
    ↓
8. startDetectionDisplay() called
9. setInterval starts updating display with frames
10. Frames display continuously, no TTS can interrupt!
    ↓
User presses button OR says "stop"
    ↓
11. stopDetectionDisplay() called
12. Clear interval, stop process
13. Return to listening mode

Testing

Test Cases

1. Live Detection:

# Say: "Start detecting person"
# Expected: Brief TTS, then detection frames display continuously
# Say anything else while detecting
# Expected: Detection continues, no text overlay

2. Video Recording:

# Say: "Record video for 10 seconds"
# Expected: Brief TTS, then preview frames display
# Expected: Recording completes, shows success message briefly

3. Video Playback:

# Say: "Play the video"
# Expected: Brief TTS, then video frames play
# Expected: Video plays smoothly without text overlay

4. Image Generation (should still work):

# Say: "Draw a cat"
# Expected: TTS plays, then image displays (same as before)

Benefits of This Approach

Consistent Pattern: All visual modes follow the same architecture as image generation
Clean Separation: Tools handle process management, ChatFlow handles display flow
No TTS/Display Race: Display starts only after TTS completes, so frames and TTS text no longer compete for the screen
Centralized Control: visualCoordinator manages all display intervals in one place
Easy to Debug: Clear flow from tool → pending marker → ChatFlow → coordinator
Button Handling: Each visual flow properly handles button presses to exit

Files Modified Summary

✅ src/utils/image.ts - Added pending marker functions
✅ src/utils/visualCoordinator.ts - NEW coordinator module
✅ src/core/ChatFlow.ts - Integrated pending markers and visual flows
⚠️ src/config/custom-tools/live-detection.ts - NEEDS UPDATE (remove intervals, add markers)
⚠️ src/config/custom-tools/video.ts - NEEDS UPDATE (remove intervals, add markers)

Next Steps

Update live-detection.ts following the pattern above
Update video.ts following the pattern above
Test each visual mode
Verify TTS doesn't block video display
Verify button press exits visual modes cleanly
