fix(channel): retry retrigger relay sends to avoid dropped final replies by brendandebeasi · Pull Request #582 · spacedriveapp/spacebot

brendandebeasi · 2026-05-02T16:03:42Z

Summary

Implements a first-pass mitigation for #581 by hardening retrigger/fallback delivery in src/agent/channel.rs.

What changed

Added bounded retry/backoff for outbound text sends (send_outbound_text_with_retry).
Applied retries to retrigger plaintext fallback path (where dropped final replies were observed).
When retrigger fallback send succeeds, marks replied_flag=true so relay state is preserved correctly.
If all retries fail on the retrigger relay, sends a short backup notice instructing the user to send 'continue' to replay the preserved result.
Applied smaller retry to non-retrigger plaintext fallback path.
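
The bounded retry described in this list can be sketched roughly as follows. This is a synchronous, std-only stand-in rather than the PR's code: the name `send_with_retry`, the closure-based `send` parameter, and the caller-supplied delay list are assumptions made for illustration. Judging by the diff in this PR, the real helper in src/agent/channel.rs is async and takes the text, an error-context string, and a numeric retry bound.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical helper: call `send` once, then once more after each delay
/// in `delays`, stopping at the first success.
/// Returns true if any attempt succeeded.
fn send_with_retry<F>(mut send: F, delays: &[Duration]) -> bool
where
    F: FnMut() -> bool,
{
    if send() {
        return true;
    }
    for delay in delays {
        // Back off before the next attempt.
        thread::sleep(*delay);
        if send() {
            return true;
        }
    }
    false
}
```

Passing the delays in as a slice keeps the number of attempts bounded by construction and lets a test use very short delays.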

Why

Logs showed successful worker completion followed by:

retrigger produced text without reply tool, sending as fallback
retrigger relay failed, preserving result in history for next turn

That leads to user-visible truncation (partial opener, no final rundown). This PR adds retries and clearer user-facing fallback behavior.

Validation

cargo check --all-targets passes.

Follow-ups (optional)

Add adapter-level tests with injected send failure to assert retry + backup notice behavior.
Include richer transport error metadata from messaging adapters for easier debugging.

coderabbitai · 2026-05-02T16:03:56Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 03684f88-5ad0-48c3-945b-3ca4adb0387f

📥 Commits

Reviewing files that changed from the base of the PR and between d7b5c0d and 1010830.

📒 Files selected for processing (1)

src/agent/channel.rs

Walkthrough

send_outbound_text now returns bool indicating routed-send success. handle_agent_result fallback branches check that return value and set replied_flag = true when fallback text is sent successfully; one non-retrigger call site now awaits the bool return without using it.

Changes

Channel outbound send updates

Layer / File(s)	Summary
Send return type `src/agent/channel.rs`	`send_outbound_text` changed to return `bool` (true on routed send success, false on routed channel closed); metrics and logging retained.
Retrigger-skipped fallback `src/agent/channel.rs`	Retrigger-skipped-but-produced-text path now checks outbound send result and sets `replied_flag = true` when send returns true.
Retrigger produced-text (no reply tool) `src/agent/channel.rs`	Retrigger produced-text without reply-tool path now checks outbound send result and sets `replied_flag = true` when send returns true.
Non-retrigger fallback call site `src/agent/channel.rs`	Non-retrigger fallback send updated to await `send_outbound_text`'s `bool` return value; the returned bool is not used to set `replied_flag` in this path.
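
The pattern summarized in the table can be sketched as below. This is a simplified illustration, not the code from src/agent/channel.rs: it uses `std::sync::mpsc` in place of the project's routed channel, and the free functions `send_outbound_text` and `send_fallback` with these signatures are assumptions (in the PR, `send_outbound_text` is an async method on the channel type).

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::Sender;

/// Hypothetical stand-in for `send_outbound_text`: true when the message
/// was handed to the outbound channel, false when the receiver is gone.
fn send_outbound_text(tx: &Sender<String>, text: String) -> bool {
    match tx.send(text) {
        Ok(()) => true,
        Err(error) => {
            eprintln!("outbound channel closed, dropping text: {error}");
            false
        }
    }
}

/// Fallback path: mark the turn as replied only if the send went through.
fn send_fallback(tx: &Sender<String>, text: String, replied_flag: &AtomicBool) {
    if send_outbound_text(tx, text) {
        replied_flag.store(true, Ordering::Relaxed);
    }
}
```

With this shape, a caller that ignores the returned bool (as the non-retrigger path is described as doing) leaves `replied_flag` untouched.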

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Telegram: retrigger relay failure drops final response after worker completion #581: Matches the change to Channel::send_outbound_text return behavior and retrigger-fallback reply handling.

Possibly related PRs

spacedriveapp/spacebot#231: Overlaps modifications to handle_agent_result and outbound delivery logic in src/agent/channel.rs.

Suggested reviewers

jamiepine

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: adding retry logic to prevent dropped final replies in retrigger relay sends, which directly aligns with the changeset's primary objective.
Description check	✅ Passed	The description provides relevant context about the changes made to handle_agent_result and send_outbound_text in channel.rs, explaining the motivation to avoid dropped final replies and improve fallback delivery.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/agent/channel.rs`:
- Around line 3114-3131: Add an injected-failure unit/integration test that
exercises the retrigger fallback state machine around
send_outbound_text_with_retry: simulate controlled send failures and time/retry
behavior to assert both branches — (1) success after retry sets replied_flag to
true and does not enqueue the preserved replay, and (2) permanent failure leaves
replied_flag false and sends the backup "continue" notice while preserving the
background result for replay. Use dependency injection/mocking or an injectable
send function used by send_outbound_text_with_retry to force 0, partial, and
full-failure scenarios, advance any timers/async delays as needed, and assert
final state (replied_flag, preserved-result replay state, and that the correct
backup notice was sent).
- Around line 2991-3037: The send_outbound_text_with_retry function implements
sleeps and retries inside the channel event loop but mpsc::Sender::send() only
errors on a permanently closed receiver, so remove the retry loop and delay
sleeps from send_outbound_text_with_retry; instead, call send_outbound_text
once, log a clear warning including channel_id and error_context when it fails,
and return false immediately (fail fast). Keep send_outbound_text unchanged and
avoid introducing retry logic or sleeps at this channel layer—if transient
transport retries are needed, implement them in the adapter/transport layer with
tests and rationale.
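
The point about closed receivers can be illustrated with the standard library's channel, whose `Sender::send` likewise fails only once the receiver has been dropped. The sketch below is an analogy under that assumption, not the project's tokio-based code, and the helper names `count_successful_sends` and `closed_channel` are made up for the example.

```rust
use std::sync::mpsc::{channel, Sender};

/// Try to send `attempts` times; returns how many attempts succeeded.
fn count_successful_sends(tx: &Sender<String>, attempts: usize) -> usize {
    (0..attempts)
        .filter(|i| tx.send(format!("attempt {i}")).is_ok())
        .count()
}

/// Build a channel whose receiver has already been dropped.
fn closed_channel() -> Sender<String> {
    let (tx, rx) = channel::<String>();
    drop(rx);
    tx
}
```

Calling `count_successful_sends(&closed_channel(), 3)` yields zero successes: once the receiver is gone, every later attempt fails the same way, so waiting between attempts cannot help at this layer.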


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2a18f698-a307-41f1-98cc-ebbfaf3ffe92

📥 Commits

Reviewing files that changed from the base of the PR and between 8fb07b8 and d7b5c0d.

📒 Files selected for processing (1)

src/agent/channel.rs

coderabbitai · 2026-05-02T16:07:55Z

+                                let delivered = self
+                                    .send_outbound_text_with_retry(
+                                        final_text,
+                                        "failed to send retrigger fallback reply",
+                                        3,
+                                    )
+                                    .await;
+                                if delivered {
+                                    replied_flag.store(true, std::sync::atomic::Ordering::Relaxed);
+                                } else {
+                                    let _ = self
+                                        .send_outbound_text_with_retry(
+                                            "Delivery issue: your background result is preserved. Send 'continue' to replay it.".to_string(),
+                                            "failed to send relay failure backup notice",
+                                            1,
+                                        )
+                                        .await;
+                                }


🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Add an injected-failure test for the new retrigger fallback state machine.

These paths now couple retry timing, replied_flag mutation, preserved-result replay, and the backup "continue" notice, but the provided validation is still just cargo check. Please add a targeted test that forces controlled send failures and asserts both branches: success after retry and failure-after-all-attempts preserving replay state.

As per coding guidelines, "For changes in async/stateful paths (worker lifecycle, cancellation, retrigger, recall cache behavior), include explicit race/terminal-state reasoning in the PR summary and run targeted tests in addition to just `just gate-pr`."

Also applies to: 3192-3209, 3265-3270
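
One possible shape for such an injected-failure test is sketched below. Everything here is hypothetical scaffolding: the `OutboundSender` trait, the `ScriptedSender` test double, and `relay_with_backup` are not part of the PR, and timing and backoff are left out so the example stays synchronous and std-only. Only the backup notice text is taken from the diff above.

```rust
use std::collections::VecDeque;

/// Hypothetical seam: anything that can attempt to deliver text.
trait OutboundSender {
    fn send(&mut self, text: &str) -> bool;
}

/// Test double that replays scripted outcomes and records what was delivered.
struct ScriptedSender {
    outcomes: VecDeque<bool>,
    delivered: Vec<String>,
}

impl ScriptedSender {
    fn new(outcomes: &[bool]) -> Self {
        Self {
            outcomes: outcomes.iter().copied().collect(),
            delivered: Vec::new(),
        }
    }
}

impl OutboundSender for ScriptedSender {
    fn send(&mut self, text: &str) -> bool {
        // Once the script runs out, treat every further send as a failure.
        let ok = self.outcomes.pop_front().unwrap_or(false);
        if ok {
            self.delivered.push(text.to_string());
        }
        ok
    }
}

const BACKUP_NOTICE: &str =
    "Delivery issue: your background result is preserved. Send 'continue' to replay it.";

/// Simplified retrigger fallback: returns the value `replied_flag` should take.
fn relay_with_backup(sender: &mut impl OutboundSender, text: &str, attempts: usize) -> bool {
    for _ in 0..attempts {
        if sender.send(text) {
            return true;
        }
    }
    // All attempts failed: make a single try at the backup notice.
    let _ = sender.send(BACKUP_NOTICE);
    false
}
```

A test can then script `[false, true]` to cover success after one retry, and `[false, false, false, true]` to cover three failed attempts followed by a delivered backup notice.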


… success

Per CodeRabbit: tokio mpsc::Sender::send only fails when the receiver is closed, so the 250/1000/2000ms retry loop and the "send 'continue' to replay" backup notice are dead code (the backup goes through the same closed channel and would also fail).

The real bug fix in this PR was setting replied_flag = true on successful fallback delivery. Without it, the post-turn check at line ~2354 still logs "retrigger relay failed" and injects "[background work completed but relay to user failed]" into history even when the fallback succeeded, which is what produced the user-visible truncation symptom.

Now: send_outbound_text returns bool, logs a clear warning on failure, and fallback paths set replied_flag on success. Net diff: +13/-77.
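
The interaction with the post-turn check described in this commit message might look roughly like the sketch below. The function `post_turn_check`, its signature, and the `Vec<String>` history are assumptions for illustration; only the log line and the injected marker text come from the PR discussion.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

const RELAY_FAILED_MARKER: &str = "[background work completed but relay to user failed]";

/// Hypothetical post-turn check: if nothing marked the turn as replied,
/// record a relay-failure marker in history. Returns true if it did so.
fn post_turn_check(replied_flag: &AtomicBool, history: &mut Vec<String>) -> bool {
    if replied_flag.load(Ordering::Relaxed) {
        return false;
    }
    eprintln!("retrigger relay failed, preserving result in history for next turn");
    history.push(RELAY_FAILED_MARKER.to_string());
    true
}
```

If a fallback send succeeds but never stores true into the flag, a check of this shape still takes the failure branch, which matches the symptom the commit message attributes to the missing `replied_flag` update.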

fix(channel): retry retrigger relay sends and preserve delivery state

d7b5c0d

coderabbitai Bot reviewed May 2, 2026


brendandebeasi mentioned this pull request May 2, 2026

Telegram: retrigger relay failure drops final response after worker completion #581

Open

brendandebeasi wants to merge 2 commits into spacedriveapp:main from brendandebeasi:fix/telegram-retrigger-relay-retry
