fix(channel): retry retrigger relay sends to avoid dropped final replies#582
fix(channel): retry retrigger relay sends to avoid dropped final replies#582brendandebeasi wants to merge 2 commits into
Conversation
|
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
Walkthroughsend_outbound_text now returns bool indicating routed-send success. handle_agent_result fallback branches check that return value and set replied_flag = true when fallback text is sent successfully; one non-retrigger call site now awaits the bool return without using it. ChangesChannel outbound send updates
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/agent/channel.rs`:
- Around line 3114-3131: Add an injected-failure unit/integration test that
exercises the retrigger fallback state machine around
send_outbound_text_with_retry: simulate controlled send failures and time/retry
behavior to assert both branches — (1) success after retry sets replied_flag to
true and does not enqueue the preserved replay, and (2) permanent failure leaves
replied_flag false and sends the backup "continue" notice while preserving the
background result for replay. Use dependency injection/mocking or an injectable
send function used by send_outbound_text_with_retry to force 0, partial, and
full-failure scenarios, advance any timers/async delays as needed, and assert
final state (replied_flag, preserved-result replay state, and that the correct
backup notice was sent).
- Around line 2991-3037: The send_outbound_text_with_retry function implements
sleeps and retries inside the channel event loop but mpsc::Sender::send() only
errors on a permanently closed receiver, so remove the retry loop and delay
sleeps from send_outbound_text_with_retry; instead, call send_outbound_text
once, log a clear warning including channel_id and error_context when it fails,
and return false immediately (fail fast). Keep send_outbound_text unchanged and
avoid introducing retry logic or sleeps at this channel layer—if transient
transport retries are needed, implement them in the adapter/transport layer with
tests and rationale.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2a18f698-a307-41f1-98cc-ebbfaf3ffe92
📒 Files selected for processing (1)
src/agent/channel.rs
| let delivered = self | ||
| .send_outbound_text_with_retry( | ||
| final_text, | ||
| "failed to send retrigger fallback reply", | ||
| 3, | ||
| ) | ||
| .await; | ||
| if delivered { | ||
| replied_flag.store(true, std::sync::atomic::Ordering::Relaxed); | ||
| } else { | ||
| let _ = self | ||
| .send_outbound_text_with_retry( | ||
| "Delivery issue: your background result is preserved. Send 'continue' to replay it.".to_string(), | ||
| "failed to send relay failure backup notice", | ||
| 1, | ||
| ) | ||
| .await; | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift
Add an injected-failure test for the new retrigger fallback state machine.
These paths now couple retry timing, replied_flag mutation, preserved-result replay, and the backup "continue" notice, but the provided validation is still just cargo check. Please add a targeted test that forces controlled send failures and asserts both branches: success after retry and failure-after-all-attempts preserving replay state.
As per coding guidelines, "For changes in async/stateful paths (worker lifecycle, cancellation, retrigger, recall cache behavior), include explicit race/terminal-state reasoning in the PR summary and run targeted tests in addition to just just gate-pr."
Also applies to: 3192-3209, 3265-3270
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/agent/channel.rs` around lines 3114 - 3131, Add an injected-failure
unit/integration test that exercises the retrigger fallback state machine around
send_outbound_text_with_retry: simulate controlled send failures and time/retry
behavior to assert both branches — (1) success after retry sets replied_flag to
true and does not enqueue the preserved replay, and (2) permanent failure leaves
replied_flag false and sends the backup "continue" notice while preserving the
background result for replay. Use dependency injection/mocking or an injectable
send function used by send_outbound_text_with_retry to force 0, partial, and
full-failure scenarios, advance any timers/async delays as needed, and assert
final state (replied_flag, preserved-result replay state, and that the correct
backup notice was sent).
… success Per CodeRabbit: tokio mpsc::Sender::send only fails when the receiver is closed, so the 250/1000/2000ms retry loop and the "send 'continue' to replay" backup notice are dead code (the backup goes through the same closed channel and would also fail). The real bug fix in this PR was setting replied_flag = true on successful fallback delivery — without it, the post-turn check at line ~2354 still logs "retrigger relay failed" and injects "[background work completed but relay to user failed]" into history even when the fallback succeeded, which is what produced the user-visible truncation symptom. Now: send_outbound_text returns bool, logs a clear warning on failure, fallback paths set replied_flag on success. Net diff: +13/-77.
Summary
Implements a first-pass mitigation for #581 by hardening retrigger/fallback delivery in
src/agent/channel.rs.What changed
send_outbound_text_with_retry).replied_flag=trueso relay state is preserved correctly.continueto replay preserved result.Why
Logs showed successful worker completion followed by:
retrigger produced text without reply tool, sending as fallbackretrigger relay failed, preserving result in history for next turnThat leads to user-visible truncation (partial opener, no final rundown). This PR adds retries and clearer user-facing fallback behavior.
Validation
cargo check --all-targetspasses.Follow-ups (optional)