fix: don't retry on QueueingError (Veo/Lyria async queueing is not a transient error)#314

Open
chiuweilun1107 wants to merge 1 commit into HanaokaYuzu:master from chiuweilun1107:fix/queueing-no-retry

Conversation

@chiuweilun1107

Problem

When generate_content encounters Stream suspended (queueing=True), the server has accepted the request and is processing it asynchronously (e.g. Veo video rendering, Lyria music generation). The current code raises APIError at this point, which the @running(retry=5) decorator treats as a transient error and retries.

Each retry sends a new request, creating a new server-side job (new conversation) and burning an additional daily quota slot.

Empirical evidence

Tested with Veo 3 video generation prompts:

  • Within 45 seconds, the decorator fired 4 retries, creating 4 independent Veo conversations visible in the Gemini web UI
  • Each consumed a separate daily quota slot
  • The backoff delays (5s/10s/15s/20s/25s) don't help because Stream suspended fires within ~12s of each attempt — faster than the first backoff

Fix

Introduce QueueingError(GeminiError) — a new exception that inherits from GeminiError instead of APIError.

When is_queueing=True at the point of stream suspension:

  • Raise QueueingError instead of APIError
  • The @running decorator only retries APIError, so QueueingError bubbles up immediately with zero retries
  • Callers can catch QueueingError and switch to a poll-based flow (list_chats() + read_chat()) to retrieve the result once the server finishes rendering

When is_queueing=False (transient connection issues, cookie drift, etc.):

  • Still raises APIError as before — existing retry behaviour is preserved
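The two branches above can be sketched as follows. This is a minimal illustration, not the library's actual source: `GeminiError` and `APIError` already exist in `gemini_webapi.exceptions`, and their bodies here are simplified stand-ins; `on_stream_suspended` is a hypothetical function standing in for the stream-suspended branch in `client.py`.

```python
class GeminiError(Exception):
    """Base exception for gemini_webapi (simplified stand-in)."""

class APIError(GeminiError):
    """Transient failure; retried by the @running decorator."""

class QueueingError(GeminiError):
    """Request was accepted and queued server-side; retrying is harmful."""

def on_stream_suspended(is_queueing: bool) -> None:
    """Hypothetical sketch of the stream-suspended branch in client.py."""
    if is_queueing:
        # Server accepted the job and is rendering asynchronously.
        raise QueueingError("Stream suspended: job queued server-side.")
    # Genuinely transient suspension (connection issues, cookie drift, ...).
    raise APIError("Stream suspended: transient failure.")
```

Because `QueueingError` subclasses `GeminiError` directly rather than `APIError`, it sits outside the retry path without touching the decorator.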

Changes

| File | Change |
| --- | --- |
| `exceptions.py` | Add `QueueingError(GeminiError)` |
| `client.py` | Import `QueueingError`; raise it instead of `APIError` when `is_queueing=True` in the stream-suspended branch |

No changes to decorators.py — the fix works purely through the exception hierarchy.
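Why no decorator changes are needed: the retry loop's `except` clause names `APIError`, so any exception outside that branch of the hierarchy propagates on the first raise. A minimal sketch (a hypothetical simplification; the real `@running` decorator's signature and backoff schedule differ):

```python
import asyncio
import functools

# Simplified stand-ins for the real exception classes.
class GeminiError(Exception): ...
class APIError(GeminiError): ...
class QueueingError(GeminiError): ...

def running(retry: int = 5):
    """Retry the wrapped coroutine on APIError only; anything else,
    including QueueingError, propagates on the first raise."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(retry + 1):
                try:
                    return await func(*args, **kwargs)
                except APIError:
                    if attempt == retry:
                        raise
                    # Real decorator backs off (5s, 10s, ...); elided here.
        return wrapper
    return decorator

calls = {"n": 0}

@running(retry=5)
async def queued_generation():
    calls["n"] += 1
    raise QueueingError("stream suspended, job queued")
```

Calling `asyncio.run(queued_generation())` raises `QueueingError` after exactly one underlying call, i.e. zero retries and zero extra quota slots.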

Backward compatibility

  • Code that catches APIError will not catch QueueingError — this is intentional (the retry was harmful, not helpful)
  • Code that catches GeminiError will catch QueueingError (since it's a subclass)
  • Code that catches Exception is unaffected
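The compatibility matrix above can be checked mechanically. A small sketch using the same simplified stand-in classes (not the library's real definitions), routing a raised exception through a typical handler chain:

```python
class GeminiError(Exception): ...
class APIError(GeminiError): ...
class QueueingError(GeminiError): ...

def which_handler(exc: Exception) -> str:
    """Report which except clause a raised exception lands in."""
    try:
        raise exc
    except APIError:
        return "APIError"      # retry path
    except GeminiError:
        return "GeminiError"   # catches QueueingError
    except Exception:
        return "Exception"     # catch-all, unaffected
```

Here `which_handler(QueueingError())` lands in the `GeminiError` clause, while `which_handler(APIError())` still lands in the `APIError` clause, matching the three bullets above.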

Usage example (after this fix)

from gemini_webapi.exceptions import QueueingError

try:
    response = await client.generate_content("Generate a video of a cat")
except QueueingError:
    # Server accepted the request and is rendering asynchronously.
    # Poll list_chats() for the new conversation, then read_chat(cid)
    # to retrieve the generated video URL.
    new_chats = await client.list_chats()
    # ... poll and download
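The poll step elided above could look like the following sketch. It is an assumption-laden illustration: `poll_for_result` is a hypothetical helper, `list_chats()`/`read_chat()` are the methods named in this PR, and the chat objects' `.cid` attribute is an assumption about their shape.

```python
import asyncio

async def poll_for_result(client, known_cids, interval=15.0, timeout=600.0):
    """Poll client.list_chats() until a conversation not in known_cids
    appears, then fetch it with client.read_chat(). Hypothetical helper."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        chats = await client.list_chats()
        # A new cid means the server created the conversation for our job.
        new = [c for c in chats if c.cid not in known_cids]
        if new:
            return await client.read_chat(new[0].cid)
        await asyncio.sleep(interval)
    raise TimeoutError("server did not finish rendering within timeout")
```

Recording the known cids before calling `generate_content` lets the poller distinguish the freshly created conversation from pre-existing ones.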

🤖 Generated with Claude Code

…transient error)

When `generate_content` hits `Stream suspended (queueing=True)`, the server
has accepted the request and is processing it asynchronously (e.g. Veo video
rendering). The current code raises `APIError`, which the `@running(retry=5)`
decorator treats as a transient error and retries — but each retry sends a
**new request**, creating a new server-side job and burning an additional
daily quota slot.

Empirically observed: within 45 seconds the decorator fired 4 retries,
creating 4 independent Veo conversations visible in the web UI. Each
consumed a separate daily quota slot.

Fix: introduce `QueueingError(GeminiError)` and raise it instead of
`APIError` when `is_queueing=True`. Since the decorator only retries
`APIError` (not `GeminiError`), queueing errors now bubble up immediately
with zero retries. Callers can catch `QueueingError` and switch to a
poll-based flow (list_chats + read_chat) to retrieve the result.

Non-queueing stream suspensions (`is_queueing=False`) still raise `APIError`
and are retried as before — this preserves the existing recovery behaviour
for transient connection issues.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@luuquangvu
Contributor

Gemini only actually receives the request and begins processing it once a CID appears, and when a CID is present, the client waits until a result is available. Therefore, the analysis in this PR is incorrect.

