feat: upgrade OpenAudio backend to Fish Speech S2-Pro and refactor de…#29

Open
KevinBonnoron wants to merge 1 commit into main from 28-feat-upgrade-openaudio-backend-to-fish-speech-s2-pro

Conversation

@KevinBonnoron
Owner

@KevinBonnoron KevinBonnoron commented Mar 20, 2026

…pendencies

Summary by CodeRabbit

  • New Features

    • Fish Speech S2-Pro model now available for text-to-speech synthesis, featuring voice cloning capability, audio effects support, and optimized performance with updated dependencies.
  • Removals

    • OpenAudio S1 Mini model has been discontinued and replaced.

@KevinBonnoron KevinBonnoron linked an issue Mar 20, 2026 that may be closed by this pull request
@coderabbitai

coderabbitai bot commented Mar 20, 2026

Walkthrough

The PR replaces the OpenAudio TTS backend with a new Fish Audio backend implementation. Changes include:

  • Removing the openaudio backend class and installation function, and adding a FishAudioBackend class with streaming support.
  • Updating the dependency registry to use the "fish_audio" key with pinned torch/torchaudio versions (2.8.0) and installing fish-speech from the v2.0.0-beta git revision.
  • Removing the OpenAudio S1 Mini model from the catalog and adding the Fish Speech S2-Pro model.
  • Updating the backend registry imports to reference the new backend.

Possibly related issues

  • feat: upgrade OpenAudio backend to Fish Speech S2-Pro #28: Implements the S2-era Fish Speech inference path with dependency changes, new fish_audio backend, updated fish-speech installation, and modified generate/init flow that directly align with this PR's backend replacement and upgrade.
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: upgrade OpenAudio backend to Fish Speech S2-Pro and refactor de…' clearly describes the main change (upgrading from OpenAudio to Fish Speech S2-Pro and refactoring dependencies), which aligns with the changeset replacing the openaudio backend with fish_audio. |
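
For the Docstring Coverage warning above, a rough local estimate can be made with the standard-library `ast` module. This is only a sketch: the exact counting rules CodeRabbit applies are not stated in this thread, so the choice to count modules, classes, and functions here is an assumption, and `docstring_coverage` is a hypothetical helper name.

```python
import ast


def docstring_coverage(source: str) -> float:
    """Return the percentage of documentable nodes that carry a docstring."""
    tree = ast.parse(source)
    # Assumption: modules, classes, and (async) functions all count.
    kinds = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    nodes = [node for node in ast.walk(tree) if isinstance(node, kinds)]
    documented = sum(1 for node in nodes if ast.get_docstring(node) is not None)
    # The module node is always present, so len(nodes) is at least 1.
    return 100.0 * documented / len(nodes)


sample = '''
def documented():
    """Has a docstring."""

def undocumented():
    pass
'''

# One of three nodes (module, two functions) is documented.
print(f"{docstring_coverage(sample):.2f}%")
```

Running it over each changed file (for example the contents of inference/src/backends/fish_audio.py) gives a quick sense of how far a file is from the 80% threshold, even if the bot's number differs slightly.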


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ac30a16c-ddef-4222-8c79-71cbd6762ee4

📥 Commits

Reviewing files that changed from the base of the PR and between 49004d8 and d56a7dd.

📒 Files selected for processing (5)
  • inference/src/backends/deps.py
  • inference/src/backends/fish_audio.py
  • inference/src/backends/openaudio.py
  • inference/src/backends/registry.py
  • server/src/manifest/models.manifest.ts
💤 Files with no reviewable changes (1)
  • inference/src/backends/openaudio.py


_TORCH_CPU_INDEX = "https://download.pytorch.org/whl/cpu"
_TORCH = ["torch>=2.5.0,<2.8", "torchaudio>=2.5.0,<2.8"]
_TORCH_28 = ["torch==2.8.0", "torchaudio==2.8.0"]

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if fish-speech v2.0.0-beta actually requires torch 2.8.0
curl -s "https://raw.githubusercontent.com/fishaudio/fish-speech/v2.0.0-beta/pyproject.toml" 2>/dev/null | grep -A 10 "dependencies" || \
curl -s "https://raw.githubusercontent.com/fishaudio/fish-speech/v2.0.0-beta/setup.py" 2>/dev/null | grep -A 10 "install_requires"

Repository: KevinBonnoron/sirene

Length of output: 467


🏁 Script executed:

cat -n inference/src/backends/deps.py | head -100

Repository: KevinBonnoron/sirene

Length of output: 3904


🏁 Script executed:

cat -n inference/src/backends/deps.py | tail -100

Repository: KevinBonnoron/sirene

Length of output: 4333


🏁 Script executed:

# Search for how install_backend_deps is called
rg "install_backend_deps" --type py -B 2 -A 2

Repository: KevinBonnoron/sirene

Length of output: 2877


🏁 Script executed:

# Search for README or documentation about backends
fd -e md -e txt | grep -i readme

Repository: KevinBonnoron/sirene

Length of output: 73


🏁 Script executed:

head -150 README.md

Repository: KevinBonnoron/sirene

Length of output: 2720


PyTorch version constraint conflict with other backends.

The _TORCH_28 constant pins torch==2.8.0, which violates the torch>=2.5.0,<2.8 constraint used by other backends (qwen, f5-tts, cosyvoice, chatterbox, higgs_audio). Installing multiple backends in the same environment will fail with a pip resolver error since torch==2.8.0 does not satisfy <2.8.

Either document that fish_audio cannot coexist with other torch-based backends, or adjust the constraint to allow flexibility (e.g., torch>=2.8.0,<3.0 for all backends if compatible).
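
The incompatibility between the two constants can be checked mechanically. The sketch below is a minimal, standard-library-only illustration: it handles only plain numeric versions and the operators `==`, `>=`, `<=`, `>`, `<`, so it is not a full PEP 440 implementation, and the helper names `parse_version` and `satisfies` are hypothetical. It shows that the `_TORCH_28` pin falls outside the range declared in `_TORCH`.

```python
import operator

# Operators supported by this simplified checker. Two-character
# operators come first so ">=" is not mistaken for ">".
_OPS = [
    ("==", operator.eq),
    (">=", operator.ge),
    ("<=", operator.le),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.8.0' into (2, 8, 0). Only plain numeric versions are handled."""
    return tuple(int(part) for part in text.strip().split("."))


def _pad(a, b):
    """Right-pad the shorter tuple with zeros so 2.8 compares equal to 2.8.0."""
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))


def satisfies(version: str, spec: str) -> bool:
    """Return True if `version` meets every comma-separated clause in `spec`."""
    v = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS:
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                left, right = _pad(v, bound)
                if not compare(left, right):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


shared_range = ">=2.5.0,<2.8"   # range from _TORCH
fish_audio_pin = "2.8.0"        # pin from _TORCH_28

print(satisfies(fish_audio_pin, shared_range))  # False: 2.8.0 is not < 2.8
print(satisfies("2.7.1", shared_range))         # True
```

No single torch version satisfies both requirement sets, which is why one environment cannot hold fish_audio and the other torch-based backends unless one of the constraints changes.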

Comment on lines +69 to +71
        packages=[
            *_TORCH_28,
            "transformers<=4.57.3",

🧹 Nitpick | 🔵 Trivial

Unusual transformers version constraint.

The constraint transformers<=4.57.3 only sets an upper bound without a lower bound, which is atypical. Other backends in this file use >=4.47.0. This could allow installation of very old transformers versions. Consider adding a lower bound:

-            "transformers<=4.57.3",
+            "transformers>=4.47.0,<=4.57.3",
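
To see what the missing lower bound permits, the following sketch compares a few candidate versions against both forms of the constraint. It uses plain tuple comparison rather than a real resolver, `as_tuple` is a hypothetical helper, and the candidate versions are chosen only for illustration.

```python
def as_tuple(version: str) -> tuple[int, ...]:
    """Convert a plain numeric version such as '4.57.3' to a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


LOWER = as_tuple("4.47.0")   # lower bound used by the other backends
UPPER = as_tuple("4.57.3")   # upper bound from this PR

# Illustrative candidate versions, oldest first.
for candidate in ("4.30.0", "4.47.0", "4.57.3"):
    v = as_tuple(candidate)
    upper_only = v <= UPPER            # transformers<=4.57.3
    bounded = LOWER <= v <= UPPER      # transformers>=4.47.0,<=4.57.3
    print(f"{candidate}: upper-only={upper_only}, bounded={bounded}")
```

The oldest candidate passes the upper-only check but is rejected once the lower bound is added, which is the gap the suggestion closes.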
📝 Committable suggestion

Suggested change
         packages=[
             *_TORCH_28,
-            "transformers<=4.57.3",
+            "transformers>=4.47.0,<=4.57.3",

Comment on lines +106 to +107
        if not codes_list:
            raise RuntimeError("OpenAudio LLM generated no tokens")

⚠️ Potential issue | 🟡 Minor

Fix incorrect backend name in error message.

The error message references "OpenAudio LLM" but this is the Fish Audio backend. This appears to be a copy-paste artifact from the removed OpenAudio implementation.

📝 Proposed fix
         if not codes_list:
-            raise RuntimeError("OpenAudio LLM generated no tokens")
+            raise RuntimeError("Fish Audio model generated no tokens")
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 107-107: Avoid specifying long messages outside the exception class

(TRY003)
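
One way to address both the naming fix and the Ruff TRY003 warning together is to move the message into a dedicated exception class. This is a sketch under assumptions: `NoTokensGeneratedError` and `require_codes` are hypothetical names that do not appear in the PR, and the real backend may prefer to keep a plain RuntimeError.

```python
class NoTokensGeneratedError(RuntimeError):
    """Raised when the model returns an empty list of codes."""

    def __init__(self, backend: str = "Fish Audio") -> None:
        self.backend = backend
        # The message lives in the class, so raise sites stay short.
        super().__init__(f"{backend} model generated no tokens")


def require_codes(codes_list):
    """Return codes_list unchanged, or raise if it is empty."""
    if not codes_list:
        raise NoTokensGeneratedError
    return codes_list


try:
    require_codes([])
except NoTokensGeneratedError as exc:
    print(exc)  # Fish Audio model generated no tokens
```

Because the class still derives from RuntimeError, any caller that already catches RuntimeError keeps working.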


Development

Successfully merging this pull request may close these issues.

feat: upgrade OpenAudio backend to Fish Speech S2-Pro

1 participant