Voice blending support for Kokoro TTS — via KPipeline backend? #614

@will-assistant

Hey, love the project — Speaches is hands down the cleanest self-hosted OpenAI-compatible speech server out there. Been running it and it's solid.

Feature request: Weighted voice blending for Kokoro TTS, e.g.:

{ "voice": "af_sarah(1)+am_adam(1)+am_onyx(0.5)" }

This is a well-established technique in the Kokoro ecosystem — weighted averaging of style vectors to create custom voice personas without any training. Several projects already implement it:

  • Kokoro-FastAPI: voice1(weight)+voice2(weight) syntax, the most widely adopted. Worth noting it supports both CPU (ONNX) and GPU (PyTorch) paths, so the lightweight CPU argument isn't exclusive to kokoro-onnx.
  • RealtimeTTS: formula-based blended voice cache using KPipeline
  • kokoro-tts CLI: voice1:60,voice2:40 syntax
  • Community experimentation: voice extrapolation and interpolation via linear models

The interesting design question: these implementations all use the official kokoro PyTorch package (KPipeline from hexgrad) rather than kokoro-onnx. Blending is trivial with PyTorch tensors — it's just weighted averaging before synthesis.
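To make the request concrete, here is a minimal sketch of how the voice1(weight)+voice2(weight) string could be parsed into normalized weights. The helper name parse_voice_spec is hypothetical and not taken from Speaches or Kokoro-FastAPI. Defaulting a missing weight to 1 and normalizing so the weights sum to 1 are assumptions on my part, not confirmed behavior of any of the projects above.

```python
import re

# Hypothetical helper for illustration; not an existing Speaches or Kokoro-FastAPI API.
# One term looks like "af_sarah(1)" or "am_onyx(0.5)"; the "(weight)" part is optional.
_TERM = re.compile(r"^\s*([A-Za-z0-9_]+)\s*(?:\(\s*([0-9]*\.?[0-9]+)\s*\))?\s*$")


def parse_voice_spec(spec: str) -> dict[str, float]:
    """Parse 'a(1)+b(0.5)' into a {voice_name: weight} dict whose weights sum to 1."""
    weights: dict[str, float] = {}
    for term in spec.split("+"):
        match = _TERM.match(term)
        if match is None:
            raise ValueError(f"invalid voice term: {term!r}")
        name, raw = match.group(1), match.group(2)
        # Assumption: a bare voice name counts as weight 1.
        weight = float(raw) if raw is not None else 1.0
        # Repeated voices accumulate rather than overwrite.
        weights[name] = weights.get(name, 0.0) + weight
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("voice weights must sum to a positive number")
    # Assumption: normalize so the blend is a weighted average, not a weighted sum.
    return {name: w / total for name, w in weights.items()}
```

With the example from this request, af_sarah(1)+am_adam(1)+am_onyx(0.5) normalizes to weights of 0.4, 0.4, and 0.2. A plain voice name such as af_sarah parses to a single weight of 1, so existing requests would keep working.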

I understand Speaches uses kokoro-onnx for good reason (lighter footprint, ARM compatibility). A few possible paths forward:

  1. Add blending to the existing ONNX path — load voice arrays from the npz file, weighted average with numpy, pass the blended array to kokoro-onnx (may need kokoro-onnx to accept raw arrays)
  2. Add a KPipeline executor as an optional backend — for GPU users who want blending + native PyTorch performance, alongside the existing ONNX executor for CPU/lightweight deployments. Kokoro-FastAPI already proves this dual CPU/GPU approach works well in production.
  3. Something else entirely — you know the codebase better than anyone
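Option 1 could look roughly like the sketch below: read per-voice style arrays from an npz archive and take their weighted average with numpy. The function name blend_voices is hypothetical, and the in-memory archive with tiny (4, 1, 8) arrays is only a stand-in for the real voices file, whose array shapes I have not verified here. Handing the blended array to kokoro-onnx is deliberately left out, since that depends on whether it accepts raw arrays.

```python
import io

import numpy as np


def blend_voices(voices, weights):
    """Return the weighted average of the named style arrays.

    `voices` is any mapping from voice name to array (a dict or a loaded npz file).
    `weights` maps voice name to a non-negative weight; weights are normalized here.
    A missing voice name raises KeyError from the mapping lookup.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("voice weights must sum to a positive number")
    blended = None
    for name, weight in weights.items():
        style = np.asarray(voices[name], dtype=np.float32)
        term = style * np.float32(weight / total)
        blended = term if blended is None else blended + term
    return blended


# Stand-in for a real voices npz file: two fake voices with made-up shapes and values.
buffer = io.BytesIO()
np.savez(
    buffer,
    af_sarah=np.full((4, 1, 8), 1.0, dtype=np.float32),
    am_adam=np.full((4, 1, 8), 3.0, dtype=np.float32),
)
buffer.seek(0)
voices = np.load(buffer)

# Equal weights give the element-wise mean of the two arrays.
blend = blend_voices(voices, {"af_sarah": 1.0, "am_adam": 1.0})
```

Because the blend keeps the shape and dtype of a single voice array, the rest of the ONNX path would not need to know whether it received a stock voice or a blended one.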

Would love to hear your thoughts on the right approach. Happy to contribute a PR if there's a direction you'd prefer.
