A small FastAPI service that wraps Qwen3-TTS (CustomVoice) behind a simple HTTP API for WAV generation.
Built as a companion module for InfiniteBook.
Qwen_TTS_Api is a lightweight HTTP service to synthesize narration/dialog audio as a single WAV file from a list of spans (narr/dialog/pause).
It exists so InfiniteBook can use Qwen3-TTS like any other TTS provider without embedding heavy GPU model code into the web app process.
- Single request “chapter render”: send spans → receive one WAV (server handles batching + stitching).
- Model lifecycle endpoints: load/unload + state check.
- Docker-first deployment so the main app stays simple.
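The "send spans, receive one WAV" flow can be illustrated with a minimal sketch using only the Python standard library. This is not the project's actual implementation: the `Span` field names, the 24 kHz mono 16-bit output format, and the `fake_synthesize` stand-in for the model are all assumptions made for illustration.

```python
import io
import wave
from dataclasses import dataclass

# Assumed output format; the real service may use different values.
SAMPLE_RATE = 24_000
SAMPLE_WIDTH = 2  # bytes per sample (16-bit PCM)
CHANNELS = 1


@dataclass
class Span:
    """Hypothetical span shape: kind is "narr", "dialog", or "pause"."""
    kind: str
    text: str = ""        # text to speak (ignored for pauses)
    seconds: float = 0.0  # pause length (ignored for speech)


def silence(seconds: float) -> bytes:
    """Return raw PCM silence of the requested duration."""
    frames = int(round(seconds * SAMPLE_RATE))
    return b"\x00" * (frames * SAMPLE_WIDTH * CHANNELS)


def fake_synthesize(text: str) -> bytes:
    """Stand-in for the TTS model: 50 ms of silence per character."""
    return silence(0.05 * len(text))


def render_chapter(spans, synthesize=fake_synthesize) -> bytes:
    """Stitch all spans into a single in-memory WAV file."""
    pcm = bytearray()
    for span in spans:
        if span.kind == "pause":
            pcm += silence(span.seconds)
        elif span.kind in ("narr", "dialog"):
            pcm += synthesize(span.text)
        else:
            raise ValueError(f"unknown span kind: {span.kind!r}")

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(bytes(pcm))
    return buf.getvalue()
```

Calling `render_chapter([Span("narr", "Hello"), Span("pause", seconds=0.5)])` returns the bytes of one WAV file. Passing a real model call as `synthesize` would be the natural place to plug in Qwen3-TTS.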
- NVIDIA GPU + drivers (recommended).
- Docker (and NVIDIA Container Toolkit if you want --gpus all).
git clone https://github.com/KaMeLoTmArMoT/Qwen_TTS_Api.git
cd Qwen_TTS_Api
docker build -t qwen-tts-api .
Expose the API on port 8001 (pick any host port you want):
docker run --rm --gpus all -p 8001:8001 qwen-tts-api
Optional (if you want to preload a specific model at startup, depending on how you wired the container):
docker run --rm --gpus all -p 8001:8001 \
-e QWEN_MODEL_ID="Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice" \
qwen-tts-api
In InfiniteBook, select the provider and point it at the container URL:
IB_TTS_PROVIDER=qwen
IB_QWEN_TTS_URL=http://127.0.0.1:8001
IB_QWEN_MODEL_ID=Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
This project is licensed under the MIT License; see LICENSE.
- README structure inspired by Best-README-Template.
- Built with assistance from generative AI tools for ideation and code suggestions; all changes were reviewed and tested by the author.
