text2speech-py

Vietnamese Text-to-Speech system powered by Fish Speech v1.5 with voice cloning and multilingual support.

Features

Zero-shot & Few-shot TTS: Generate high-quality speech with just 10-30 seconds of reference audio
Multilingual Support: English, Chinese, Japanese, German, French, Arabic, Russian, Dutch, Italian, Portuguese, Vietnamese, and more
Cross-lingual Voice Cloning: Clone voices across different languages
Vietnamese Optimization: Special utilities and optimizations for the Vietnamese language
High Quality: Top-ranked performance on TTS-Arena2 benchmark
Easy to Use: Simple WebUI and REST API

Quick Start

Prerequisites

Python 3.10 or higher
8GB RAM minimum
5GB disk space for models and dependencies
macOS (with MPS), Linux, or Windows (WSL)

Installation

Clone the repository:

```
git clone https://github.com/yourusername/text2speech-py.git
cd text2speech-py
```

Create and activate a virtual environment:

```
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

Install dependencies:

```
pip install -r requirements.txt
```

Verify setup:

```
./verify_setup.sh
```

Usage

WebUI (Recommended for beginners)

```
./start_webui.sh
```

Then open your browser to http://localhost:7860

API Server

```
./start_api.sh
```

API will be available at http://localhost:8000

Vietnamese TTS Helper

This project includes special utilities for Vietnamese language optimization:

```
from vietnamese_tts_helper import VietnameseTTSHelper

helper = VietnameseTTSHelper()

# Prepare Vietnamese text
text = "Xin chào, đây là hệ thống chuyển đổi văn bản thành giọng nói."
prepared = helper.prepare_vietnamese_text(text)

# Split long text
chunks = helper.split_long_text(prepared, max_length=150)

# Get recommended parameters
params = helper.get_recommended_parameters()
```

See VIETNAMESE_GUIDE.md for a detailed Vietnamese usage guide.
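The implementation of split_long_text is not shown in this README. As a rough illustration of the idea, the sketch below packs whole sentences into chunks of at most max_length characters. The function name split_by_sentence and its exact behavior are assumptions made for this example, not the helper's actual code.

```python
import re


def split_by_sentence(text: str, max_length: int = 150) -> list[str]:
    """Hypothetical sketch: pack whole sentences into chunks of <= max_length chars.

    A single sentence longer than max_length is kept as its own chunk
    rather than being cut in the middle of a word.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?…])\s+", text) if s.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_length:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting at sentence boundaries keeps each request short while avoiding cuts in the middle of a phrase, which tends to matter for natural prosody.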

API Examples

Basic TTS

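```
import requests

# Request speech for a short English sentence
response = requests.post(
    "http://localhost:8000/v1/tts",
    json={
        "text": "Hello, this is a test.",
        "language": "en",
    },
    timeout=120,  # Generous limit; synthesis can take a while on CPU
)
response.raise_for_status()  # Fail on HTTP errors instead of saving an error body

# Save the returned audio
with open("output.wav", "wb") as f:
    f.write(response.content)
```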
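When a long text is split into chunks and each chunk is synthesized separately, the resulting audio pieces need to be joined. The sketch below does this with Python's standard wave module. It assumes each response body is a complete WAV file and that all pieces share the same channel count, sample width, and sample rate; concat_wav is a hypothetical helper, not part of this project.

```python
import io
import wave


def concat_wav(parts: list[bytes]) -> bytes:
    """Join WAV byte strings that share channels, sample width, and sample rate."""
    if not parts:
        raise ValueError("no audio parts to join")

    out = io.BytesIO()
    writer = None
    first_format = None
    for part in parts:
        with wave.open(io.BytesIO(part), "rb") as reader:
            fmt = (reader.getnchannels(), reader.getsampwidth(), reader.getframerate())
            if writer is None:
                # The first part defines the output format.
                first_format = fmt
                writer = wave.open(out, "wb")
                writer.setnchannels(fmt[0])
                writer.setsampwidth(fmt[1])
                writer.setframerate(fmt[2])
            elif fmt != first_format:
                raise ValueError(f"format mismatch: {fmt} != {first_format}")
            writer.writeframes(reader.readframes(reader.getnframes()))
    writer.close()  # Finalizes the WAV header; the BytesIO buffer stays open
    return out.getvalue()
```

A caller would collect response.content for each chunk in order, pass the list to concat_wav, and write the returned bytes to a single output file.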

Voice Cloning

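```
import requests

data = {
    "text": "This will sound like the reference voice.",
    "language": "en",
}

# Open the reference audio in a context manager so the file is closed after upload
with open("voice_sample.wav", "rb") as reference:
    response = requests.post(
        "http://localhost:8000/v1/tts",
        files={"reference_audio": reference},
        data=data,
        timeout=120,  # Generous limit; synthesis can take a while on CPU
    )
response.raise_for_status()  # Fail on HTTP errors instead of saving an error body

# Save the cloned-voice audio
with open("output.wav", "wb") as f:
    f.write(response.content)
```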

Best Practices for Voice Cloning

Reference Audio Quality

Format: WAV (16-bit, 44.1kHz) or high-quality MP3 (320kbps)
Duration: 10-30 seconds (15-20 seconds optimal)
Content: Natural speech, avoid monotone reading
Environment: Quiet room, minimal background noise
Microphone distance: 15-20cm from mouth
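The format and duration recommendations above can be checked before uploading a reference file. The following is a minimal sketch using only the standard wave module, so it covers WAV files but not MP3. The function name check_reference_wav is hypothetical, and the thresholds simply mirror the values listed in this section.

```python
import wave


def check_reference_wav(
    path: str, min_seconds: float = 10.0, max_seconds: float = 30.0
) -> list[str]:
    """Return a list of problems found in a WAV file; an empty list means none."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
        bits = wav.getsampwidth() * 8

    problems: list[str] = []
    if not (min_seconds <= duration <= max_seconds):
        problems.append(
            f"duration is {duration:.1f}s, expected {min_seconds:.0f}-{max_seconds:.0f}s"
        )
    if bits != 16:
        problems.append(f"sample width is {bits}-bit, expected 16-bit")
    if rate != 44100:
        problems.append(f"sample rate is {rate} Hz, expected 44100 Hz")
    return problems
```

The check only inspects the file header. Background noise, monotone delivery, and microphone distance still have to be judged by ear.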

Cross-lingual Cloning

Upload reference audio in any language
Enter text in target language
The model will maintain voice characteristics while speaking in the new language

Device Benchmarking

Test different devices to find the fastest for your system:

```
python benchmark_devices.py
```

This will test CPU and MPS (if available) and recommend the optimal device.
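The contents of benchmark_devices.py are not shown here. As an illustration of the general approach, the sketch below lists the available devices and times a matrix multiplication on each. It assumes PyTorch is installed, uses matrix multiplication only as a stand-in for real TTS inference, and the function names are hypothetical.

```python
import time

try:
    import torch  # Assumed to be present as a dependency of the TTS engine
except ImportError:
    torch = None


def available_devices() -> list[str]:
    """List candidate devices: always CPU, plus MPS when PyTorch reports it."""
    devices = ["cpu"]
    if torch is not None and torch.backends.mps.is_available():
        devices.append("mps")
    return devices


def _sync(device: str) -> None:
    # MPS work is queued asynchronously, so wait for it before reading the clock.
    if device == "mps":
        torch.mps.synchronize()


def time_matmul(device: str, size: int = 1024, repeats: int = 10) -> float:
    """Return the average seconds per matrix multiplication on the given device."""
    if torch is None:
        raise RuntimeError("PyTorch is required for timing")
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    a @ b  # Warm-up run so one-time setup cost is not measured
    _sync(device)
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    _sync(device)
    return (time.perf_counter() - start) / repeats
```

With these helpers, `min(available_devices(), key=time_matmul)` would pick the device with the lowest average time for this synthetic workload. Actual speech generation may rank devices differently.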

Project Structure

```
.
├── benchmark_devices.py      # Device performance testing
├── vietnamese_tts_helper.py  # Vietnamese language utilities
├── examples.py               # Usage examples
├── test_tts.py               # Installation test script
├── start_webui.sh            # WebUI launcher
├── start_api.sh              # API server launcher
├── fish-speech/              # Fish Speech engine
│   ├── checkpoints/          # Model files
│   └── tools/                # Utilities
└── .venv/                    # Virtual environment
```

Development

Setup Development Environment

```
# Install development dependencies
pip install -r requirements-dev.txt

# Run code quality checks
ruff format .
ruff check .
mypy *.py

# Run tests
pytest
```

See CONTRIBUTING.md for detailed development guidelines.

Troubleshooting

WebUI won't start

```
# Check setup
./verify_setup.sh

# View detailed logs
cd fish-speech
source ../.venv/bin/activate
python tools/run_webui.py --llama-checkpoint-path checkpoints/fish-speech-1.5
```

Model not found error

Check if model files exist in fish-speech/checkpoints/fish-speech-1.5/. If not, download them:

```
cd fish-speech
python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='fishaudio/fish-speech-1.5', local_dir='checkpoints/fish-speech-1.5')"
```
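To confirm the download before starting the server, a quick check from the repository root might look like the sketch below. The function model_files_present is hypothetical, and it only verifies that the checkpoint directory contains at least one file, not that the files are complete or valid.

```python
from pathlib import Path


def model_files_present(
    checkpoint_dir: str = "fish-speech/checkpoints/fish-speech-1.5",
) -> bool:
    """Return True if the checkpoint directory exists and holds at least one file."""
    path = Path(checkpoint_dir)
    return path.is_dir() and any(p.is_file() for p in path.rglob("*"))
```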

Port already in use

Edit the startup script to change the port, or stop the process using that port.
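To find out whether a port is already taken before editing the startup script, a small check like the following can help. It is a sketch using the standard socket module; port_in_use is a hypothetical helper, and 7860 and 8000 are the WebUI and API ports mentioned earlier in this README.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

For example, `port_in_use(7860)` returning True before the WebUI is started suggests another process is already bound to that port.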

Documentation

Vietnamese Guide - Vietnamese-language guide
Vietnamese Optimization - Optimization for Vietnamese
Contributing Guidelines - How to contribute
Changelog - Version history

License

This project is licensed under the MIT License - see the LICENSE file for details.

Fish Speech components:

Code: Apache License 2.0
Models: CC-BY-NC-SA-4.0 License

Acknowledgments

Fish Audio for the Fish Speech engine
All contributors who help improve this project

Support

Open an issue for bug reports
Check discussions for questions
See CONTRIBUTING.md for contribution guidelines

Made with care for Vietnamese TTS applications
