TeguFy/text2speech-fish-py

text2speech-py

Python 3.10+ License: MIT Code style: ruff

Vietnamese Text-to-Speech system powered by Fish Speech v1.5 with voice cloning and multilingual support.

Features

  • Zero-shot & Few-shot TTS: Generate high-quality speech with just 10-30 seconds of reference audio
  • Multilingual Support: English, Chinese, Japanese, German, French, Arabic, Russian, Dutch, Italian, Portuguese, Vietnamese, and more
  • Cross-lingual Voice Cloning: Clone voices across different languages
  • Vietnamese Optimization: Special utilities and optimizations for Vietnamese language
  • High Quality: Top-ranked performance on TTS-Arena2 benchmark
  • Easy to Use: Simple WebUI and REST API

Quick Start

Prerequisites

  • Python 3.10 or higher
  • 8GB RAM minimum
  • 5GB disk space for models and dependencies
  • macOS (with MPS), Linux, or Windows (WSL)

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/text2speech-py.git
    cd text2speech-py
  2. Create and activate virtual environment:

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Verify setup:

    ./verify_setup.sh

Usage

WebUI (Recommended for beginners)

./start_webui.sh

Then open your browser to http://localhost:7860

API Server

./start_api.sh

API will be available at http://localhost:8000

Vietnamese TTS Helper

This project includes special utilities for Vietnamese language optimization:

from vietnamese_tts_helper import VietnameseTTSHelper

helper = VietnameseTTSHelper()

# Prepare Vietnamese text
text = "Xin chào, đây là hệ thống chuyển đổi văn bản thành giọng nói."
prepared = helper.prepare_vietnamese_text(text)

# Split long text
chunks = helper.split_long_text(prepared, max_length=150)

# Get recommended parameters
params = helper.get_recommended_parameters()

See VIETNAMESE_GUIDE.md for detailed Vietnamese usage guide.
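The internals of split_long_text are not shown in this README. As a rough illustration of what sentence-aware chunking under a max_length limit could look like, here is a minimal sketch. The function name split_on_sentences and its behavior are assumptions for illustration, not the helper's actual implementation.

```python
import re


def split_on_sentences(text: str, max_length: int = 150) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_length characters.

    Hypothetical illustration only; VietnameseTTSHelper.split_long_text may
    behave differently. A single sentence longer than max_length is kept whole
    rather than being cut mid-word.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?…])\s+", text) if s.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_length:
            # Adding this sentence would overflow the chunk, so start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries rather than at a fixed character offset keeps each chunk a complete utterance, which tends to matter for prosody when chunks are synthesized separately.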

API Examples

Basic TTS

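import requests

# Request synthesis; raise early if the server reports an HTTP error.
response = requests.post(
    "http://localhost:8000/v1/tts",
    json={
        "text": "Hello, this is a test.",
        "language": "en"
    },
    timeout=120,
)
response.raise_for_status()

# Save the returned audio bytes.
with open("output.wav", "wb") as f:
    f.write(response.content)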

Voice Cloning

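import requests

data = {
    "text": "This will sound like the reference voice.",
    "language": "en"
}

# Open the reference clip in a context manager so the file handle is closed after upload.
with open("voice_sample.wav", "rb") as reference:
    response = requests.post(
        "http://localhost:8000/v1/tts",
        files={"reference_audio": reference},
        data=data,
        timeout=120,
    )
response.raise_for_status()

# Save the returned audio bytes.
with open("output.wav", "wb") as f:
    f.write(response.content)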
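Both examples write the response body straight to output.wav. If the server answers with an error payload instead of audio, that file will not be playable. The sketch below assumes the endpoint returns a WAV container (suggested by the file name above, but not confirmed here) and checks the RIFF/WAVE header before saving. The function name looks_like_wav is hypothetical.

```python
def looks_like_wav(payload: bytes) -> bool:
    """Return True if payload starts with a RIFF/WAVE container header.

    Assumption: the API responds with WAV audio. Other formats (for example
    MP3) would need a different check.
    """
    return (
        len(payload) >= 12
        and payload[:4] == b"RIFF"
        and payload[8:12] == b"WAVE"
    )
```

A caller could run looks_like_wav(response.content) before opening output.wav and print response.text when the check fails, which usually surfaces the server's error message.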

Best Practices for Voice Cloning

Reference Audio Quality

  • Format: WAV (16-bit, 44.1kHz) or high-quality MP3 (320kbps)
  • Duration: 10-30 seconds (15-20 seconds optimal)
  • Content: Natural speech, avoid monotone reading
  • Environment: Quiet room, minimal background noise
  • Microphone distance: 15-20cm from mouth
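The format and duration recommendations above can be checked automatically for WAV files. The following is a minimal sketch using only the standard library wave module; the function name check_reference_wav is hypothetical, and it covers uncompressed PCM WAV only (MP3 input and the recording-environment advice are out of scope).

```python
import wave


def check_reference_wav(path: str) -> list[str]:
    """Compare a PCM WAV file against the reference-audio guidance.

    Returns a list of human-readable warnings; an empty list means the file
    is 16-bit, 44.1 kHz, and 10-30 seconds long.
    """
    with wave.open(path, "rb") as wav:
        bits = wav.getsampwidth() * 8
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / sample_rate

    problems: list[str] = []
    if bits != 16:
        problems.append(f"sample width is {bits}-bit, expected 16-bit")
    if sample_rate != 44100:
        problems.append(f"sample rate is {sample_rate} Hz, expected 44100 Hz")
    if not 10 <= duration <= 30:
        problems.append(f"duration is {duration:.1f} s, expected 10-30 s")
    return problems
```

Calling check_reference_wav("voice_sample.wav") before uploading gives quick feedback on a clip without starting the server.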

Cross-lingual Cloning

  1. Upload reference audio in any language.
  2. Enter the text in the target language.
  3. Generate the audio. The model keeps the voice characteristics of the reference while speaking the target language.

Device Benchmarking

Test different devices to find the fastest for your system:

python benchmark_devices.py

This will test CPU and MPS (if available) and recommend the optimal device.
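The contents of benchmark_devices.py are not reproduced here. As an illustration of the general idea (time the same workload on each available device and keep the fastest), here is a small sketch. The names candidate_devices and pick_fastest are hypothetical, and the workload is supplied by the caller; PyTorch is only imported if it is installed.

```python
import time
from typing import Callable


def candidate_devices() -> list[str]:
    """List devices worth timing: always CPU, plus MPS when PyTorch reports it."""
    devices = ["cpu"]
    try:
        import torch
    except ImportError:
        return devices
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        devices.append("mps")
    return devices


def pick_fastest(
    devices: list[str],
    workload: Callable[[str], None],
    repeats: int = 3,
) -> str:
    """Run workload(device) several times per device and return the fastest device.

    The best (minimum) time per device is used to reduce warm-up noise.
    """
    best_times: dict[str, float] = {}
    for device in devices:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            workload(device)
            timings.append(time.perf_counter() - start)
        best_times[device] = min(timings)
    return min(best_times, key=best_times.get)
```

For a fair comparison the workload should wait until the device has actually finished (for example by copying the result back to the CPU), since accelerator work is often queued asynchronously.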

Project Structure

.
├── benchmark_devices.py      # Device performance testing
├── vietnamese_tts_helper.py  # Vietnamese language utilities
├── examples.py               # Usage examples
├── test_tts.py               # Installation test script
├── start_webui.sh            # WebUI launcher
├── start_api.sh              # API server launcher
├── fish-speech/              # Fish Speech engine
│   ├── checkpoints/          # Model files
│   └── tools/                # Utilities
└── .venv/                    # Virtual environment

Development

Setup Development Environment

# Install development dependencies
pip install -r requirements-dev.txt

# Run code quality checks
ruff format .
ruff check .
mypy *.py

# Run tests
pytest

See CONTRIBUTING.md for detailed development guidelines.

Troubleshooting

WebUI won't start

# Check setup
./verify_setup.sh

# View detailed logs
cd fish-speech
source ../.venv/bin/activate
python tools/run_webui.py --llama-checkpoint-path checkpoints/fish-speech-1.5

Model not found error

Check if model files exist in fish-speech/checkpoints/fish-speech-1.5/. If not, download them:

cd fish-speech
python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='fishaudio/fish-speech-1.5', local_dir='checkpoints/fish-speech-1.5')"
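Before re-downloading, it can be worth confirming whether the checkpoint directory is really missing or empty. The sketch below only checks that the directory exists and contains at least one file; it does not validate individual model files, since their names are not listed in this README. The function name checkpoint_dir_ready is hypothetical.

```python
from pathlib import Path


def checkpoint_dir_ready(path: str = "fish-speech/checkpoints/fish-speech-1.5") -> bool:
    """Return True if the directory exists and holds at least one file.

    This is a presence check only; it cannot tell whether a download
    was interrupted partway through.
    """
    directory = Path(path)
    return directory.is_dir() and any(p.is_file() for p in directory.rglob("*"))
```

The default path is relative to the repository root, so run the check from there (or pass an absolute path).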

Port already in use

Edit the relevant startup script (start_webui.sh for port 7860, start_api.sh for port 8000) to change the port, or stop the process that is currently using it.
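To confirm which port is occupied before editing anything, a quick check from Python is enough. This is a minimal sketch with a hypothetical function name, port_in_use; it only reports whether something is accepting connections on the port, not which process owns it.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 7860 is the WebUI port and 8000 the API port mentioned above.
    for port in (7860, 8000):
        state = "in use" if port_in_use(port) else "free"
        print(f"Port {port}: {state}")
```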

License

This project is licensed under the MIT License - see the LICENSE file for details.

Fish Speech components:

  • Code: Apache License 2.0
  • Models: CC-BY-NC-SA-4.0 License

Acknowledgments

  • Fish Audio for the Fish Speech engine
  • All contributors who help improve this project

Made with care for Vietnamese TTS applications
