Benchmark: Long-TTS-Eval
- Dataset | Task Config
- Evaluation Date: 2025/11/27
- Paper: Long-TTS-Eval: A Benchmark for Evaluating Long-form Text-to-Speech
Metrics Legend (a short sketch of how these metrics are computed follows below):
- WER⬇️: Word Error Rate (lower is better; used for the English subsets)
- CER⬇️: Character Error Rate (lower is better; used for the Chinese subsets)

Note: performance is reported as reproduced_result(official_result); the value in parentheses is the official result from the paper.
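WER and CER are both edit-distance rates: the minimum number of substitutions, deletions, and insertions needed to turn a hypothesis into the reference, divided by the reference length (words for WER, characters for CER; for TTS the hypothesis is typically an ASR transcript of the synthesized audio). The following is a minimal illustrative sketch of that computation, not the scorer actually used by audio_evals:

```python
# Minimal sketch of edit-distance-based WER/CER scoring.
# Illustrative only -- NOT the scorer used by audio_evals.

def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (O(len(hyp)) memory)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,        # deletion
                dp[j - 1] + 1,    # insertion
                prev + (r != h),  # substitution, or free match
            )
    return dp[-1]

def wer(ref_text, hyp_text):
    """Word Error Rate: edits / reference length, over whitespace tokens."""
    ref, hyp = ref_text.split(), hyp_text.split()
    return edit_distance(ref, hyp) / max(len(ref), 1)

def cer(ref_text, hyp_text):
    """Character Error Rate: same ratio, over characters (used for Chinese)."""
    return edit_distance(list(ref_text), list(hyp_text)) / max(len(ref_text), 1)

# One word dropped out of six -> WER of 1/6.
print(f"{wer('the cat sat on the mat', 'the cat sat on mat'):.2%}")  # 16.67%
```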
| task | subset | metric | performance | eval_cli | note |
|---|---|---|---|---|---|
| tts | long_tts_eval_zh | cer⬇️ | 7.23(5.58) | [1] | |
| tts | long_tts_eval_en | wer⬇️ | 4.69(4.98) | [2] | |
| tts | long_tts_eval_hard_zh | cer⬇️ | 24.33(23.58) | [3] | JIA-Lab-research/MGM-Omni#6 |
| tts | long_tts_eval_hard_en | wer⬇️ | 32.84(26.26) | [4] | |
```
[1] python audio_evals/main.py --dataset long_tts_eval_zh --model mgm-omni-tts-zh
[2] python audio_evals/main.py --dataset long_tts_eval_en --model mgm-omni-tts
[3] python audio_evals/main.py --dataset long_tts_eval_hard_zh --model mgm-omni-tts-zh
[4] python audio_evals/main.py --dataset long_tts_eval_hard_en --model mgm-omni-tts
```
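The four runs can also be driven from one script. A minimal sketch in Python: the script path, `--dataset`, and `--model` values are taken verbatim from commands [1]-[4] above; the wrapper itself is hypothetical convenience code, not part of the repository.

```python
import subprocess
import sys

# (dataset, model) pairs copied from eval_cli commands [1]-[4] above.
RUNS = [
    ("long_tts_eval_zh", "mgm-omni-tts-zh"),
    ("long_tts_eval_en", "mgm-omni-tts"),
    ("long_tts_eval_hard_zh", "mgm-omni-tts-zh"),
    ("long_tts_eval_hard_en", "mgm-omni-tts"),
]

for dataset, model in RUNS:
    # sys.executable reuses the current Python interpreter.
    subprocess.run(
        [sys.executable, "audio_evals/main.py", "--dataset", dataset, "--model", model],
        check=True,  # stop on the first failing run
    )
```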