| 1 | +# FireRedASR |
| 2 | + |
| 3 | +[[Blog]](https://fireredteam.github.io/demos/firered_asr/) |
| 4 | +[[Paper]]() |
| 5 | +[[Model]](https://huggingface.co/fireredteam) |
| 6 | + |
| 7 | +FireRedASR is a family of large-scale automatic speech recognition (ASR) models for Mandarin, designed to deliver both high accuracy and efficiency across a range of applications. FireRedASR comprises two variants: |
| 8 | +- FireRedASR-LLM: Designed to achieve state-of-the-art (SOTA) performance and to enable seamless end-to-end speech interaction. It adopts an Encoder-Adapter-LLM framework leveraging large language model (LLM) capabilities. |
| 9 | +- FireRedASR-AED: Designed to balance high performance and computational efficiency and to serve as an effective speech representation module in LLM-based speech models. It utilizes an Attention-based Encoder-Decoder (AED) architecture. |
| 10 | + |
| 11 | + |
| 12 | + |
| 13 | + |
| 14 | +## News |
| 15 | +- [2025/01/24] 🔥 We release the [technical report]() (under review at arXiv), the [blog](https://fireredteam.github.io/demos/firered_asr/), and the [FireRedASR-AED-L](https://huggingface.co/fireredteam/FireRedASR-AED-L/tree/main) model weights. |
| 16 | +- [WIP] We plan to release FireRedASR-LLM-L after the Spring Festival. |
| 17 | + |
| 18 | +## Setup |
| 19 | + |
| 20 | +```bash |
| 21 | +$ git clone https://github.com/FireRedTeam/FireRedASR.git && cd FireRedASR |
| 22 | +$ conda create --name fireredasr python=3.10 && conda activate fireredasr |
| 23 | +$ pip install -r requirements.txt |
| 24 | +``` |
| 25 | + |
| 26 | +## Usage |
| 27 | +Download the model files from [Hugging Face](https://huggingface.co/fireredteam) and place them in the `pretrained_models` folder. |
| 28 | + |
| 29 | +### Quick Start |
| 30 | +```bash |
| 31 | +$ cd examples/ |
| 32 | +$ bash inference_fireredasr_aed.sh |
| 33 | +$ bash inference_fireredasr_llm.sh |
| 34 | +``` |
| 35 | + |
| 36 | +### Command-line Usage |
| 37 | +```bash |
| 38 | +# Set up PATH and PYTHONPATH (run from the repository root) |
| 39 | +$ export PATH=$PWD/fireredasr/:$PWD/fireredasr/utils/:$PATH |
| 40 | +$ export PYTHONPATH=$PWD/:$PYTHONPATH |
| 41 | +$ speech2text.py --help |
| 42 | +$ speech2text.py --wav_path examples/wav/BAC009S0764W0121.wav --asr_type "aed" --model_dir pretrained_models/FireRedASR-AED-L |
| 43 | +$ speech2text.py --wav_path examples/wav/BAC009S0764W0121.wav --asr_type "llm" --model_dir pretrained_models/FireRedASR-LLM-L |
| 44 | +``` |
| 45 | + |
| 46 | +### Python Usage |
| 47 | +```python |
| 48 | +from fireredasr.models.fireredasr import FireRedAsr |
| 49 | + |
| 50 | +batch_uttid = ["BAC009S0764W0121"] |
| 51 | +batch_wav_path = ["examples/wav/BAC009S0764W0121.wav"] |
| 52 | + |
| 53 | +# FireRedASR-AED |
| 54 | +model = FireRedAsr.from_pretrained("aed", "pretrained_models/FireRedASR-AED-L") |
| 55 | +results = model.transcribe( |
| 56 | +    batch_uttid, |
| 57 | +    batch_wav_path, |
| 58 | +    { |
| 59 | +        "use_gpu": 1, |
| 60 | +        "beam_size": 3, |
| 61 | +        "nbest": 1, |
| 62 | +        "decode_max_len": 0, |
| 63 | +        "softmax_smoothing": 1.0, |
| 64 | +        "aed_length_penalty": 0.0, |
| 65 | +        "eos_penalty": 1.0 |
| 66 | +    } |
| 67 | +) |
| 68 | +print(results) |
| 69 | + |
| 70 | + |
| 71 | +# FireRedASR-LLM |
| 72 | +model = FireRedAsr.from_pretrained("llm", "pretrained_models/FireRedASR-LLM-L") |
| 73 | +results = model.transcribe( |
| 74 | +    batch_uttid, |
| 75 | +    batch_wav_path, |
| 76 | +    { |
| 77 | +        "use_gpu": 1, |
| 78 | +        "beam_size": 3, |
| 79 | +        "decode_max_len": 0, |
| 80 | +        "decode_min_len": 0, |
| 81 | +        "repetition_penalty": 1.0, |
| 82 | +        "llm_length_penalty": 0.0, |
| 83 | +        "temperature": 1.0 |
| 84 | +    } |
| 85 | +) |
| 86 | +print(results) |
| 87 | +``` |
| 88 | + |
| 89 | + |
| 90 | +## Acknowledgements |
| 91 | +Thanks to the following open-source works: |
| 92 | +- [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct) |
| 93 | +- [icefall/ASR_LLM](https://github.com/k2-fsa/icefall/tree/master/egs/speech_llm/ASR_LLM) |
| 94 | +- [WeNet](https://github.com/wenet-e2e/wenet) |
| 95 | +- [Speech-Transformer](https://github.com/kaituoxu/Speech-Transformer) |
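
The Python usage in the README above passes two parallel lists, `batch_uttid` and `batch_wav_path`, to `model.transcribe`. As a minimal sketch, the lists could be built from a directory of recordings as shown below. The helper name `build_batch` is hypothetical and not part of FireRedASR, and the sketch assumes each utterance ID is the file name without its extension, which is what the README's single example suggests.

```python
from pathlib import Path


def build_batch(wav_dir):
    """Return (batch_uttid, batch_wav_path) for the .wav files directly in wav_dir."""
    # Sort so that the IDs and paths are in a stable, reproducible order.
    wav_files = sorted(Path(wav_dir).glob("*.wav"))
    # Assumption: the utterance ID is the file stem, as in the README example
    # ("BAC009S0764W0121.wav" -> "BAC009S0764W0121").
    batch_uttid = [p.stem for p in wav_files]
    batch_wav_path = [str(p) for p in wav_files]
    return batch_uttid, batch_wav_path
```

With the repository layout from the README, `batch_uttid, batch_wav_path = build_batch("examples/wav")` would yield lists that can be passed to `model.transcribe` in place of the hand-written ones. The README does not say how many utterances fit in one call, so the batch size is left to the caller.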