Commit 5bf83d8

author: FireRedTeam
committed: init
0 parents · commit 5bf83d8

27 files changed: 2019 additions & 0 deletions

README.md

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
# FireRedASR

[[Blog]](https://fireredteam.github.io/demos/firered_asr/)
[[Paper]]()
[[Model]](https://huggingface.co/fireredteam)

FireRedASR is a family of large-scale automatic speech recognition (ASR) models for Mandarin, designed to meet diverse requirements for superior performance and optimal efficiency across a range of applications. FireRedASR comprises two variants:

- FireRedASR-LLM: Designed to achieve state-of-the-art (SOTA) performance and to enable seamless end-to-end speech interaction. It adopts an Encoder-Adapter-LLM framework that leverages large language model (LLM) capabilities.
- FireRedASR-AED: Designed to balance high performance and computational efficiency, and to serve as an effective speech representation module in LLM-based speech models. It uses an Attention-based Encoder-Decoder (AED) architecture.

![Model](/assets/FireRedASR_model.png)
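The Encoder-Adapter-LLM idea can be sketched in terms of tensor shapes: the encoder turns audio into frame-level features, an adapter projects them into the LLM's embedding space, and the LLM decodes text from the concatenated speech and prompt embeddings. The dimensions and the single-linear-layer adapter below are illustrative assumptions, not the released model's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only (not the real model's sizes)
T, d_enc, d_llm = 100, 1280, 3584

# 1. Acoustic encoder output: one feature vector per downsampled frame
enc_out = rng.standard_normal((T, d_enc))

# 2. Adapter: a learned projection mapping encoder features into the
#    LLM embedding space (sketched here as a single linear map)
W_adapter = rng.standard_normal((d_enc, d_llm)) * 0.01
speech_embeds = enc_out @ W_adapter            # shape (T, d_llm)

# 3. The LLM consumes speech embeddings concatenated with prompt-token
#    embeddings and decodes the transcript autoregressively
prompt_embeds = rng.standard_normal((8, d_llm))
llm_input = np.concatenate([speech_embeds, prompt_embeds], axis=0)
print(llm_input.shape)  # (108, 3584)
```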
## News
- [2025/01/24] 🔥 We release the [technical report]() (under review at arXiv), a [blog post](https://fireredteam.github.io/demos/firered_asr/), and [FireRedASR-AED-L](https://huggingface.co/fireredteam/FireRedASR-AED-L/tree/main) model weights.
- [WIP] We plan to release FireRedASR-LLM-L after the Spring Festival.

## Setup
```bash
$ git clone https://github.com/FireRedTeam/FireRedASR.git
$ cd FireRedASR
$ conda create --name fireredasr python=3.10
$ conda activate fireredasr
$ pip install -r requirements.txt
```
## Usage
Download the model files from [Hugging Face](https://huggingface.co/fireredteam) and place them in the folder `pretrained_models`.

### Quick Start
```bash
$ cd examples/
$ bash inference_fireredasr_aed.sh
$ bash inference_fireredasr_llm.sh
```

### Command-line Usage
```bash
# Set up PATH & PYTHONPATH
$ export PATH=$PWD/fireredasr/:$PWD/fireredasr/utils/:$PATH
$ export PYTHONPATH=$PWD/:$PYTHONPATH
$ speech2text.py --help
$ speech2text.py --wav_path examples/wav/BAC009S0764W0121.wav --asr_type "aed" --model_dir pretrained_models/FireRedASR-AED-L
$ speech2text.py --wav_path examples/wav/BAC009S0764W0121.wav --asr_type "llm" --model_dir pretrained_models/FireRedASR-LLM-L
```
### Python Usage
```python
from fireredasr.models.fireredasr import FireRedAsr

batch_uttid = ["BAC009S0764W0121"]
batch_wav_path = ["examples/wav/BAC009S0764W0121.wav"]

# FireRedASR-AED
model = FireRedAsr.from_pretrained("aed", "pretrained_models/FireRedASR-AED-L")
results = model.transcribe(
    batch_uttid,
    batch_wav_path,
    {
        "use_gpu": 1,
        "beam_size": 3,
        "nbest": 1,
        "decode_max_len": 0,
        "softmax_smoothing": 1.0,
        "aed_length_penalty": 0.0,
        "eos_penalty": 1.0
    }
)
print(results)

# FireRedASR-LLM
model = FireRedAsr.from_pretrained("llm", "pretrained_models/FireRedASR-LLM-L")
results = model.transcribe(
    batch_uttid,
    batch_wav_path,
    {
        "use_gpu": 1,
        "beam_size": 3,
        "decode_max_len": 0,
        "decode_min_len": 0,
        "repetition_penalty": 1.0,
        "llm_length_penalty": 0.0,
        "temperature": 1.0
    }
)
print(results)
```
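Several of the decoding options (`beam_size`, `aed_length_penalty`, `llm_length_penalty`) control beam-search scoring. As a rough illustration of what a length penalty does (the repo's exact formula may differ), a GNMT-style penalty divides a hypothesis's summed log-probability by a length-dependent factor, so longer outputs are not unfairly penalized:

```python
def length_normalized_score(logprob_sum, length, alpha):
    """GNMT-style length penalty: rescale a hypothesis's summed
    log-probability by ((5 + length) / 6) ** alpha.
    alpha = 0.0 disables normalization entirely."""
    penalty = ((5.0 + length) / 6.0) ** alpha
    return logprob_sum / penalty

# A longer hypothesis with a lower raw score can outrank a short one
# once scores are length-normalized
short = length_normalized_score(-4.0, 5, alpha=0.6)
long_ = length_normalized_score(-6.0, 20, alpha=0.6)
print(short, long_)
```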
## Acknowledgements
Thanks to the following open-source works:
- [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct)
- [icefall/ASR_LLM](https://github.com/k2-fsa/icefall/tree/master/egs/speech_llm/ASR_LLM)
- [WeNet](https://github.com/wenet-e2e/wenet)
- [Speech-Transformer](https://github.com/kaituoxu/Speech-Transformer)

assets/FireRedASR_model.png
183 KB · Binary file not shown.

examples/fireredasr

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
../fireredasr
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
#!/bin/bash

export PATH=$PWD/fireredasr/:$PWD/fireredasr/utils/:$PATH
export PYTHONPATH=$PWD/:$PYTHONPATH

# model_dir includes model.pth.tar, cmvn.ark, dict.txt
model_dir=$PWD/pretrained_models/FireRedASR-AED-L

# Several input formats are supported; only the last assignment takes effect
wavs="--wav_path wav/BAC009S0764W0121.wav"
wavs="--wav_paths wav/BAC009S0764W0121.wav wav/IT0011W0001.wav wav/TEST_NET_Y0000000000_-KTKHdZ2fb8_S00000.wav wav/TEST_MEETING_T0000000001_S00000.wav"
wavs="--wav_dir wav/"
wavs="--wav_scp wav/wav.scp"

out="out/aed-l-asr.txt"

decode_args="
--batch_size 2 --beam_size 3 --nbest 1
--decode_max_len 0 --softmax_smoothing 1.25 --aed_length_penalty 0.6
--eos_penalty 1.0
"

mkdir -p $(dirname $out)
set -x

CUDA_VISIBLE_DEVICES=0 \
speech2text.py --asr_type "aed" --model_dir $model_dir $decode_args $wavs --output $out

ref="wav/text"
wer.py --print_sentence_wer 1 --do_tn 0 --rm_special 0 --ref $ref --hyp $out > $out.wer 2>&1
tail -n8 $out.wer
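The `--wav_scp` option used above points at a Kaldi-style scp file, with one `utt_id wav_path` pair per line. A minimal sketch of reading such a file (the `read_wav_scp` helper below is illustrative, not part of the repo):

```python
import os
import tempfile

def read_wav_scp(path):
    """Parse a Kaldi-style wav.scp: one 'utt_id wav_path' pair per line."""
    uttids, wav_paths = [], []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            uttid, wav_path = line.split(maxsplit=1)
            uttids.append(uttid)
            wav_paths.append(wav_path)
    return uttids, wav_paths

# Demo with a temporary scp file
with tempfile.NamedTemporaryFile("w", suffix=".scp", delete=False) as f:
    f.write("BAC009S0764W0121 wav/BAC009S0764W0121.wav\n")
    f.write("IT0011W0001 wav/IT0011W0001.wav\n")
    scp = f.name

uttids, wav_paths = read_wav_scp(scp)
print(uttids)  # ['BAC009S0764W0121', 'IT0011W0001']
os.unlink(scp)
```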
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
#!/bin/bash

export PATH=$PWD/fireredasr/:$PWD/fireredasr/utils/:$PATH
export PYTHONPATH=$PWD/:$PYTHONPATH

# model_dir includes model.pth.tar, asr_encoder.pth.tar, cmvn.ark, Qwen2-7B-Instruct
model_dir=$PWD/pretrained_models/FireRedASR-LLM-L

# Several input formats are supported; only the last assignment takes effect
wavs="--wav_path wav/BAC009S0764W0121.wav"
wavs="--wav_paths wav/BAC009S0764W0121.wav wav/IT0011W0001.wav wav/TEST_NET_Y0000000000_-KTKHdZ2fb8_S00000.wav wav/TEST_MEETING_T0000000001_S00000.wav"
wavs="--wav_dir wav/"
wavs="--wav_scp wav/wav.scp"

out="out/llm-l-asr.txt"

decode_args="
--batch_size 1 --beam_size 3 --decode_max_len 0 --decode_min_len 0
--repetition_penalty 3.0 --llm_length_penalty 1.0 --temperature 1.0
"

mkdir -p $(dirname $out)
set -x

CUDA_VISIBLE_DEVICES=0 \
speech2text.py --asr_type "llm" --model_dir $model_dir $decode_args $wavs --output $out

ref="wav/text"
wer.py --print_sentence_wer 1 --do_tn 0 --rm_special 1 --ref $ref --hyp $out > $out.wer 2>&1
tail -n8 $out.wer
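`wer.py` scores the hypothesis file against the reference. Setting the repo's normalization flags (`--do_tn`, `--rm_special`) aside, error rate is Levenshtein edit distance between reference and hypothesis tokens divided by the reference length; for Mandarin this is computed over characters, so it is effectively CER. A minimal sketch, not the repo's implementation:

```python
def word_error_rate(ref, hyp):
    """Edit distance between token lists, divided by reference length."""
    r, h = ref, hyp
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / max(len(r), 1)

# Character-level scoring for a Mandarin example (one substitution)
ref = list("甚至出现交易几乎停滞的情况")
hyp = list("甚至出现交易几乎停止的情况")
print(round(word_error_rate(ref, hyp), 3))  # 0.077 (1 edit / 13 chars)
```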

examples/pretrained_models

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
../pretrained_models

examples/wav/BAC009S0764W0121.wav
131 KB · Binary file not shown.

examples/wav/IT0011W0001.wav
62.3 KB · Binary file not shown.

387 KB · Binary file not shown.
56.3 KB · Binary file not shown.

0 commit comments