
Commit a46bfb0

Author: FireRedTeam
release FireRedASR-LLM-L and update README
1 parent 8dad8e2

3 files changed: 15 additions & 7 deletions

README.md

Lines changed: 12 additions & 6 deletions
@@ -3,7 +3,7 @@
 <br>
 Automatic Speech Recognition Models</h1>
 
-Kai-Tuo Xu · Feng-Long Xie · Xu Tang · Yao Hu
+[Kai-Tuo Xu](https://github.com/kaituoxu) · [Feng-Long Xie](https://scholar.google.com/citations?user=bi8ExI4AAAAJ&hl=zh-CN&oi=sra) · [Xu Tang](https://scholar.google.com/citations?user=grP24aAAAAAJ&hl=zh-CN&oi=sra) · [Yao Hu](https://scholar.google.com/citations?user=LIu7k7wAAAAJ&hl=zh-CN)
 
 </div>

@@ -55,6 +55,8 @@ Results are reported in Character Error Rate (CER%) for Chinese and Word Error R
 ## Usage
 Download model files from [huggingface](https://huggingface.co/fireredteam) and place them in the folder `pretrained_models`.
 
+If you want to use `FireRedASR-LLM-L`, you also need to download [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct) and place it in the folder `pretrained_models`. Then go to the folder `FireRedASR-LLM-L` and run `$ ln -s ../Qwen2-7B-Instruct`.
+
 
 ### Setup
 Create a Python environment and install dependencies
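The symlink step above can be sketched end-to-end. Folder names follow the README, but this uses a temporary directory with empty placeholder folders standing in for the real downloaded models:

```shell
# Illustrative layout only; the real folders come from the huggingface downloads.
cd "$(mktemp -d)"
mkdir -p pretrained_models/FireRedASR-LLM-L pretrained_models/Qwen2-7B-Instruct
cd pretrained_models/FireRedASR-LLM-L
# Relative symlink so the ASR model folder can see the LLM weights next to it
ln -s ../Qwen2-7B-Instruct
ls -l Qwen2-7B-Instruct   # symlink resolves to the sibling Qwen2-7B-Instruct folder
```

A relative link (`../Qwen2-7B-Instruct`) keeps the layout working even if `pretrained_models` is moved as a whole.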
@@ -77,7 +79,7 @@ ffmpeg -i input_audio -ar 16000 -ac 1 -acodec pcm_s16le -f wav output.wav
 
 ### Quick Start
 ```bash
-$ cd examples/
+$ cd examples
 $ bash inference_fireredasr_aed.sh
 $ bash inference_fireredasr_llm.sh
 ```
@@ -106,8 +108,8 @@ results = model.transcribe(
         "beam_size": 3,
         "nbest": 1,
         "decode_max_len": 0,
-        "softmax_smoothing": 1.0,
-        "aed_length_penalty": 0.0,
+        "softmax_smoothing": 1.25,
+        "aed_length_penalty": 0.6,
         "eos_penalty": 1.0
     }
 )
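The hunk above raises the AED defaults to `softmax_smoothing=1.25` and `aed_length_penalty=0.6`. As a rough intuition for what such knobs typically do in beam search, here is a pure-Python sketch assuming the common conventions (logits scaled by the smoothing factor before softmax; hypothesis scores normalized by `length**penalty`). FireRedASR's actual implementation may define them differently:

```python
import math

def smoothed_softmax(logits, smoothing=1.25):
    """Scale logits by `smoothing` before softmax; values > 1 sharpen the
    distribution (assumed convention, not necessarily FireRedASR's)."""
    scaled = [x * smoothing for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def length_normalized_score(log_prob_sum, length, penalty=0.6):
    """Divide a hypothesis log-probability by length**penalty so longer
    hypotheses are not unfairly penalized (assumed normalization)."""
    return log_prob_sum / (length ** penalty)

probs_flat = smoothed_softmax([2.0, 1.0, 0.5], smoothing=1.0)
probs_sharp = smoothed_softmax([2.0, 1.0, 0.5], smoothing=1.25)
print(max(probs_sharp) > max(probs_flat))  # True: smoothing > 1 sharpens
```

With `penalty=0.0` the score is the raw log-probability sum, which biases the search toward short outputs; `0.6` is a common compromise.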
@@ -124,14 +126,18 @@ results = model.transcribe(
         "beam_size": 3,
         "decode_max_len": 0,
         "decode_min_len": 0,
-        "repetition_penalty": 1.0,
-        "llm_length_penalty": 0.0,
+        "repetition_penalty": 3.0,
+        "llm_length_penalty": 1.0,
         "temperature": 1.0
     }
 )
 print(results)
 ```
 
+## Usage Tips
+### Batch Beam Search
+- When performing batch beam search with FireRedASR-LLM, ensure that the input utterances have similar lengths. If utterance lengths differ significantly, shorter utterances may suffer from repetition issues. Sort your dataset by length, or set `batch_size` to 1, to avoid this.
+
 ### Input Length Limitations
 - FireRedASR-AED supports audio input up to 60s. Input longer than 60s may cause hallucination issues, and input exceeding 200s will trigger positional encoding errors.
 - FireRedASR-LLM supports audio input up to 30s. The behavior for longer input is currently unknown.
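The LLM decoding defaults above raise `repetition_penalty` from 1.0 to 3.0, which matches the new tip about repetition issues on short utterances. As background, here is a sketch of the standard CTRL-style repetition penalty (the convention used by common LLM decoding libraries; assumed here, FireRedASR may differ): logits of already generated tokens are divided by the penalty when positive and multiplied when negative, making repeats less likely.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=3.0):
    """CTRL-style repetition penalty (assumed convention): push down the
    scores of tokens that were already generated."""
    out = list(logits)
    for tok in set(generated_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

logits = [3.0, 0.3, -1.0, 1.1]
# Tokens 0 and 2 were already generated; their logits get penalized.
penalized = apply_repetition_penalty(logits, generated_ids=[0, 2], penalty=3.0)
print(penalized)  # [1.0, 0.3, -3.0, 1.1]
```

With `penalty=1.0` the function is a no-op, which is why the old default allowed repetition loops.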

fireredasr/models/fireredasr_llm.py

Lines changed: 2 additions & 1 deletion

@@ -20,7 +20,8 @@ def load_encoder(cls, model_path):
     assert os.path.exists(model_path)
     package = torch.load(model_path, map_location=lambda storage, loc: storage)
     model = FireRedAsrAed.from_args(package["args"])
-    model.load_state_dict(package["model_state_dict"], strict=False)
+    if "model_state_dict" in package:
+        model.load_state_dict(package["model_state_dict"], strict=False)
     encoder = model.encoder
     encoder_dim = encoder.odim
     return encoder, encoder_dim
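The change above makes `load_encoder` tolerate checkpoints that ship only the model `args` without a `"model_state_dict"` entry, in which case the freshly initialized parameters are kept. A torch-free sketch of the same guard pattern (`DummyModel` and `load_encoder_weights` are illustrative names, not part of the repo):

```python
def load_encoder_weights(model, package):
    """Load weights only when the checkpoint dict actually contains them;
    mirrors the guard added in load_encoder. Returns whether weights loaded."""
    if "model_state_dict" in package:
        model.load_state_dict(package["model_state_dict"])
        return True
    return False

class DummyModel:
    """Stand-in with the minimal load_state_dict interface."""
    def __init__(self):
        self.state = {"w": 0}
    def load_state_dict(self, sd):
        self.state.update(sd)

m = DummyModel()
print(load_encoder_weights(m, {"args": {}}))                    # False: args only, keep init
print(load_encoder_weights(m, {"model_state_dict": {"w": 7}}))  # True: weights loaded
print(m.state["w"])                                             # 7
```

Without the guard, an args-only checkpoint would raise `KeyError: 'model_state_dict'` before the encoder could be extracted.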

requirements.txt

Lines changed: 1 addition & 0 deletions

@@ -3,5 +3,6 @@ kaldiio>=2.18.0
 kaldi_native_fbank>=1.15
 numpy>=1.26.1
 peft>=0.13.2
+sentencepiece
 torch>=2.0.0
 transformers>=4.46.3
