
Add Cohere Transcribe STT support #129

Open
beshkenadze wants to merge 9 commits into Blaizzy:main from beshkenadze:draft/cohere-transcribe-experimental

Conversation

@beshkenadze
Contributor

@beshkenadze beshkenadze commented Mar 27, 2026

Summary

  • add Cohere Transcribe model support, CLI wiring, and Cohere-specific tests
  • fix the Swift decoder prompt token order to match the canonical Cohere/Python layout and restore punctuation/casing on transcription output
  • add Swift-side support for loading quantized Cohere checkpoints via quantization / quantization_config
  • publish quantized Cohere MLX repos for 8-bit, 6-bit, and 4-bit variants

Quantized model repos

Root causes addressed

1. Text quality regression

Cohere Transcribe uses prompt control tokens to steer formatting. The Swift tokenizer originally built the decoder prompt in the wrong order:

<|startofcontext|> <|startoftranscript|> <|en|> <|en|> <|pnc|> <|notimestamp|> <|nodiarize|> <|noitn|> <|emo:undefined|>

The canonical order used by the Python implementation is:

<|startofcontext|> <|startoftranscript|> <|emo:undefined|> <|en|> <|en|> <|pnc|> <|noitn|> <|notimestamp|> <|nodiarize|>

That mismatch preserved lexical content but degraded casing and punctuation. The Swift tokenizer now matches the working Python path.
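The canonical prompt order can be pinned down with a small sketch. The control-token strings and their order are taken from the PR text above; the builder function itself is illustrative, not the actual mlx-audio implementation.

```python
# Canonical Cohere Transcribe decoder-prompt order (per this PR).
# `build_decoder_prompt` is a hypothetical helper, not the real API.

def build_decoder_prompt(lang: str = "en") -> list[str]:
    """Assemble the control-token prefix in the canonical Python-side order."""
    return [
        "<|startofcontext|>",
        "<|startoftranscript|>",
        "<|emo:undefined|>",   # emotion tag comes right after the transcript marker
        f"<|{lang}|>",         # source language
        f"<|{lang}|>",         # target language (same for plain transcription)
        "<|pnc|>",             # punctuation and casing enabled
        "<|noitn|>",           # no inverse text normalization
        "<|notimestamp|>",     # no timestamps
        "<|nodiarize|>",       # no speaker diarization
    ]

print(build_decoder_prompt())
```

A regression test that asserts this exact ordering (as `cohereTokenizerBuildsPromptTokens` does on the Swift side) is what catches this class of bug.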

2. Quantized Cohere checkpoints were not loadable in Swift

Unlike the other STT models in this repo, Cohere did not decode quantization / quantization_config from config.json and did not call quantize(model: ...) before loading packed weights. This PR adds the same quantization-aware loading path used by the other Swift STT models.

It also tightens Cohere conv-weight normalization so both the original converted fp16 checkpoint and the locally re-saved quantized checkpoints load with the correct 1D convolution layout.

Files changed for quantization support

  • Sources/MLXAudioSTT/Models/CohereTranscribe/CohereTranscribe.swift
  • Sources/MLXAudioSTT/Models/CohereTranscribe/CohereTranscribeConfig.swift
  • Tests/MLXAudioSTTTests.swift

Validation

  • swift test --filter cohereConfigDecoding
  • swift test --filter cohereTokenizerBuildsPromptTokens
  • xcodebuild -scheme mlx-audio-swift-stt -configuration Release -destination "platform=macOS" -derivedDataPath .build/xcode build
  • real transcription benchmark on Tests/media/conversational_a.wav
  • local quantized export + load verification for 8-bit / 6-bit / 4-bit Cohere checkpoints

Benchmark (Tests/media/conversational_a.wav, warm run)

| Variant | Gen TPS | Total time | Peak memory | Quality note |
| --- | --- | --- | --- | --- |
| fp16 | 146.6 | 0.764 s | 5.40 GB | baseline |
| 8-bit | 352.9 | 0.460 s | 2.87 GB | matches fp16 on this sample |
| 6-bit | 362.5 | 0.461 s | 2.42 GB | punctuation regression ("bush-curious") |
| 4-bit | 394.6 | 0.436 s | 1.96 GB | lexical regression (Kaldi → Khaldi) |

Recommendation

  • 8-bit is the best quantized trade-off on the repo sample: ~2.4x fp16 generation throughput with ~47% lower peak memory and no observed text regression on this sample.
  • 6-bit and 4-bit are faster/smaller, but both show output degradation on the same audio clip, so they should be considered more experimental.
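The headline figures in the recommendation follow directly from the benchmark table:

```python
# Derive the 8-bit recommendation numbers from the benchmark table above.
fp16_tps, q8_tps = 146.6, 352.9     # generation tokens/sec
fp16_mem, q8_mem = 5.40, 2.87       # peak memory, GB

speedup = q8_tps / fp16_tps         # ~2.4x fp16 throughput
mem_saving = 1 - q8_mem / fp16_mem  # ~47% lower peak memory
print(f"{speedup:.2f}x faster, {mem_saving:.0%} less memory")
```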

Notes

  • this PR remains a draft because broader Cohere evaluation is still ongoing, even though the repo-sample text-quality issue is fixed
  • the quantized repos are uploaded and linked above so reviewers can pull the exact artifacts used in this benchmark

@beshkenadze
Contributor Author

@lucasnewman I'm not sure about the quality of the model itself, but we can keep it if we want to support it.

@beshkenadze beshkenadze marked this pull request as ready for review March 29, 2026 19:07
@beshkenadze
Contributor Author

@lucasnewman now we have full parity with the Python version :)

@beshkenadze
Contributor Author

Also closes #130

@Newarr

Newarr commented Mar 30, 2026

We're looking to use this in OpenOats (macOS meeting transcription). Ran Cohere against Whisper Large v3 Turbo on Apple Silicon via the Python mlx-audio side.

Started with 8 samples where Cohere hit 0.0% WER on French and Spanish. Kept adding samples to see if that held. It didn't at scale, but neither did Whisper's early leads. At 695 samples (647 English):

| Language | n | Cohere avg WER | Whisper avg WER | Cohere median | Whisper median |
| --- | --- | --- | --- | --- | --- |
| English | 647 | 5.55% | 5.56% | 4.00% | 3.57% |
| Polish | 22 | 7.3% | 7.3% | 4.3% | 4.3% |
| Spanish | 22 | 2.5% | 1.8% | 0.0% | 0.0% |
| French | 2 | 0.0% | 1.6% | – | – |
| German | 2 | 1.6% | 7.8% | – | – |
| Avg latency | – | 0.23 s | 0.47 s | – | – |

On English they're the same model to two decimal places. Cohere is 2x faster, and that held across every test we ran.
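For readers unfamiliar with the metric, WER in tables like the one above is conventionally word-level edit distance over reference length. This is a generic sketch of that computation, not the harness used for these numbers:

```python
# Word error rate via Levenshtein edit distance over word sequences.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edit distance between first i ref words and first j hyp words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the quick brown fox", "the quick brown dog"))  # 0.25
```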

Can help test on meeting audio once this lands.

@Benjoyo

Benjoyo commented Mar 30, 2026

Thanks @beshkenadze for this - can we use an 8-bit or 4-bit quantized model as well? I can only find your fp16 model on the Hub (https://huggingface.co/beshkenadze/cohere-transcribe-03-2026-mlx-fp16). Does quantization work, and if so, can you point me at the right script/command? Thanks

@beshkenadze
Contributor Author

@Benjoyo I've uploaded quantized models and updated the benchmark results as well.

beshkenadze added a commit to beshkenadze/mlx-audio-swift that referenced this pull request Mar 31, 2026
- Add Cohere Transcribe STT model implementation
- Wire into CLI and docs
- Add Cohere Transcribe tests
- Fix: use max(dim-1, 1) in ParakeetAudio normalization (div-by-zero guard)
- Fix: add textProcessor param and kokoro case to TTSModel factory
- Improve test integration via MLXAUDIO_TEST_MODEL_DIR env var
@beshkenadze
Contributor Author

@Benjoyo @Newarr feel free to use my fork while we are waiting for merge into upstream.
