Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <[email protected]>
@lucasnewman I'm not sure about the quality of the model itself, but we can keep it if we want to support it.
@lucasnewman now we have full parity with the Python version :)
Also closes #130
We're looking to use this in OpenOats (macOS meeting transcription). Ran Cohere against Whisper Large v3 Turbo on Apple Silicon via the Python mlx-audio side. Started with 8 samples where Cohere hit 0.0% WER on French and Spanish, then kept adding samples to see if that held. It didn't at scale, but neither did Whisper's early leads. At 695 samples (647 English), they're the same model on English WER to two decimal places. Cohere is 2x faster, and that held across every test we ran. Can help test on meeting audio once this lands.
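For reference, the WER figures above are word-level edit distance divided by reference length. A minimal sketch in Python (matching the metric's standard definition, not necessarily the exact scoring script used for the comparison):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution
        prev = curr
    return prev[-1] / max(len(ref), 1)

print(wer("the quick brown fox", "the quick brown fox"))  # 0.0
print(wer("kaldi is a toolkit", "khaldi is a toolkit"))   # 0.25 (1 substitution / 4 words)
```

Note that real scoring pipelines usually normalize case and punctuation first, which is exactly the dimension the prompt-order fix below affects.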
Thanks @beshkenadze for this. Can we use an 8-bit or 4-bit quantized model as well? I only find your fp16 model on the Hub (https://huggingface.co/beshkenadze/cohere-transcribe-03-2026-mlx-fp16). Does quantization work, and if so, can you point me at the right script/command? Thanks!
@Benjoyo I've uploaded quantized models and updated the benchmark results as well. |
- Add Cohere Transcribe STT model implementation
- Wire into CLI and docs
- Add Cohere Transcribe tests
- Fix: use `max(dim - 1, 1)` in ParakeetAudio normalization (div-by-zero guard)
- Fix: add `textProcessor` param and `kokoro` case to TTSModel factory
- Improve test integration via `MLXAUDIO_TEST_MODEL_DIR` env var
## Summary

Adds quantization support for Cohere Transcribe: decode `quantization`/`quantization_config` from `config.json` and apply quantization-aware loading.

### Quantized model repos

- beshkenadze/cohere-transcribe-03-2026-mlx-fp16
- beshkenadze/cohere-transcribe-03-2026-mlx-8bit
- beshkenadze/cohere-transcribe-03-2026-mlx-6bit
- beshkenadze/cohere-transcribe-03-2026-mlx-4bit

### Root causes addressed
#### 1. Text quality regression
Cohere Transcribe uses prompt control tokens to steer formatting. The Swift tokenizer originally built the decoder prompt's control tokens in a different order from the canonical order used by the Python implementation. That mismatch preserved lexical content but degraded casing and punctuation. The Swift tokenizer now matches the working Python path.
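The exact token sequences aren't reproduced above, but the failure mode can be illustrated with a toy sketch (all token names here are hypothetical, not the real Cohere control tokens): reordering the control prefix leaves the lexical tokens untouched, which is why WER stayed flat while formatting degraded.

```python
def build_prompt(control_tokens: list[str], text_tokens: list[str]) -> list[str]:
    # Decoder prompt = control prefix followed by the text tokens.
    return control_tokens + text_tokens

# Hypothetical orders, for illustration only
wrong_order = ["<task>", "<lang>"]      # stand-in for the original Swift order
canonical   = ["<lang>", "<task>"]      # stand-in for the Python order
text = ["hello", "world"]

a = build_prompt(wrong_order, text)
b = build_prompt(canonical, text)

# The lexical suffix is identical; only the control prefix differs.
print(a[2:] == b[2:])  # True
print(a == b)          # False
```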
#### 2. Quantized Cohere checkpoints were not loadable in Swift
Unlike the other STT models in this repo, Cohere did not decode `quantization`/`quantization_config` from `config.json` and did not call `quantize(model: ...)` before loading packed weights. This PR adds the same quantization-aware loading path used by the other Swift STT models. It also tightens Cohere conv-weight normalization so both the original converted fp16 checkpoint and the locally re-saved quantized checkpoints load with the correct 1D convolution layout.
### Files changed for quantization support

- `Sources/MLXAudioSTT/Models/CohereTranscribe/CohereTranscribe.swift`
- `Sources/MLXAudioSTT/Models/CohereTranscribe/CohereTranscribeConfig.swift`
- `Tests/MLXAudioSTTTests.swift`

### Validation
- `swift test --filter cohereConfigDecoding`
- `swift test --filter cohereTokenizerBuildsPromptTokens`
- `xcodebuild -scheme mlx-audio-swift-stt -configuration Release -destination "platform=macOS" -derivedDataPath .build/xcode build`
- `Tests/media/conversational_a.wav`

### Benchmark (`Tests/media/conversational_a.wav`, warm run)

(Benchmark table not recoverable; surviving entries note the transcription diffs "bush-curious" and "Kaldi" → "Khaldi".)

### Recommendation
### Notes