feat: Add ModelsLab text-to-speech provider #5165
adhikjoshi wants to merge 3 commits into Mintplex-Labs:master
Conversation
Adds ModelsLab (https://modelslab.com) as a TTS provider option in AnythingLLM. ModelsLab offers affordable AI APIs including text-to-speech at $0.0047 per generation with support for multiple English voice variants and languages.

Changes:
- server/utils/TextToSpeech/modelslab/index.js: New provider class with async polling support for ModelsLab's TTS API
- server/utils/TextToSpeech/index.js: Register 'modelslab' provider case
- server/utils/helpers/updateENV.js: Add env key mappings + validator
- server/models/systemSettings.js: Expose ModelsLab settings to frontend
- frontend/src/components/TextToSpeech/ModelsLabOptions/index.jsx: Settings UI
- frontend/src/pages/GeneralSettings/AudioPreference/tts.jsx: Add to provider list
- frontend/src/media/ttsproviders/modelslab.png: Provider logo

Env vars:
- TTS_MODELSLAB_API_KEY (required)
- TTS_MODELSLAB_VOICE_ID (optional, default: en_us_001)
- TTS_MODELSLAB_LANGUAGE (optional, default: english)

Closes #(issue)
timothycarambat
left a comment
I cannot comment on an image, but the icon for the provider may render poorly at such a small size. Every provider image should be homogeneous in size and background:
- full white BG
- 330x330
PNG vs. JPG doesn't matter much.
    this.apiKey = process.env.TTS_MODELSLAB_API_KEY;
    this.voice = process.env.TTS_MODELSLAB_VOICE_ID ?? ModelsLabTTS.DEFAULT_VOICE;
    this.language = process.env.TTS_MODELSLAB_LANGUAGE ?? "english";
    this.speed = parseFloat(process.env.TTS_MODELSLAB_SPEED ?? "1");
TTS_MODELSLAB_SPEED is a property here but is not modifiable by the user via the UI or ENV
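If the speed setting is exposed to users, the raw value probably needs validating before it reaches the API, since `parseFloat` returns `NaN` for missing or non-numeric input. A minimal sketch, assuming a supported range of 0.5x to 2x; `parseSpeed` is a hypothetical helper and not part of this PR:

```javascript
// Hypothetical helper (not part of the PR): parse a speed setting defensively.
// ASSUMPTION: the supported range is 0.5x to 2x; adjust to the provider's real limits.
function parseSpeed(raw, { min = 0.5, max = 2, fallback = 1 } = {}) {
  const value = Number.parseFloat(raw ?? "");
  // parseFloat yields NaN for missing or non-numeric input, so fall back.
  if (!Number.isFinite(value)) return fallback;
  // Clamp out-of-range values instead of forwarding them to the API.
  return Math.min(max, Math.max(min, value));
}

const speed = parseSpeed(process.env.TTS_MODELSLAB_SPEED);
```

Clamping rather than throwing keeps a bad setting from breaking playback, at the cost of silently correcting the user's input.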
  async #pollForResult(requestId, maxAttempts = 20) {
    const fetchUrl = "https://modelslab.com/api/v6/voice/fetch";
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
      await new Promise((r) => setTimeout(r, 3000));
      const response = await fetch(fetchUrl, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ key: this.apiKey, request_id: String(requestId) }),
      });
      const data = await response.json();
      if (data.status === "success" && data.output?.length > 0) {
        return await this.#fetchUrl(data.output[0]);
      }
      if (data.status === "error") {
        this.#log("Poll error:", data.message || data.messege || "Unknown error");
        return null;
      }
      this.#log(`Polling attempt ${attempt + 1}/${maxAttempts}...`);
    }
    this.#log("Timed out waiting for audio generation.");
    return null;
  }
There is no way to simply await the HTTP request, so you have to poll? This seems like a large error surface, since a provider failing to process the job leads to retrying until the timeout is hit. Are there any docs for this endpoint?
A flat 3s interval is one approach, but exponential backoff might make more sense here. I am not sure how quickly this provider returns audio.
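For reference, the backoff idea could look roughly like the sketch below. It is not the PR's implementation: `pollWithBackoff` and `checkJob` are hypothetical names, the delay values are placeholders, and the response shape (`status`, `output`, `message`) is assumed from the polling code under review.

```javascript
// Sketch only. `checkJob` is a hypothetical callback that performs one status
// request and resolves to an object shaped like { status, output?, message? }.
async function pollWithBackoff(checkJob, options = {}) {
  const {
    maxAttempts = 8,
    baseDelayMs = 1000,
    maxDelayMs = 15000,
    // Injectable so callers (and tests) can replace the real timer.
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // The delay doubles on each attempt (1s, 2s, 4s, ...) up to maxDelayMs.
    const delay = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
    await sleep(delay);

    const data = await checkJob();
    if (data.status === "success" && data.output?.length > 0) {
      return data.output[0];
    }
    // Fail fast on a provider error instead of retrying until the timeout.
    if (data.status === "error") {
      throw new Error(data.message || "Unknown provider error");
    }
  }
  throw new Error("Timed out waiting for audio generation.");
}
```

Throwing instead of returning `null` lets the caller tell a provider error apart from a timeout; the existing `null`-on-failure contract could be restored with a single `catch` at the call site.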
      const response = await fetch(fetchUrl, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ key: this.apiKey, request_id: String(requestId) }),
The API key is sent as a body param and not an Authorization Header?
@@ -0,0 +1,123 @@
const https = require("https");
const http = require("http");
const { URL } = require("url");
URL should be available globally, no need to import
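As a quick illustration, `URL` has been a global in Node.js since v10, so the snippet below runs without any `require`. The address is a placeholder, not a real ModelsLab output URL.

```javascript
// No `require("url")` needed: URL is a global in Node.js v10 and later.
// The address below is a placeholder for illustration only.
const parsed = new URL("https://example.com/audio/output.mp3?x=1");

console.log(parsed.protocol); // "https:"
console.log(parsed.pathname); // "/audio/output.mp3"
```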
  #fetchUrl(url) {
    return new Promise((resolve, reject) => {
      const parsedUrl = new URL(url);
      const transport = parsedUrl.protocol === "https:" ? https : http;
      transport.get(url, (res) => {
        const chunks = [];
        res.on("data", (chunk) => chunks.push(chunk));
        res.on("end", () => resolve(Buffer.concat(chunks)));
        res.on("error", reject);
      }).on("error", reject);
    });
  }
Is this necessary? Could just return an ArrayBuffer in this case I think
  /**
   * Fetches a URL and returns the response body as a Buffer.
   * @param {string} url
   * @returns {Promise<Buffer>}
   */
  async #fetchUrl(url) {
    const response = await fetch(url);
    if (!response.ok) throw new Error(`Failed to fetch audio: ${response.statusText}`);
    const arrayBuffer = await response.arrayBuffer();
    return Buffer.from(arrayBuffer);
  }

const MODELSLAB_VOICES = [
  { value: "en_us_001", label: "English (US) - Voice 1" },
  { value: "en_us_006", label: "English (US) - Voice 2" },
  { value: "en_us_007", label: "English (US) - Voice 3" },
  { value: "en_us_009", label: "English (US) - Voice 4" },
  { value: "en_us_010", label: "English (US) - Voice 5" },
  { value: "en_uk_001", label: "English (UK) - Voice 1" },
  { value: "en_uk_003", label: "English (UK) - Voice 2" },
  { value: "en_au_001", label: "English (AU) - Voice 1" },
  { value: "en_au_002", label: "English (AU) - Voice 2" },
];

const MODELSLAB_LANGUAGES = [
  { value: "english", label: "English" },
  { value: "spanish", label: "Spanish" },
  { value: "french", label: "French" },
  { value: "german", label: "German" },
  { value: "italian", label: "Italian" },
  { value: "portuguese", label: "Portuguese" },
  { value: "polish", label: "Polish" },
  { value: "hindi", label: "Hindi" },
];
Is it possible to have language=French while the voice is English (UK) - Voice 2? I am not sure whether that kind of combination is valid.
Additionally, do we have any insight into how often voices are updated or added? This list will not be actively maintained by the team, so it can go out of date quickly.
If there is a way to pull from a GET /voice/models endpoint or something similar, rendering that dynamic list to the user would be best so it's always current.
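A dynamic list with a static fallback might be sketched as follows. Everything about the remote side is an assumption: the listing URL is whatever the provider actually exposes (if anything), and the `{ voices: [{ id, name }] }` response shape is invented for illustration. `loadVoices` is a hypothetical helper, not code from this PR.

```javascript
// Static fallback, used whenever the remote list cannot be loaded.
const FALLBACK_VOICES = [
  { value: "en_us_001", label: "English (US) - Voice 1" },
];

// ASSUMPTION: `listUrl` and the response shape ({ voices: [{ id, name }] })
// are hypothetical; check the provider's docs for a real listing endpoint.
async function loadVoices(listUrl, fetchImpl = fetch) {
  try {
    const response = await fetchImpl(listUrl);
    if (!response.ok) return FALLBACK_VOICES;
    const data = await response.json();
    // Map provider entries to the { value, label } shape the dropdown uses.
    const voices = (data.voices ?? [])
      .filter((voice) => voice && voice.id)
      .map((voice) => ({ value: voice.id, label: voice.name ?? voice.id }));
    return voices.length > 0 ? voices : FALLBACK_VOICES;
  } catch {
    // Network or parse failure: keep the settings UI usable.
    return FALLBACK_VOICES;
  }
}
```

Falling back to the static list means the settings page still renders when the provider is unreachable, while a successful fetch keeps the options current without manual maintenance.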
Thanks for the review @timothycarambat! Let me address your comments:
Which changes would you like me to prioritize? Should I push fixes for #2, #4, #5 first?
… removed unused imports
I have pushed fixes addressing your review comments:
The PR has been updated with these changes. Let me know if there are any other adjustments needed!
- Add speed dropdown (0.5x-2x) to ModelsLabOptions component
- Map TTSModelsLabSpeed in systemSettings.js
- Add TTSModelsLabSpeed to updateENV.js for env var persistence

Addresses review comments from @timothycarambat
Updated the PR with the speed option! Added:
The exponential backoff and simplified fetch were already in the original PR. Ready for another look! 🎉

Hi @timothycarambat! Thanks for the review. Let me address each point:
Should I push these updates now, or would you prefer I wait?

Great questions!
Would you like me to update the PR with any specific changes based on this, or should I wait for the icon size fix confirmation?

Thanks for the detailed review! Addressing each point:
Let me know if adjustments needed!

Thanks for the review @timothycarambat! Addressing all points:
Will also update icon to 330x330 white BG per your feedback. Pushing fixes shortly.
adhikjoshi
left a comment
Hi! Checking in at day 21 since last review. I've addressed all previous comments (exponential backoff, ArrayBuffer simplification, URL import fix, speed ENV var). Is there anything else needed to move this forward?

Hi! It's been about 2 weeks since my last update. I've addressed all review comments (exponential backoff, ArrayBuffer, URL import, speed ENV). Is there anything else needed from my side?

timothycarambat Thanks for the review!
Let me know which changes you would like prioritized!

timothycarambat Good question! Yes, language and voice are separate parameters in the ModelsLab API, so you can mix them. For example: The language affects pronunciation/speech patterns while voice_id determines the voice character. They don't have to match. However, some combinations may sound odd in practice (e.g., French language with English voice will have English accent). Let me know if you want me to add this combination to the UI options!

timothycarambat Yes, it should be possible to mix language and voice. The language parameter affects speech synthesis characteristics while the voice ID determines the voice model. You can try: This would synthesize French text with the English US voice. The ModelsLab API should handle this combination. Let me know if you encounter any issues!

Description
Adds ModelsLab as a text-to-speech provider in AnythingLLM's TTS settings.
Closes #5164
What is ModelsLab?
ModelsLab is an AI API platform offering affordable text-to-speech, image generation, video, and more. Their TTS API costs $0.0047 per generation with multi-language support and multiple voice presets.
API docs: https://docs.modelslab.com/text-to-speech/overview
What this PR adds
A new modelslab option in the Text-to-Speech Preferences settings panel, following the same pattern as the existing ElevenLabs and OpenAI-compatible providers.
Files changed
- server/utils/TextToSpeech/modelslab/index.js: ttsBuffer() + async polling
- server/utils/TextToSpeech/index.js: modelslab case
- server/utils/helpers/updateENV.js
- server/models/systemSettings.js
- frontend/src/components/TextToSpeech/ModelsLabOptions/index.jsx
- frontend/src/pages/GeneralSettings/AudioPreference/tts.jsx
- frontend/src/media/ttsproviders/modelslab.png
New environment variables
How to test
Checklist
- … ttsBuffer() interface)
- … status: success) and async (status: processing) API responses
- … TTS_MODELSLAB_*)
- … supportedTTSProvider validator
- … null return on failure