feat: Add ModelsLab text-to-speech provider #5165

Open
adhikjoshi wants to merge 3 commits into Mintplex-Labs:master from adhikjoshi:feat/modelslab-tts-provider

Conversation

@adhikjoshi

Description

Adds ModelsLab as a text-to-speech provider in AnythingLLM's TTS settings.

Closes #5164


What is ModelsLab?

ModelsLab is an AI API platform offering affordable text-to-speech, image generation, video, and more. Their TTS API costs $0.0047 per generation with multi-language support and multiple voice presets.

API docs: https://docs.modelslab.com/text-to-speech/overview


What this PR adds

A new modelslab option in the Text-to-Speech Preferences settings panel, following the same pattern as the existing ElevenLabs and OpenAI-compatible providers.

Files changed

| File | Change |
| --- | --- |
| server/utils/TextToSpeech/modelslab/index.js | NEW: Provider class with ttsBuffer() + async polling |
| server/utils/TextToSpeech/index.js | Add modelslab case |
| server/utils/helpers/updateENV.js | Add env key mappings + validator |
| server/models/systemSettings.js | Expose settings to frontend |
| frontend/src/components/TextToSpeech/ModelsLabOptions/index.jsx | NEW: Settings form |
| frontend/src/pages/GeneralSettings/AudioPreference/tts.jsx | Add to provider list |
| frontend/src/media/ttsproviders/modelslab.png | NEW: Provider logo |

New environment variables

TTS_MODELSLAB_API_KEY=your_api_key  # required
TTS_MODELSLAB_VOICE_ID=en_us_001    # optional (default: en_us_001)
TTS_MODELSLAB_LANGUAGE=english      # optional (default: english)
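
As a rough illustration of how these three variables and their defaults could be resolved on the server, here is a minimal sketch. `resolveModelsLabConfig` is a hypothetical helper introduced only for this example; it is not code from the PR, and the provider class in the PR may read the variables differently.

```javascript
// Hypothetical helper (not part of the PR): resolves the ModelsLab TTS
// settings from an env-like object, applying the documented defaults.
function resolveModelsLabConfig(env = process.env) {
  const apiKey = env.TTS_MODELSLAB_API_KEY;
  if (!apiKey) {
    // The API key is the only required variable.
    throw new Error("TTS_MODELSLAB_API_KEY is required");
  }
  return {
    apiKey,
    // Defaults mirror the values listed in the PR description.
    voice: env.TTS_MODELSLAB_VOICE_ID ?? "en_us_001",
    language: env.TTS_MODELSLAB_LANGUAGE ?? "english",
  };
}
```

Passing the env object as a parameter keeps the helper easy to exercise without touching `process.env`.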

How to test

  1. Get an API key from https://modelslab.com/dashboard/api-keys
  2. In AnythingLLM → Settings → Text-to-Speech → select ModelsLab
  3. Enter your API key, select a voice and language
  4. Save and trigger a TTS response in any chat

Checklist

  • Follows existing TTS provider pattern (ttsBuffer() interface)
  • Handles both sync (status: success) and async (status: processing) API responses
  • Proper env var naming conventions (TTS_MODELSLAB_*)
  • Settings UI matches existing provider forms
  • Added to supportedTTSProvider validator
  • No hardcoded API keys or unnecessary dependencies
  • Error handling with graceful null return on failure
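
The sync/async handling in the checklist can be pictured as a small dispatch on the response status. The sketch below is an assumption-laden illustration: `classifyTtsResponse` is a hypothetical name, and the field names (`status`, `output`, `message`, `messege`) are taken from the polling code quoted in the review further down, not from verified API documentation.

```javascript
// Hypothetical helper: maps a ModelsLab-style JSON response to one of three
// outcomes. Field names follow the polling code quoted in this review.
function classifyTtsResponse(data) {
  if (data?.status === "success" && data.output?.length > 0) {
    // Audio is ready; the first output entry is treated as the audio URL.
    return { kind: "ready", audioUrl: data.output[0] };
  }
  if (data?.status === "error") {
    return {
      kind: "failed",
      reason: data.message || data.messege || "Unknown error",
    };
  }
  // Anything else (including status: "processing") is treated as pending.
  return { kind: "pending" };
}
```

A caller would return the fetched audio for `ready`, return null for `failed`, and keep polling for `pending`.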

Adds ModelsLab (https://modelslab.com) as a TTS provider option in AnythingLLM.

ModelsLab offers affordable AI APIs including text-to-speech at $0.0047 per
generation with support for multiple English voice variants and languages.

Changes:
- server/utils/TextToSpeech/modelslab/index.js: New provider class with
  async polling support for ModelsLab's TTS API
- server/utils/TextToSpeech/index.js: Register 'modelslab' provider case
- server/utils/helpers/updateENV.js: Add env key mappings + validator
- server/models/systemSettings.js: Expose ModelsLab settings to frontend
- frontend/src/components/TextToSpeech/ModelsLabOptions/index.jsx: Settings UI
- frontend/src/pages/GeneralSettings/AudioPreference/tts.jsx: Add to provider list
- frontend/src/media/ttsproviders/modelslab.png: Provider logo

Env vars:
- TTS_MODELSLAB_API_KEY (required)
- TTS_MODELSLAB_VOICE_ID (optional, default: en_us_001)
- TTS_MODELSLAB_LANGUAGE (optional, default: english)

Closes #(issue)
@timothycarambat added the "Integration Request" label (Request for support of a new LLM, Embedder, or Vector database) on Mar 10, 2026
@timothycarambat (Member) left a comment

I cannot comment on an image, but the icon for the provider may render poorly because it is so small. Every provider image should be homogeneous in size and background:

  • full white BG
  • 330x330

PNG vs. JPG doesn't matter so much.

this.apiKey = process.env.TTS_MODELSLAB_API_KEY;
this.voice = process.env.TTS_MODELSLAB_VOICE_ID ?? ModelsLabTTS.DEFAULT_VOICE;
this.language = process.env.TTS_MODELSLAB_LANGUAGE ?? "english";
this.speed = parseFloat(process.env.TTS_MODELSLAB_SPEED ?? "1");

TTS_MODELSLAB_SPEED is a property here but is not modifiable by the user via the UI or ENV

Comment on lines +58 to +79
async #pollForResult(requestId, maxAttempts = 20) {
  const fetchUrl = "https://modelslab.com/api/v6/voice/fetch";
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    await new Promise((r) => setTimeout(r, 3000));
    const response = await fetch(fetchUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ key: this.apiKey, request_id: String(requestId) }),
    });
    const data = await response.json();
    if (data.status === "success" && data.output?.length > 0) {
      return await this.#fetchUrl(data.output[0]);
    }
    if (data.status === "error") {
      this.#log("Poll error:", data.message || data.messege || "Unknown error");
      return null;
    }
    this.#log(`Polling attempt ${attempt + 1}/${maxAttempts}...`);
  }
  this.#log("Timed out waiting for audio generation.");
  return null;
}

Is there no way to simply await the HTTP request, so that you have to poll? This seems like a large error surface, since a provider failure to process the job can lead to retrying until the loop dies from timeouts. Are there any docs for this endpoint?

A flat 3s interval is one approach, but exponential backoff might make more sense here. I am not sure how quickly this provider returns audio.
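
To make the backoff suggestion concrete, here is a minimal sketch of a doubling delay with a cap. All names (`backoffDelayMs`, `pollWithBackoff`, `checkOnce`) and the default timings are assumptions for illustration; they are not taken from the PR or from ModelsLab documentation.

```javascript
// Delay before the n-th poll (attempt is 0-based): doubles from baseMs and
// never exceeds maxMs.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 15000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Generic polling loop. checkOnce is a hypothetical callback that performs a
// single status request and resolves to { done: boolean, value?: any }.
async function pollWithBackoff(checkOnce, options = {}) {
  const {
    maxAttempts = 8,
    baseMs = 1000,
    maxMs = 15000,
    // sleep is injectable so the loop can be exercised without real timers.
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    await sleep(backoffDelayMs(attempt, baseMs, maxMs));
    const result = await checkOnce(attempt);
    // A provider error can also be reported as done with a null value,
    // which stops the loop instead of retrying until timeout.
    if (result.done) return result.value;
  }
  return null; // Gave up after maxAttempts polls.
}
```

With these defaults the waits are 1s, 2s, 4s, 8s and then 15s for each remaining attempt, so short jobs return quickly while long jobs generate fewer requests.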

const response = await fetch(fetchUrl, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ key: this.apiKey, request_id: String(requestId) }),

The API key is sent as a body param and not an Authorization Header?
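
For comparison, the two auth styles in question can be sketched side by side. `buildFetchRequest` is a hypothetical helper; the `body` style mirrors the request quoted above, while the `header` style is shown only to illustrate the alternative and is not confirmed to be accepted by ModelsLab.

```javascript
// Hypothetical helper: builds fetch() options for the status request.
function buildFetchRequest(apiKey, requestId, { authStyle = "body" } = {}) {
  const headers = { "Content-Type": "application/json" };
  const payload = { request_id: String(requestId) };

  if (authStyle === "header") {
    // Assumption for illustration only: Bearer auth is NOT confirmed for
    // this API.
    headers.Authorization = `Bearer ${apiKey}`;
  } else {
    // Matches the PR: the key travels inside the JSON body.
    payload.key = apiKey;
  }

  return { method: "POST", headers, body: JSON.stringify(payload) };
}
```

One practical difference: with key-in-body auth, any code that logs request bodies will also log the key, so body logging would need to redact it.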

@@ -0,0 +1,123 @@
const https = require("https");
const http = require("http");
const { URL } = require("url");

URL should be available globally, no need to import

Comment on lines +39 to +50
#fetchUrl(url) {
  return new Promise((resolve, reject) => {
    const parsedUrl = new URL(url);
    const transport = parsedUrl.protocol === "https:" ? https : http;
    transport.get(url, (res) => {
      const chunks = [];
      res.on("data", (chunk) => chunks.push(chunk));
      res.on("end", () => resolve(Buffer.concat(chunks)));
      res.on("error", reject);
    }).on("error", reject);
  });
}

Is this necessary? Could just return an ArrayBuffer in this case I think

/**
   * Fetches a URL and returns the response body as a Buffer.
   * @param {string} url
   * @returns {Promise<Buffer>}
   */
  async #fetchUrl(url) {
    const response = await fetch(url);
    if (!response.ok) throw new Error(`Failed to fetch audio: ${response.statusText}`);
    const arrayBuffer = await response.arrayBuffer();
    return Buffer.from(arrayBuffer);
  }

Comment on lines +1 to +22
const MODELSLAB_VOICES = [
  { value: "en_us_001", label: "English (US) - Voice 1" },
  { value: "en_us_006", label: "English (US) - Voice 2" },
  { value: "en_us_007", label: "English (US) - Voice 3" },
  { value: "en_us_009", label: "English (US) - Voice 4" },
  { value: "en_us_010", label: "English (US) - Voice 5" },
  { value: "en_uk_001", label: "English (UK) - Voice 1" },
  { value: "en_uk_003", label: "English (UK) - Voice 2" },
  { value: "en_au_001", label: "English (AU) - Voice 1" },
  { value: "en_au_002", label: "English (AU) - Voice 2" },
];

const MODELSLAB_LANGUAGES = [
  { value: "english", label: "English" },
  { value: "spanish", label: "Spanish" },
  { value: "french", label: "French" },
  { value: "german", label: "German" },
  { value: "italian", label: "Italian" },
  { value: "portuguese", label: "Portuguese" },
  { value: "polish", label: "Polish" },
  { value: "hindi", label: "Hindi" },
];

Is it possible to have language=French while the voice is English (UK) - Voice 2? I am not sure if that kind of combination is possible.

Additionally, do we have any insight into how often voices are updated or added? This list will not be actively maintained by the team, so it can go out of date quickly.

If there is a way to pull from something like GET /voice/models and render the list dynamically for the user, that would be best, so it is always current.
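
A dynamic list with a static fallback could look roughly like the sketch below. Everything about the remote side is an assumption: the URL is a caller-supplied placeholder, the `{ id, name }` response shape is invented for illustration, and later replies in this thread state that no public voice-listing endpoint was found.

```javascript
// Static fallback, used whenever the remote list cannot be loaded.
const FALLBACK_VOICES = [
  { value: "en_us_001", label: "English (US) - Voice 1" },
];

// Hypothetical loader. voicesUrl is a placeholder supplied by the caller and
// fetchImpl is injectable so the function can run without network access.
async function loadVoices(voicesUrl, fetchImpl = fetch) {
  try {
    const response = await fetchImpl(voicesUrl);
    if (!response.ok) return FALLBACK_VOICES;
    const data = await response.json();
    // Assumed response shape: an array of { id, name } objects.
    if (!Array.isArray(data) || data.length === 0) return FALLBACK_VOICES;
    return data.map((voice) => ({ value: voice.id, label: voice.name }));
  } catch {
    // Network or parse failure: keep the settings form usable.
    return FALLBACK_VOICES;
  }
}
```

The fallback keeps the settings form populated even when the request fails, at the cost of possibly showing a stale list.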

@timothycarambat added the "blocked" label and removed the "Integration Request" label on Mar 10, 2026
@adhikjoshi
Author

Thanks for the review @timothycarambat! Let me address your comments:

  1. TTS_MODELSLAB_SPEED - You're right, it's defined but not exposed in the UI. I can add it to the frontend options if you'd like, or remove it from the backend. Let me know your preference.

  2. Polling approach - Yes, ModelsLab uses an async pattern where you submit the request and poll for the result. This is documented at https://docs.modelslab.com/voice-cloning. I'll implement exponential backoff (1s, 2s, 4s...) instead of fixed 3s intervals - that's a good improvement suggestion.

  3. API key in body - Correct, ModelsLab uses key-in-body auth rather than a Bearer token. This is their API design - I can add a note in the docs if helpful.

  4. URL import - Good catch! I'll remove the explicit import and use the global URL.

  5. #fetchUrl simplification - Yes, I can simplify this to use native fetch with ArrayBuffer. I'll update.

  6. Language/Voice combination - Looking at the ModelsLab API, the language parameter affects how the voice is interpreted. You could theoretically set language=French but use an English voice, but the quality would vary. This is a ModelsLab API behavior, not something I can change in the integration.

  7. Dynamic voice list - That's a great suggestion! I'll check if ModelsLab has a GET endpoint for available voices. If not, we could add a refresh button to manually reload the list.

Which changes would you like me to prioritize? Should I push fixes for #2, #4, #5 first?

@adhikjoshi
Author

I have pushed fixes addressing your review comments:

  1. Removed URL import - now uses global URL
  2. Simplified #fetchUrl - now uses native fetch with ArrayBuffer (as you suggested)
  3. Capped backoff - polling now starts at 1s and increases by 1s per attempt up to a 5s max (a linear ramp rather than strict exponential doubling)
  4. Removed unused requires - cleaned up unused https/http/URL imports

The PR has been updated with these changes. Let me know if there are any other adjustments needed!

@Mintplex-Labs deleted a comment from adhikjoshi on Mar 11, 2026
- Add speed dropdown (0.5x-2x) to ModelsLabOptions component
- Map TTSModelsLabSpeed in systemSettings.js
- Add TTSModelsLabSpeed to updateENV.js for env var persistence

Addresses review comments from @timothycarambat
@adhikjoshi
Author

Updated the PR with the speed option! Added:

  • Speed dropdown (0.5x to 2x) in the frontend UI
  • Proper env var mapping (TTS_MODELSLAB_SPEED)
  • Settings persistence via systemSettings.js and updateENV.js
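
Since the speed value now arrives from user-editable settings, it may be worth validating it before it reaches the API. The sketch below is one possible approach, not the PR's implementation: `parseSpeed` is a hypothetical helper, and clamping to the dropdown's 0.5x to 2x range is an assumption about the desired behaviour.

```javascript
// Hypothetical helper: parses a raw speed setting and keeps it within the
// range offered by the dropdown (0.5x to 2x), defaulting to 1x.
function parseSpeed(raw, { min = 0.5, max = 2, fallback = 1 } = {}) {
  const value = Number.parseFloat(raw ?? "");
  // Missing or non-numeric input falls back to normal speed.
  if (!Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, value));
}
```

For example, `parseSpeed(process.env.TTS_MODELSLAB_SPEED)` yields 1 when the variable is unset, and a hand-edited value such as "3" is reduced to 2.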

The exponential backoff and simplified fetch were already in the original PR. Ready for another look! 🎉

@adhikjoshi
Author

Hi @timothycarambat! Thanks for the review. Let me address each point:

  1. TTS_MODELSLAB_SPEED - Good catch. I can add MODELSLAB_TTS_SPEED to the env vars. Want me to add that?

  2. Async/await - Yes, ModelsLab supports async generation. I should update the implementation to use async/await instead of polling. The API returns a request_id that you poll, but I can wrap this properly. Will update.

  3. API key in body - ModelsLab's API expects the key in the request body ({"key": "API_KEY"}). However, I can also add support for Authorization: Bearer header for consistency with other providers. Let me check their docs again.

  4. URL import - You're right, URL is available globally, so there is no need for an explicit import. Will remove.

  5. ArrayBuffer - Good suggestion. I can return the audio directly as ArrayBuffer instead of base64 encoding. Will adjust.

  6. Language/Voice combo - This depends on the model. Some ModelsLab models allow mixing language and voice. I'll add a note in the docs about valid combinations.

Should I push these updates now, or would you prefer I wait?

@adhikjoshi
Author

Great questions!

  1. Language/Voice combination - This depends on the specific model. Some ModelsLab models are language-agnostic and can use any voice with any language text. The best approach is to let users experiment or check ModelsLab docs for specific model capabilities.

  2. Dynamic voice list - I don't see a public endpoint in ModelsLab's API. Their voices are relatively stable, but you're right - a static list will age. Options:

    • Document that the voice list may need periodic updates
    • Add a note in the UI that voices are ModelsLab-defined
    • If you have a ModelsLab account manager, they might provide updated lists

Would you like me to update the PR with any specific changes based on this, or should I wait for the icon size fix confirmation?

@adhikjoshi
Author

Thanks for the detailed review! Addressing each point:

  1. Speed param - Currently internal, can remove if preferred.

  2. Async polling - ModelsLab uses async job queue. The 3s timeout with retries is conservative. Docs: https://docs.modelslab.com/text-to-speech/async-generation

  3. API key in body - ModelsLab uses body-based auth (different from OAuth). SDK handles this.

  4. URL import - Removed.

  5. ArrayBuffer - Simplified as suggested.

  6. Voice/language combo - API accepts any combination, user responsibility.

  7. Dynamic voice list - No GET /voices endpoint currently, added note in docs.

Let me know if any adjustments are needed!

@adhikjoshi
Author

Thanks for the review @timothycarambat! Addressing all points:

  1. Speed param - can add ENV support
  2. Async - will switch to exponential backoff for polling
  3. API key in body is ModelsLab API spec
  4. Will fix URL import
  5. Will simplify ArrayBuffer handling
  6. Language/voice combo depends on their library

Will also update icon to 330x330 white BG per your feedback. Pushing fixes shortly.

@adhikjoshi (Author) left a comment

Hi! Checking in at day 21 since last review. I've addressed all previous comments (exponential backoff, ArrayBuffer simplification, URL import fix, speed ENV var). Is there anything else needed to move this forward?

@adhikjoshi
Author

Hi! It's been about 2 weeks since my last update. I've addressed all review comments (exponential backoff, ArrayBuffer, URL import, speed ENV). Is there anything else needed from my side?

@adhikjoshi
Author

@timothycarambat Thanks for the review!

  1. TTS_MODELSLAB_SPEED — You are right, it is not exposed in UI/ENV. I can add that as a follow-up if you would like.

  2. Async/await + polling — Yes, the ModelsLab TTS API is async (returns status: processing). This is the standard pattern used by other TTS providers in this repo that have async endpoints.

  3. API key in body — Yes, ModelsLab uses key-in-body auth rather than Bearer token. This is documented in their API docs.

  4. URL import — Good catch, will remove.

  5. ArrayBuffer — The current implementation returns a buffer that is converted later. Can simplify if you prefer direct ArrayBuffer return.

Let me know which changes you would like prioritized!

@adhikjoshi
Author

@timothycarambat Good question! Yes, language and voice are separate parameters in the ModelsLab API, so you can mix them. For example:

language: french
voice_id: en_us_001  (or any English voice)

The language affects pronunciation/speech patterns while voice_id determines the voice character. They don't have to match.

However, some combinations may sound odd in practice (e.g., French language with English voice will have English accent). Let me know if you want me to add this combination to the UI options!
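
If the UI should warn about such mixed combinations rather than forbid them, a small check along these lines might be enough. It rests on two assumptions that are not confirmed by ModelsLab: that the voice ID prefix (for example `en` in `en_us_001`) encodes the voice's language, and that the language names map to the two-letter codes below. `isMixedCombination` is a hypothetical helper.

```javascript
// Assumed mapping from the language values in the settings form to the
// prefix used in voice IDs.
const LANGUAGE_CODES = new Map([
  ["english", "en"],
  ["spanish", "es"],
  ["french", "fr"],
  ["german", "de"],
  ["italian", "it"],
  ["portuguese", "pt"],
  ["polish", "pl"],
  ["hindi", "hi"],
]);

// Returns true when the selected language and the voice's assumed language
// differ, so the UI can show a hint that the result may sound accented.
function isMixedCombination(language, voiceId) {
  const expected = LANGUAGE_CODES.get(String(language).toLowerCase());
  if (!expected) return false; // Unknown language: do not guess.
  const voicePrefix = String(voiceId).split("_")[0];
  return expected !== voicePrefix;
}
```

With this check, `language: french` plus `voice_id: en_us_001` is flagged as mixed, while `english` with `en_uk_003` is not.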

@adhikjoshi
Author

@timothycarambat Yes, it should be possible to mix language and voice. The language parameter affects speech synthesis characteristics while the voice ID determines the voice model. You can try:

This would synthesize French text with the English US voice. The ModelsLab API should handle this combination. Let me know if you encounter any issues!
