Feature Request

Hi Linz,

Following up on our recent chat, I'm opening this feature request as discussed!

When generating voiceovers or dubbing from SRT files, realistic, human-sounding voices are crucial for creating engaging social media video content. Standard TTS engines sometimes lack the natural emotion and pacing that high-quality short-form clips need.

**Describe the solution**
I would love to see an integration with the **ElevenLabs API** to convert SRT text into audio directly within the tool. 
Ideally, the workflow would include:
1. A settings field in the UI to input an ElevenLabs API Key.
2. An option to select or input a specific ElevenLabs Voice ID.
3. The ability for `openclip` to parse the SRT timestamps, send the text to ElevenLabs, and generate an aligned audio track for the video.
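
To make the three steps above more concrete, here is a rough Python sketch of how I imagine the flow. Treat all of it as an assumption on my side: I don't know how `openclip` is structured internally, so `Cue`, `parse_srt`, and `build_tts_request` are hypothetical names, and the endpoint and `xi-api-key` header reflect my understanding of the ElevenLabs text-to-speech REST API and should be checked against their docs. The sketch only builds the request and does not send it.

```python
import json
import re
import urllib.request
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMESTAMP = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    """One subtitle entry (hypothetical type, not part of openclip)."""
    start_ms: int
    end_ms: int
    text: str


def _to_ms(hours, minutes, seconds, millis):
    total_seconds = (int(hours) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def parse_srt(srt_text):
    """Split SRT text into cues with millisecond start/end offsets."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            match = TIMESTAMP.match(line)
            if match:
                parts = match.groups()
                # Everything after the timing line is the spoken text.
                text = " ".join(lines[i + 1:])
                if text:
                    cues.append(Cue(_to_ms(*parts[:4]), _to_ms(*parts[4:]), text))
                break
    return cues


def build_tts_request(cue, api_key, voice_id):
    """Build (but do not send) a TTS request for one cue.

    Assumption: POST /v1/text-to-speech/{voice_id} with an xi-api-key header.
    """
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    body = json.dumps({"text": cue.text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )
```

Each returned audio clip could then be placed on the timeline at its cue's `start_ms`, which is what I mean by an "aligned" track. Handling clips that run longer than their cue window is left open here.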

**Describe alternatives you've considered**
Currently, the workaround is generating the ElevenLabs audio manually via their website and then importing/syncing it back into the video editor. However, this is time-consuming and breaks the seamless automation pipeline that `openclip` provides.

**Additional context**
Since you mentioned receiving a few similar requests recently, integrating a premium TTS option like ElevenLabs would be a massive upgrade for content creators looking to automate their video production workflow. 

Thanks again for your time and for maintaining this awesome project! Let me know if you need me to test anything once you start looking into it.

Feature Request #11

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Feature Request #11

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions