Hi Linz,
Following up on our recent chat, I'm opening this feature request as discussed!
When generating voiceovers or dubbing from SRT files, realistic, human-sounding voices are crucial for creating engaging social media video content. Standard TTS engines sometimes lack the natural emotion and pacing needed for high-quality short-form clips.
Describe the solution
I would love to see an integration with the ElevenLabs API to convert SRT text into audio directly within the tool.
Ideally, the workflow would include:
- A settings field in the UI to input an ElevenLabs API Key.
- An option to select or input a specific ElevenLabs Voice ID.
- The ability for openclip to parse the SRT timestamps, send the text to ElevenLabs, and generate an aligned audio track for the video.
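
To make the workflow above a bit more concrete, here is a rough sketch of how I picture the parsing and request-building steps. It is only an illustration under assumptions: `parse_srt`, `build_tts_request`, and `Cue` are hypothetical names (not existing openclip code), and the endpoint path and `xi-api-key` header reflect my understanding of the ElevenLabs text-to-speech API, so they should be checked against the current ElevenLabs docs. The sketch only builds the requests and does not send anything.

```python
import json
import re
import urllib.request
from dataclasses import dataclass

# Assumed endpoint shape for ElevenLabs text-to-speech; verify against their docs.
ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
TIME_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # cue start, in seconds
    end: float    # cue end, in seconds
    text: str     # subtitle text with line breaks collapsed to spaces


def _seconds(h, m, s, ms):
    """Convert the captured hour/minute/second/millisecond strings to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content):
    """Parse SRT text into a list of Cue objects (hypothetical helper)."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIME_RE.search(line)
            if match:
                g = match.groups()
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                if text:
                    cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


def build_tts_request(cue, api_key, voice_id):
    """Build (but do not send) a TTS request for one cue (hypothetical helper)."""
    body = json.dumps({"text": cue.text}).encode("utf-8")
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Each returned audio clip could then be placed at its cue's `start` offset to build the aligned track. How to handle clips that run longer than the cue window (trimming, speeding up, or shifting later cues) is a design choice I'd leave to you.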
Describe alternatives you've considered
Currently, the workaround is generating the ElevenLabs audio manually via their website and then importing/syncing it back into the video editor. However, this is time-consuming and breaks the seamless automation pipeline that openclip provides.
Additional context
Since you mentioned receiving a few similar requests recently, integrating a premium TTS option like ElevenLabs would be a massive upgrade for content creators looking to automate their video production workflow.
Thanks again for your time and for maintaining this awesome project! Let me know if you need me to test anything once you start looking into it.