ElevenLabs, co-founded in 2022 by an ex-Google machine learning engineer and ex-Palantir deployment strategist, is the next big thing in audio generative AI. But winners in the AI arms race don’t stay ahead for long. Is ElevenLabs the best voice clone AI, or should we be synthesising our audiobooks elsewhere?
Is ElevenLabs the best voice clone AI?
Having tried more than a dozen of the most popular options in my work as a video producer earlier this year, I’ll admit I came into this with an opinion. ElevenLabs is the most established of the voice generators. That said, they’re all relatively new services. While the practice of recreating the human voice dates back to the 1700s, and was furthered significantly by Bell Labs in the 1900s, the ability to do so in a usable form has only recently grown into an industry. The user-friendliness of today’s instant voice cloning is extremely impressive considering the complexity of the science behind it.
Of these services, it is ElevenLabs that continues to be referenced in the comments of viral spoof videos. In April 2023, the trend of cloning music artists’ voices to produce new original music exploded, with Drake at the forefront. Misuse of these tools is not limited to one service, however. We’re here to discuss the quality of the service itself. On that front, ElevenLabs is still king.
Offering three levels of voice replication, it is still pushing beyond the competition:
- Voice Design: With just some minor tweaking, you can produce a synthetic voice unique to you right now. No training data is required on your part; you simply choose from ElevenLabs’ own gender, age, and accent options.
- Voice Cloning: Create a realistic voice clone with less than a minute of training data. Tested by yours truly, both on my own voice and that of several celebrities, it does an excellent job. Results vary significantly with the quality of the training data, however, which is where option 3 comes in.
- Professional Voice Cloning: ElevenLabs offers “professional-grade cloning of your own voice that is indistinguishable from the real thing.” A highly bespoke service, this level will require more than 30 minutes of training data when it launches in Q3 2023.
How do I clone my voice with AI?
Voice cloning can be a technical process, but thankfully most of these services take the science out of the equation for you. Synthetic speech software like ElevenLabs uses a deep-learning model trained on text-to-speech data. It uses this model to convert a high-quality sample recording of your voice into a custom model that can reproduce any word or phrase, including ones that you never spoke in the sample recordings. You can then use these new synthetic voices, complete with realistic human intonation, accent, and inflections, to produce a top-quality voiceover with only a text prompt.
The audio files you submit need to be of high clarity and polish to help the AI map your speech correctly to the words it thinks you’re saying. Imperfections at this stage will result in imperfect outputs.
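If you want to sanity-check a recording before uploading it, a rough pre-flight check along these lines may help. This is only a sketch: it assumes a 16-bit PCM WAV file, the function name `check_sample` is hypothetical, and the thresholds (30 seconds, 22,050 Hz, 0.1% clipped samples) are illustrative guesses rather than requirements published by ElevenLabs.

```python
import struct
import wave


def check_sample(path, min_seconds=30.0, min_rate=22050, clip_ratio=0.001):
    """Return a list of warnings for a 16-bit PCM WAV voice sample.

    All thresholds are illustrative assumptions, not ElevenLabs requirements.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        n_frames = wav.getnframes()
        raw = wav.readframes(n_frames)

    if width != 2:
        return ["expected 16-bit PCM audio"]

    # Unpack every sample (all channels interleaved) as signed 16-bit ints.
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)

    warnings = []
    duration = n_frames / rate
    if duration < min_seconds:
        warnings.append("recording is short: %.1f s" % duration)
    if rate < min_rate:
        warnings.append("low sample rate: %d Hz" % rate)

    # Samples pinned at the extremes suggest the recording clipped.
    clipped = sum(1 for s in samples if abs(s) >= 32767)
    if samples and clipped / len(samples) > clip_ratio:
        warnings.append("audio appears clipped")
    return warnings
```

Calling `check_sample("my_voice.wav")` (the file name is a placeholder) returns an empty list when none of the checks trip. It cannot judge background noise or room echo, so it is no substitute for listening to the recording yourself.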
Armed with this custom model, content creators can use the platform to turn themselves into podcasters. This is not a difficult task with traditional methods, but with generative AI the process becomes faster, multilingual, and possible from the comfort of a laptop on a beach somewhere. With just a text input, voice actors can generate realistic, natural-sounding voiceovers for audiobooks even faster than real time.
Is AI voice cloning free?
Generally no. Voice cloning is expensive for the provider, bespoke for the user, and frankly an incredible feat of software engineering. Speech synthesis with ready-made voices is often offered for free in a limited capacity as part of a free trial. This is possible because a set of AI voices, once created by the provider, can be reused for any number of users, so the cost per user falls as the user base grows.
Voice cloning, on the other hand, is a much more computationally involved process. It does not scale in the same way, because it requires custom model training for each new user, producing a voice that only that user will benefit from.
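To make that difference concrete, here is a back-of-the-envelope sketch. Every number in it is hypothetical and in arbitrary currency units; the function names are made up for illustration, and none of it reflects ElevenLabs’ actual costs or pricing.

```python
def stock_voice_cost_per_user(creation_cost, users, inference_cost):
    """The one-off cost of creating stock voices is shared across every user."""
    return creation_cost / users + inference_cost


def cloned_voice_cost_per_user(training_cost, inference_cost):
    """Each user needs their own model, so the training cost is never shared."""
    return training_cost + inference_cost


# Hypothetical figures in arbitrary currency units, for illustration only.
for users in (10, 1_000, 100_000):
    stock = stock_voice_cost_per_user(5_000, users, 0.5)
    clone = cloned_voice_cost_per_user(20, 0.5)
    print(f"{users:>7} users: stock {stock:.2f}, clone {clone:.2f} per user")
```

Under these made-up numbers, the per-user cost of stock voices shrinks toward the inference cost as the user count grows, while the per-user cost of cloning stays flat no matter how many people sign up.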