ElevenLabs is one of the firms on the frontier of generative AI audio, so we thought it only right to put it through its paces with our review. Founded in 2022, it’s a relatively new venture, roughly as mature as ChatGPT. This AI-powered platform allows you to generate audio content using high-quality text-to-speech algorithms. What sets it apart, however, is the ability to clone your own voice and use that for the speech you generate.
Voice actors, podcasters, and social media presenters alike can benefit from the convenience of the platform. Clone your voice once, and you can work from your laptop – no studio required. Is it as good as everyone says? Let’s find out.
ElevenLabs review (free version) – updated for 2024
Pros
- Free plan allows use of all core functions
- Industry-leading quality of generative audio
- High-quality preset voice library (no need to clone yourself)
- High degree of expressiveness control
- Comprehensive tool set for voice production
- API access on free plan
- Generate speech in 29 languages on free plan
- Dub from 57 languages into 29 languages
Cons
- Limited to 3 custom voices on the free plan
- 128kbps MP3 output on free and Starter plans (ideally 320kbps)
- 192kbps MP3 output limitation even on Creator plan
How we tested ElevenLabs
I used the free version of the software to evaluate the core functions of the service. Speech synthesis, also known as text-to-speech, as well as voice cloning and dubbing (speech-to-speech) are all available on all plans. Premium plans allow users to produce more of the same content, but the training and synthesis algorithms are identical between free and paid plans. By running these tools and features through a series of tests, we hope to provide a useful understanding of what is possible across the service.
Dubbing Studio, the analytics dashboard, and professional voice cloning services are exclusive to premium plans, so they aren’t included in this review just yet. We will be testing them in the near future and updating this review accordingly.
Specific tiers allow you to export at higher qualities than 128kbps MP3, but this doesn’t affect the expressiveness or human-like qualities of the result, which are identical across free and paid plans (excluding professional voice cloning services, which are better).
Is ElevenLabs the best voice clone AI?
Having tried over a dozen of the most popular options in 2023 when producing videos for social media, I’ll admit I came into this with an opinion. ElevenLabs is the most established of the voice generators. That said, they’re all relatively new services. While the practice of recreating the human voice dates back to the 1700s and was furthered significantly by Bell Labs in the 1900s, consumer self-serve access is a burgeoning industry. The user-friendliness of today’s instant voice cloning is extremely impressive considering the obfuscated science behind it.
Check any comments section on a viral spoof video and you’re likely to find a reference to ElevenLabs. In April 2023, the trend of cloning music artists’ voices to produce new original music exploded, with Drake at the forefront. Misuse of these tools is not limited to one service, however. We’re here to discuss the quality of said service. On that front, ElevenLabs is still king.
- Voice Design: With just some minor tweaking, you can produce a synthetic voice unique to you right now. Zero training data is required on your part. Without the need for a microphone, you can produce a unique voice with the software’s own gender, age, and accent options.
- Voice Cloning: Create a realistic voice clone with less than 1 minute of training data. The more data you provide, the more accurate your clone will be. Tested by yours truly, both on my own voice and that of several celebrities, it does an excellent job. Results vary significantly based on the quality of training data, however – which is where option 3 comes in.
- Professional Voice Cloning: ElevenLabs offers “professional-grade cloning of your own voice that is indistinguishable from the real thing.” A highly bespoke service, this level will require more than 30 minutes of training data when it launches in Q3 2023.
How does ElevenLabs work?
ElevenLabs uses machine learning, but not quite in the same way as your favorite chatbots. Whereas LLMs (large language models) use natural language processing (NLP) to understand the content of text, ElevenLabs is trained to convert text to audio, or audio to audio in the case of voice cloning. In other words, ElevenLabs won’t write your script for you, but it will turn your script into the spoken word, with audio files you can then download and use in your own projects.
Using ElevenLabs – my experience
I’ve used ElevenLabs for more than a year now, and during that time the AI tool has gone from strength to strength. The quality of the algorithms has increased, and multilingual support has made impressive strides, now able to generate life-like speech in 29 languages.
First impressions
My first impression of ElevenLabs was one of surprise, in the best possible way. AI voices were somewhat hit-or-miss, but when they hit, they were the best on the market. They were surprisingly realistic compared to the Microsoft-Sam-esque robo voices we’ve all grown up with. This was something of an ‘aha’ moment – that artificial intelligence would one day recreate human voices so well that they would be listenable in long-form content like movies, TV, and the narration of audiobooks. Still, at the time there was no AI voice cloning platform that could flawlessly create natural human speech.
Features and tools | First impression |
---|---|
Speech Synthesis | Inherently the hardest tool to use, due to the user-generated content (UGC) requirement of providing your own audio recordings. You’ll need a microphone, and there’s no way around this. Of course, you could use your smartphone’s built-in mic, but this won’t result in a good-quality clone. |
Voice Library | Excellent quality voices, and selection. Covers a wide variety of voice types, and could pass as a real human voice some of the time. |
Voice Design | Easy to use, but difficult to master. This isn’t so much a fault of the UI, which is simple, clean, and provides helpful explanations of how each parameter will affect the generated voice. Instead, there’s a little guesswork involved when iterating on subjective parameters like tonality and expressiveness. The ability to generate a unique voice without needing a microphone is a strong selling point, however. |
Voice Cloning | Inherently the hardest tool to use, due to the user-generated content (UGC) requirement of providing your own audio recordings. You’ll need a microphone, and there’s no way around this. Of course, you could use your smartphone’s built-in mic, but this won’t result in a good-quality clone. |
Dubbing | Impressive and effective, dubbing allows you to change the spoken language of an audio recording, replacing the voice entirely. Able to understand 57 languages, and produce 29, it’s an excellent example of AI voice dubbing. |
Quality of narration
ElevenLabs’ quality of narration is listenable at worst, and indistinguishable from a real human voice at best. I’ve listened to audiobooks with the same level of clarity, diction, and expressiveness as some of the voices available for free within the ElevenLabs voice library.
One of the hardest things for artificial intelligence voices to master has always been the finer details: pauses between sentences, intonation where emotion should be implied, and breaths that mimic a realistic lung capacity. Aside from the technical quality of the audio algorithms, these elements have always made AI detectable in the field of text-to-speech.
When generating audio with a voice from the preset library, you’ll find the following stylization parameters:
- Stability
- Clarity + Similarity
- Style Exaggeration
We found these controls highly effective at controlling the diction and intonation of the voice. Different scripts will require different emotions and degrees of expressiveness. This is where you’ll fine-tune these vocal qualities, and while it takes a few tries to get each output just right – which will cost you generation credits each time you check it – this control is essential for any form of media with emotional depth.
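Since API access is included on the free plan, the same three sliders can also be set in code. The snippet below is a minimal sketch of a request body only; it does not call the service. The field names `stability`, `similarity_boost`, and `style`, and their mapping to the Stability, Clarity + Similarity, and Style Exaggeration sliders, are our assumption about the API and should be checked against the official ElevenLabs API reference. The helper names are hypothetical.

```python
import json


def clamp01(value: float) -> float:
    """Keep a slider value inside the 0.0 to 1.0 range."""
    return max(0.0, min(1.0, value))


def build_tts_request(text: str, stability: float, similarity: float,
                      style_exaggeration: float) -> dict:
    """Build a text-to-speech request body (hypothetical helper).

    Field names are assumptions, not confirmed by this review.
    """
    return {
        "text": text,
        "voice_settings": {
            "stability": clamp01(stability),                # Stability slider
            "similarity_boost": clamp01(similarity),        # Clarity + Similarity slider
            "style": clamp01(style_exaggeration),           # Style Exaggeration slider
        },
    }


body = build_tts_request("Welcome back to the show.", 0.5, 0.75, 0.3)
print(json.dumps(body, indent=2))
```

Keeping the settings in one place like this makes it easier to note which combination produced a good take, which matters when every retry costs credits.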
Technical quality
As for technical quality, there’s more to it than the bitrate of the audio file. We’re looking for 320kbps MP3 output, which is not hard to achieve. Unfortunately, you’ll only get 128kbps when downloading files with the free plan. Fortunately, the quality difference between 320kbps and 128kbps is not something that all users will be able to hear. It’s still serviceable for most applications, especially social media, although audiobook platforms prefer higher-quality files.
The standard bitrate for a ‘high-quality’ MP3 file is 320kbps. However, the priority of MP3 compression is low file size. If you really want high quality, there are entire file types that cater to that priority, the most common being WAV (or Waveform Audio File Format), which typically stores uncompressed, and therefore lossless, audio at bitrates that can exceed 9,000kbps. Most commonly, you’ll find them around 2,000kbps. These aren’t compressed as they’re saved to disk, whereas MP3s are. Still, MP3s offer the best tradeoff of quality to file size between the two, which is important for audio data streaming and data management at scale – which ElevenLabs needs to consider.
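To see where those WAV figures come from, the bitrate of uncompressed PCM audio is simply sample rate × bit depth × channel count. The sketch below works through a few common combinations; the function names are ours, and the size estimate ignores file headers.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


def audio_size_mb(bitrate_kbps: float, seconds: float) -> float:
    """Approximate audio payload size in megabytes (1 MB = 1,000,000 bytes)."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000


# Stereo examples: CD quality, a common studio setting, and a high-resolution file.
for label, rate, depth in [("CD quality", 44_100, 16),
                           ("Studio", 48_000, 24),
                           ("High-res", 192_000, 24)]:
    kbps = pcm_bitrate_kbps(rate, depth, channels=2)
    print(f"{label}: {kbps:.1f} kbps, {audio_size_mb(kbps, 60):.2f} MB per minute")

# For comparison, one minute of 128kbps MP3.
print(f"128kbps MP3: {audio_size_mb(128, 60):.2f} MB per minute")
```

One minute of CD-quality stereo WAV comes out at roughly eleven times the size of the same minute as a 128kbps MP3, which is the tradeoff described above.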
The Creator premium subscription increases the MP3 output bitrate to 192kbps via the API.
We’d also be looking for a 48kHz sample rate, or at least 44.1kHz, which is the standard for CDs. Sure, CDs themselves are outdated, but the reasoning behind 44.1kHz is based on sound science: a sample rate can capture frequencies up to half its own value, so 44.1kHz covers everything up to 22.05kHz, just beyond the roughly 20kHz upper limit of human hearing.
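The reasoning behind those sample rates can be checked with a few lines. This is a small sketch of the half-the-sample-rate rule (the Nyquist limit); the 20kHz hearing limit is a commonly cited approximation that varies by listener, not a figure from ElevenLabs.

```python
HUMAN_HEARING_LIMIT_HZ = 20_000  # commonly cited upper bound; varies with age and listener


def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


for rate in (22_050, 44_100, 48_000):
    limit = nyquist_limit_hz(rate)
    covers = limit >= HUMAN_HEARING_LIMIT_HZ
    print(f"{rate} Hz sample rate -> up to {limit:.0f} Hz, covers hearing range: {covers}")
```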
The Independent Publisher subscription allows for 44.1kHz output via the API.
Lastly, we’d be looking for a minimum bit depth of 16, with 24 being sensible, and 32-bit floating point being impressive. This dictates how accurately an audio file can record amplitude, and is more important for audio files with a lot of contrast between the loudest and quietest parts. However, the quietest part is probably the noise floor, which makes it particularly important because you can remove background noise much more effectively from a 32-bit file than a 16-bit one.
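To put numbers on bit depth, the theoretical dynamic range of integer PCM audio is 20 × log10(2^bits), or about 6dB per bit. The sketch below applies that formula to 16-bit and 24-bit audio. It deliberately leaves out 32-bit floating point, which works differently and isn’t described by this formula.

```python
import math


def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range of integer PCM audio in decibels."""
    return 20 * math.log10(2 ** bit_depth)


for bits in (16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
```

The extra range of a 24-bit file is what gives you more room between the noise floor and the loudest part of a recording.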
These aspects aren’t recorded in the metadata of an MP3 file. As the free version of ElevenLabs doesn’t output WAV files, we can’t see these technical aspects. However, you’ll now know what to look for if the generative AI audio platform does enable WAV output.
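If you do end up with a WAV file, whether from ElevenLabs or from your own recordings, these values can be read straight from the file header. Here is a minimal sketch using Python’s built-in `wave` module. It writes a one-second silent file first so the example runs on its own; the helper names are ours, and note that the `wave` module only reads integer PCM files, not 32-bit floating point ones.

```python
import os
import tempfile
import wave


def describe_wav(path: str) -> dict:
    """Read sample rate, bit depth, channel count, and duration from a WAV header."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate_hz": wav.getframerate(),
            "bit_depth": wav.getsampwidth() * 8,   # sample width is reported in bytes
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def write_silent_wav(path: str, sample_rate_hz: int = 44_100,
                     sample_width_bytes: int = 2, channels: int = 1,
                     seconds: int = 1) -> None:
    """Write a silent PCM WAV file, used here only as demo input."""
    frames = b"\x00" * (sample_rate_hz * sample_width_bytes * channels * seconds)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width_bytes)
        wav.setframerate(sample_rate_hz)
        wav.writeframes(frames)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.wav")
    write_silent_wav(demo_path)
    info = describe_wav(demo_path)
    print(info)
```

Swap `demo_path` for the path to your own recording to check it before uploading it as training data.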
Voice cloning
To use voice cloning with ElevenLabs, users are required to upload a sample of their own recorded speech. This is the data set that your own personalized AI model will be based on. It’s also a good place to explain the technical quality we’re looking for, outside of bitrate, bit depth, and sample rate. Each of these elements is important to an extent, but the quality of the algorithm is exemplified by how it handles your own audio samples.
For research purposes, I trained ElevenLabs on the public voice recordings of three famous figures. The resulting AI models sounded as similar to the original speakers as the names would imply – Elom Nusk, Dwake, and Mr Beats. In short, not identical.
There’s something distinctly non-human about them, but then that does rely on the user input as much as the platform. The quality of a voice clone is a joint effort between the quality of the algorithm, and that of the training data. That is to say, even the world’s best training algorithms won’t be able to do much with a poor-quality audio recording plagued by background noises, lossy compression, and the bassy rumble of a computer vibrating the desk on which your microphone was situated while recording your voice.
When generating audio with a voice clone, you’ll find the following stylization parameters:
- Stability
- Clarity + Similarity
- Style Exaggeration
These are the same as those you’ll find when using a voice from the preset library, and we appreciate the consistency of implementation here. You’ll find yourself getting familiar with what exactly these controls do once you’ve used them several times, and develop a skill for getting perfect emotion with the minimum possible number of iterations.
In conclusion, if you need something user-friendly that just works, go for the ElevenLabs voice library. Recording high-enough-quality audio samples for voice cloning purposes will require some knowledge of digital audio practices. In this way, it’s inherently not as user-friendly as picking a voice that was made for you.
How do I clone my voice with AI?
Voice cloning can be a technical process, but most of these services take the science out of the equation for you. Synthetic speech software like ElevenLabs uses a deep-learning model trained on text-to-speech data. It uses this to convert a high-quality sample recording of your voice into a custom model that can reproduce any word or phrase, including ones that you never spoke in the sample recordings. You can then use these new synthetic voices, complete with realistic human intonation, accent, and inflections, to produce a top-quality voice-over with only a text prompt.
The audio files you submit need to be of high clarity to help the AI map your speech correctly to the words it thinks you’re saying. Imperfections at this stage will result in imperfect outputs.
Pricing
ElevenLabs has six pricing tiers to choose from, which is more than most services of this kind. The free plan is free forever, not a free trial. There are no free trials for the premium tiers, but you might find a discounted price when subscribing for the first time.
- Free Plan
- Starter ($5/month)
- Creator ($22/month)
- Independent Publisher ($99/month)
- Growing Business ($330/month)
- Enterprise (Scalable)
We’d consider the pricing plans, and what comes with them, fair for the money. With access to Speech Synthesis, Dubbing, Voice Cloning, and Voice Design for free, we were able to test all of the core functions of the platform. That’s a big plus in our book.
Final thoughts – is ElevenLabs worth it?
ElevenLabs is worth it for any level of content creator working with spoken audio. Hobbyist content creators and professional voice artists alike can benefit from at least one of the plans above.
While not all working environments will promote the use of AI to generate your content, we tested it in a fast-paced professional environment for producing content for TikTok. The learning curve is an easy hurdle unless you’re using the voice clone feature. If you have the benefit of working in a team, have your colleague with the most experience in audio create the dataset of voice recordings. The clarity of recording, file settings, and background noise will all have a substantial effect on your final result.
In conclusion, the AI audio platform is worth it for anyone producing audio content at scale and speed.