Meta AI continues to release impressive new tools in the wake of Meta Connect this September. Its latest offering is a generative AI audio tool that allows real-time voice translation from one language to another while maintaining the vocal style, tone, and expressiveness of the original audio. SeamlessStreaming, the open-source code that makes this possible, is already available on GitHub. So how does SeamlessExpressive work, and how could you use it yourself?
What is SeamlessExpressive from Meta AI?
SeamlessExpressive is an AI tool from Meta that maintains the vocal style, tone, and expressiveness of your message when it is translated into another language. Intonation is an important aspect of vocal communication in most, if not all, languages. This aspect has hitherto been missing from translation tools, and not due to complacency or lack of priority: it’s just very tricky to do.
The task of maintaining the inflection in your voice comes at the end of a very long series of other technological challenges, each of them state-of-the-art in its own right.
First, you have to master text-to-text translation between “almost 100 languages”, as Meta has done, despite the fact that not all languages share individual words with identical meanings and social implications. Doing the same from text to speech, or from speech to text, then requires voice recognition technology and a means of translating not only between languages but also between modalities. In fact, SeamlessExpressive itself builds on a predecessor called SeamlessM4T, which handles everything but expressiveness.
In short, it’s very impressive and directly competes with similar technology from ElevenLabs.
How does real-time AI speech translation work?
SeamlessExpressive can translate from text to text, speech to text, text to speech, and speech to speech. It does so while maintaining the emotional tone of the original input and with less than two seconds of latency, which makes it usable for real-time translation and speech output, regardless of modality.
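The four translation modes and the two-second latency figure can be laid out in a short Python sketch. This is an illustration only, not Meta's code and not a call into any Seamless library: the names `Modality`, `task_code`, and `is_real_time` are hypothetical, and the short task labels (S2ST, S2TT, T2ST, T2TT) are assumed to follow the abbreviations commonly used for these tasks.

```python
from enum import Enum


class Modality(Enum):
    """The two input/output forms the article mentions."""
    SPEECH = "speech"
    TEXT = "text"


# Assumed short labels for each (source, target) modality pair.
# They are used here purely as names, not as arguments to a real API.
TASK_CODES = {
    (Modality.SPEECH, Modality.SPEECH): "S2ST",  # speech-to-speech
    (Modality.SPEECH, Modality.TEXT): "S2TT",    # speech-to-text
    (Modality.TEXT, Modality.SPEECH): "T2ST",    # text-to-speech
    (Modality.TEXT, Modality.TEXT): "T2TT",      # text-to-text
}

# "Less than two seconds of latency", as stated in the article.
MAX_LATENCY_SECONDS = 2.0


def task_code(source: Modality, target: Modality) -> str:
    """Return the label for translating from one modality to another."""
    return TASK_CODES[(source, target)]


def is_real_time(latency_seconds: float) -> bool:
    """True if a measured latency falls under the two-second figure."""
    return 0 <= latency_seconds < MAX_LATENCY_SECONDS


if __name__ == "__main__":
    for (source, target), code in TASK_CODES.items():
        print(f"{source.value} -> {target.value}: {code}")
    print("1.5 s counts as real time:", is_real_time(1.5))
```

Two modalities on each side give exactly four combinations, which is why the article lists four translation modes.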
The data set that underpins this technology is also being released under an open-source license.
“In keeping with our approach to open science, we’re publicly releasing SeamlessM4T under a research license to allow researchers and developers to build on this work. We’re also releasing the metadata of SeamlessAlign, the biggest open multimodal translation dataset to date, totaling 270,000 hours of mined speech and text alignments.”
Meta AI
SeamlessExpressive output languages
The Meta AI real-time voice translator is capable of audio output in at least the following languages:
- English: English
- Spanish: Español
- Italian: Italiano
- German: Deutsch
- French: Français
- Japanese: 日本語 (Nihongo)
- Javanese: ꦧꦱꦗꦮ (Basa Jawa)
- Croatian: Hrvatski
- Hungarian: Magyar
- Kamba: Kikamba
- Vietnamese: Tiếng Việt
- Luxembourgish: Lëtzebuergesch
- Ganda: Luganda
- Icelandic: Íslenska
- Luo: Dholuo
- Maltese: Malti
- Welsh: Cymraeg
- Finnish: Suomi