Say “Hello” to VALL-E

Microsoft has announced a new lifelike text-to-speech system that can generate high-quality personalized speech from just a 3-second recording of an unseen speaker.

VALL-E is a new language modeling approach to text-to-speech (TTS) developed by Microsoft. Unlike previous methods that rely on continuous signal regression, VALL-E is trained on discrete codes derived from an off-the-shelf neural audio codec model. Its training data comprises a whopping 60K hours of English speech, hundreds of times more than other systems use. And the results are impressive: VALL-E outperforms the state-of-the-art zero-shot TTS system in terms of naturalness and speaker similarity, while preserving the emotion and acoustic environment of the acoustic prompt in synthesis. Keep an eye out for VALL-E and how it could revolutionize the TTS industry!
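To make the idea of "discrete codes" a little more concrete, here is a minimal toy sketch in Python. It is not VALL-E's codec or Microsoft's code: the tiny hand-written `CODEBOOK`, the 2-dimensional "frames", and the `encode`/`decode` helpers are all hypothetical, chosen only to show how continuous values can be swapped for integer tokens that a language model could then learn to predict.

```python
# Toy illustration only: a hand-written codebook, not a trained neural codec.
from typing import List, Sequence, Tuple

Vector = Tuple[float, float]

# Hypothetical 4-entry codebook of 2-dimensional vectors.
CODEBOOK: List[Vector] = [
    (0.0, 0.0),
    (1.0, 0.0),
    (0.0, 1.0),
    (1.0, 1.0),
]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(frames: Sequence[Vector]) -> List[int]:
    """Replace each continuous frame with the index of its nearest codebook entry."""
    return [
        min(range(len(CODEBOOK)), key=lambda i: squared_distance(frame, CODEBOOK[i]))
        for frame in frames
    ]


def decode(codes: Sequence[int]) -> List[Vector]:
    """Look each integer code back up to get an approximate frame."""
    return [CODEBOOK[code] for code in codes]


# Made-up "audio frames" standing in for continuous signal values.
frames = [(0.1, -0.2), (0.9, 0.2), (0.2, 0.8), (0.7, 0.9)]
codes = encode(frames)
print(codes)          # [0, 1, 2, 3]
print(decode(codes))  # approximate reconstruction from the codes
```

Once speech is expressed as a sequence of integers like this, generating audio starts to look like predicting the next token in a sentence, which is roughly why the approach is described as language modeling rather than signal regression.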

As with any technology, there are concerns that should be addressed when it comes to AI-generated speech systems like VALL-E. Here are a few potential concerns:

  1. Bias: Since AI models like VALL-E are trained on large datasets, there is a risk that they will inherit biases present in the data. For example, if the training data is mostly from one demographic group, the AI-generated speech may not be as accurate or appropriate for other groups.

  2. Misuse: AI-generated speech could be misused for harmful purposes such as spreading misinformation or impersonating individuals.

  3. Privacy: AI-generated speech systems require access to large amounts of speech data to function properly, which raises concerns about privacy and data protection.

  4. Transparency: It can be difficult to understand how AI-generated speech systems like VALL-E make decisions, which can make it challenging to hold them accountable for their actions.

It's important for developers of AI-generated speech systems to consider these concerns and take steps to address them. This can include developing strategies to mitigate bias, implementing safeguards to prevent misuse, and ensuring transparency and accountability in the development and deployment of these systems.

You can learn more about VALL-E on its GitHub demo page: https://valle-demo.github.io

Richard Cawood

Richard is an award-winning portrait photographer, creative media professional, and educator currently based in Dubai, UAE.

http://www.2ndLightPhotography.com