Meta Voicebox is ready, but not available to use

Emre Çitak
Jun 20, 2023
Updated • Jun 20, 2023

Meta, the social media giant, has introduced Voicebox, an advanced AI speech tool with remarkable capabilities, as described in the company's blog post.

However, due to the potential risks associated with deepfake technology, Meta has decided not to release Voicebox to the public at this time. This decision emphasizes the company's commitment to responsible AI development.

What is Meta Voicebox?

Meta Voicebox is an advanced AI speech tool that Meta introduced shortly after its MusicGen AI. It is designed to generate spoken dialogue using artificial intelligence.

Voicebox stands out from previous voice generator platforms due to its ability to perform speech generation tasks without requiring specific training for each task.

Unlike traditional methods that rely on carefully prepared training data, Voicebox adopts a novel approach. It learns directly from raw audio and accompanying transcriptions, enabling it to create speech that closely resembles natural conversation.

Meta Voicebox has been trained on a large-scale dataset, enabling it to generate realistic speech in multiple languages

The primary objective of Meta Voicebox is to enhance communication across various domains. In particular, it could help people converse with one another across different languages.

Additionally, Voicebox can contribute to the creation of more natural-sounding dialogue for video game characters, enriching the gaming experience.

How does Meta Voicebox work?

Voicebox can do some really cool things. It can remove background noise from speech recordings, so if you get interrupted by a doorbell or a barking dog, you don't have to re-record your speech.

It can also correct misspoken words without the whole recording having to be made again.

One interesting feature of Voicebox is its ability to synthesize speech in different styles. It can take a reference audio clip in the desired style and generate speech that sounds coherent with that style.

Meta Voicebox has the ability to synthesize speech in various styles, allowing for coherent and consistent speech production - Image: Meta

For example, you can have English speech that sounds like it was spoken by a French person. This opens up possibilities for people to speak different languages using their own voices.

Voicebox is also capable of creating diverse and expressive speech samples. It can generate unique audio styles without relying on specific audio inputs. This means it can produce a wide range of voices and sounds.

Deepfakes are the first concern that comes to mind

Voicebox also presents certain risks associated with deepfake technology. Deepfakes refer to manipulated media content, such as videos or audio, created using AI algorithms. They can make it seem like someone said or did something they never actually did.

Voicebox, with its powerful speech generation capabilities, could potentially be misused to create deepfake dialogue, leading to the spread of disinformation or manipulation.

In response to these concerns, Meta has decided not to release Voicebox to the public at this time. The company acknowledges the potential risks of misuse and aims to prioritize responsible AI development.

Meta has prioritized responsible AI development and has decided not to release Meta Voicebox to the public at this time due to deepfake concerns

Meta has developed classifiers to distinguish between Voicebox-generated speech and human speech, striving to maintain accountability and transparency.

While Meta is committed to sharing research and collaborating with the AI community, the decision to withhold Meta Voicebox's public release underscores the need to strike a balance between openness and responsibility.

By sharing audio samples and a research paper, Meta aims to foster understanding and facilitate further research in the field, promoting the responsible use of AI technology.




  1. Anonymous said on June 26, 2023 at 1:05 am

    “The company acknowledges the potential risks of misuse”

    Such things have had misuses by the state repression forces for at least 18 years in USA. Probably more.
