Like Dall-E, this AI generates sound from text!


Artificial intelligence never ceases to amaze us. While we mainly know it for generating images, researchers have found a way to use it to generate sounds, and it works quite well.

After the image generators, here are the sound generators

Text-to-image generators are all the rage, especially since OpenAI announced last week that DALL-E would be available online for everyone. Thanks to AIs like Midjourney or Stable Diffusion, we can now let artificial intelligence do the work for us. Just look at the results for “a painting of a cute black cat in a cyberpunk style city” to understand that our brushes and pencils can finally stay in the drawer.

To be honest, it’s quite surprising to see the progress that has been made: just a few years ago, the results were downright awful. But in the end, it’s only logical that advances in artificial intelligence keep multiplying. A few months ago, the first text-based video generators started to appear, and today it’s the turn of AudioGen, an audio generator, to come to light.

AudioGen: the DALL-E of sound

AudioGen is an artificial intelligence program that generates sounds from text descriptions – so far, no surprise. The project is led by researchers from Meta and the Hebrew University of Jerusalem, who describe their tool as an autoregressive generative model that interprets natural language queries and generates audio samples from scratch.

We present “AudioGen: Textually Guided Audio Generation”! AudioGen is an autoregressive transformer that synthesizes general audio from text (Text-to-Audio).
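AudioGen’s actual architecture is far more involved than anything shown here, but the idea of an “autoregressive” model is easy to illustrate: audio is represented as a sequence of discrete tokens, and the model predicts them one at a time, feeding each prediction back in, conditioned on the text prompt. Below is a toy sketch of that loop; the stand-in probability function, the token vocabulary, and all names are invented for illustration and have nothing to do with AudioGen’s real code.

```python
import random

random.seed(0)

VOCAB_SIZE = 16   # toy codebook of discrete audio tokens
EOS = 0           # end-of-sequence token

def next_token_distribution(text_embedding, audio_tokens):
    """Stand-in for the trained transformer: returns a probability
    for each possible next audio token, given the text conditioning
    and the tokens generated so far."""
    # Toy distribution: uniform over tokens, with the chance of
    # stopping (EOS) growing as the sequence gets longer.
    p_eos = min(0.9, 0.05 * len(audio_tokens))
    rest = (1.0 - p_eos) / (VOCAB_SIZE - 1)
    return [p_eos] + [rest] * (VOCAB_SIZE - 1)

def generate(text_embedding, max_len=50):
    """Autoregressive loop: sample one audio token at a time,
    feeding the growing sequence back in, until EOS or max_len."""
    tokens = []
    while len(tokens) < max_len:
        probs = next_token_distribution(text_embedding, tokens)
        tok = random.choices(range(VOCAB_SIZE), weights=probs)[0]
        if tok == EOS:
            break
        tokens.append(tok)
    return tokens

# In a real system the text embedding would come from a language
# model and the tokens would be decoded back into a waveform.
tokens = generate(text_embedding=[0.1, 0.2])
print(len(tokens), "tokens generated")
```

The key property this sketch captures is that each token depends on everything generated before it, which is what lets such models produce coherent sound over time.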

Let us now look at some samples of the AI in action. As can be heard in the tweet shared by researcher Felix Kreuk, the program was able to generate sounds such as:

  • Someone whistling as the wind blows
  • A man speaking while birds sing and dogs bark
  • Sirens and a vehicle approaching, then passing

According to the researchers, this model sidesteps several difficult audio problems: it can distinguish different types of sounds and separate them acoustically, it can isolate two people talking at the same time, and it can simulate background effects such as reverberation.

We don’t know precisely which dataset was used, but project members say they trained the model “using ten sets of audio data and corresponding labels”. This reminds us that many AI models are trained on sets or subsets of data containing copyrighted creations. For the moment, the project is still being developed behind closed doors, but the teams intend to make it accessible to the public fairly quickly. They will soon publish the AudioGen code and other technical details on their GitHub profile.
