Meta Releases Five New AI Models
Meta has unveiled a slew of new artificial intelligence advancements, showcasing five major AI models and research initiatives. The release spans multi-modal systems, next-generation language models, music generation, AI speech detection, and efforts to improve diversity in AI systems. Together, these developments mark significant strides toward making AI systems more versatile, sophisticated, and inclusive.
Advanced AI Capabilities and Applications
The new multi-modal systems are designed to process both text and images, diverging from the traditional unimodal, text-only approach of most large language models. Such systems can take any combination of text and images as input and produce similar combinations as output, enabling a wider array of applications. This is a noteworthy leap for AI, as it embraces a more holistic form of understanding and generation. A conceptual sketch of how this works follows below.
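Conceptually, an early-fusion model of this kind treats text and images as one stream of discrete tokens. The following sketch is purely illustrative, not Meta's implementation: the tokenizers, vocabulary sizes, and sentinel tokens are all invented to show how an interleaved text-and-image prompt can be flattened into a single sequence for one transformer to model.

```python
# Hypothetical illustration of early-fusion multimodal tokenization.
# These tokenizers are NOT Meta's implementation: they stand in for
# (a) a subword text tokenizer and (b) an image quantizer that maps an
# image to discrete codebook ids sharing one vocabulary with text tokens.

TEXT_VOCAB = 32_000      # assumed text vocabulary size
IMAGE_CODEBOOK = 8_192   # assumed image codebook size
BOI = TEXT_VOCAB + IMAGE_CODEBOOK  # begin-of-image sentinel
EOI = BOI + 1                      # end-of-image sentinel

def tokenize_text(text: str) -> list[int]:
    # Stand-in for a real subword tokenizer.
    return [hash(word) % TEXT_VOCAB for word in text.split()]

def tokenize_image(image_codes: list[int]) -> list[int]:
    # Offset quantizer codes into the shared vocabulary and wrap them
    # in sentinels so the model knows where the image starts and ends.
    return [BOI] + [TEXT_VOCAB + code for code in image_codes] + [EOI]

# An interleaved prompt: caption text, an image, then a question.
sequence = (
    tokenize_text("A photo of my dog:")
    + tokenize_image([17, 402, 911, 23])  # toy codes for a 2x2 patch grid
    + tokenize_text("What breed is this?")
)
print(sequence)  # one flat token sequence a single transformer can model
```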
Moreover, Meta’s next-generation language models are equipped with multi-token prediction, training the model to predict several future tokens at once rather than one word at a time. This change speeds up training, makes the models more efficient, and yields faster, more accurate language processing.
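As a rough sketch of the idea (not Meta's architecture), multi-token prediction keeps a single shared trunk and attaches several output heads, where head k predicts the token k + 1 positions ahead. The dimensions, trunk, and vocabulary below are placeholder assumptions:

```python
import torch
import torch.nn as nn

class MultiTokenPredictor(nn.Module):
    """Minimal sketch of multi-token prediction, not Meta's implementation.

    One shared trunk produces hidden states; head k scores the token
    (k + 1) positions ahead, so each position supervises several futures.
    """

    def __init__(self, d_model: int = 512, vocab: int = 32_000, n_heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=2)
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab) for _ in range(n_heads))

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        # Causal mask keeps the trunk autoregressive, as in a standard LM.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.trunk(x, mask=mask)             # [batch, seq, d_model]
        return [head(h) for head in self.heads]  # n_heads logit tensors

model = MultiTokenPredictor()
logits = model(torch.randn(2, 16, 512))
# During training, head k's logits are scored against the targets shifted
# by k + 1; at inference the extra heads can be dropped or used to draft
# several tokens at once.
```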
Music and Speech Innovations
One of the standout models is JASCO, a text-to-music model that accepts additional inputs such as chords or beats, giving users finer control over the generated music. It complements AudioCraft, Meta’s generative audio framework, which includes MusicGen and AudioGen: MusicGen generates music conditioned on text or melody, while AudioGen focuses more broadly on generating audio from text prompts.
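MusicGen and AudioGen are distributed through the open-source audiocraft package. A minimal usage sketch follows, assuming audiocraft is installed and the facebook/musicgen-small checkpoint is available; exact parameters are worth checking against the current documentation:

```python
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load a pretrained checkpoint (name assumed; check the audiocraft docs).
model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=8)  # seconds of audio per clip

# Text-conditioned generation: one waveform per prompt.
descriptions = ["lo-fi hip hop with mellow piano", "upbeat 80s synthwave"]
wavs = model.generate(descriptions)

for i, one_wav in enumerate(wavs):
    # Writes clip_{i}.wav at the model's sample rate, loudness-normalized.
    audio_write(f"clip_{i}", one_wav.cpu(), model.sample_rate, strategy="loudness")
```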
In a significant move to counter the increasing sophistication of AI-generated content, Meta introduced AudioSeal, an audio watermarking technique for detecting AI-generated speech. It can pinpoint AI-generated segments within longer audio clips and does so up to 485 times faster than previous methods, a breakthrough that helps maintain the authenticity and integrity of audio content.
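AudioSeal ships as an open-source package of the same name, built around a watermark generator and a matching detector. The sketch below follows that pattern; treat the checkpoint names and call signatures as assumptions to verify against the package's documentation:

```python
import torch
from audioseal import AudioSeal

# A generator embeds an imperceptible watermark; a detector scores audio
# for its presence. Checkpoint names below are assumptions taken from the
# project's published examples.
generator = AudioSeal.load_generator("audioseal_wm_16bits")
detector = AudioSeal.load_detector("audioseal_detector_16bits")

sr = 16_000
audio = torch.randn(1, 1, sr * 5)  # placeholder: (batch, channels, samples)

watermark = generator.get_watermark(audio, sample_rate=sr)
watermarked = audio + watermark  # additive watermark, inaudible by design

# `score` is the probability the clip is watermarked; `message` is the
# embedded 16-bit payload. Frame-level detection is what lets the system
# flag AI-generated segments inside longer clips.
score, message = detector.detect_watermark(watermarked, sample_rate=sr)
print(score, message)
```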
Meta has also made strides in improving diversity within AI systems. The company has developed automatic indicators to evaluate geographical disparities in text-to-image models and conducted extensive annotation studies to enhance representation and inclusivity in AI-generated images. This effort aims to address and mitigate biases inherent in AI systems, fostering a more equitable technological ecosystem.
Lastly, Meta’s Chameleon model family, the multi-modal systems described above, handles images and text jointly, making it possible to generate creative captions for images or compose new scenes from combined textual and visual inputs. Additionally, the EnCodec neural audio codec reflects Meta’s pursuit of state-of-the-art real-time audio processing and served as the foundation for building MusicGen and AudioGen.
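EnCodec is likewise available as an open-source package. The sketch below, assuming the encodec PyPI package and a local test.wav file, compresses a waveform into discrete codes and reconstructs it:

```python
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

# 24 kHz EnCodec model; the target bandwidth selects how many codebooks
# are used (higher bandwidth = more codes = better fidelity).
model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)  # kbps

wav, sr = torchaudio.load("test.wav")  # assumed local input file
wav = convert_audio(wav, sr, model.sample_rate, model.channels).unsqueeze(0)

with torch.no_grad():
    frames = model.encode(wav)            # list of (codes, scale) chunks
    reconstructed = model.decode(frames)  # decode back to a waveform

codes = torch.cat([codes for codes, _ in frames], dim=-1)
print(codes.shape)  # [batch, n_codebooks, timesteps]
```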
Meta is committed to fostering collaboration and driving innovation in the AI community by publicly sharing these models and research under various licenses. This openness aims both to advance AI responsibly and to encourage the wider community to push the boundaries of AI capabilities and applications.