Meta, previously known as Facebook, has unveiled "CM3leon," an advanced artificial intelligence (AI) model capable of generating both images from text and text from images. Meta describes CM3leon as the first multimodal model trained with a recipe adapted from text-only language models, including a large-scale retrieval-augmented pre-training stage and a multitask supervised fine-tuning (SFT) stage.
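For context, retrieval-augmented pre-training generally works by fetching documents related to each training example from a large memory bank and prepending them to the model's input, so the model learns to condition on retrieved context. The sketch below illustrates that general idea with toy embeddings; it is not Meta's actual pipeline, and the function names and document format are invented for illustration.

```python
import numpy as np

def retrieve(query_emb, memory_embs, memory_docs, k=2):
    """Return the k memory documents most similar to the query (cosine similarity)."""
    sims = memory_embs @ query_emb / (
        np.linalg.norm(memory_embs, axis=1) * np.linalg.norm(query_emb) + 1e-8
    )
    top = np.argsort(sims)[::-1][:k]
    return [memory_docs[i] for i in top]

def build_training_context(example, example_emb, memory_embs, memory_docs):
    """Prepend retrieved neighbors to a training example (hypothetical format)."""
    neighbors = retrieve(example_emb, memory_embs, memory_docs)
    return "".join(f"<doc>{n}</doc>" for n in neighbors) + f"<doc>{example}</doc>"

# Toy memory bank; random vectors stand in for a real multimodal encoder.
rng = np.random.default_rng(0)
docs = ["a photo of a cat", "a red bicycle", "a cat sleeping on a sofa"]
embs = rng.normal(size=(3, 8))
print(build_training_context("a kitten playing", rng.normal(size=8), embs, docs))
```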
Meta unveils CM3leon, a cutting-edge text-to-image generation model that surpasses Google's Parti in performance and capabilities.
As per Meta's claims, CM3leon's image generation tools can produce more coherent images that align closely with input prompts. Moreover, CM3leon achieves this with five times less computing power and a smaller training dataset than previous transformer-based methods. During evaluation on the widely used zero-shot MS-COCO image generation benchmark, CM3leon achieved an FID (Fréchet Inception Distance) score of 4.88, a new state of the art in text-to-image generation. This score surpasses Google's Parti text-to-image model, solidifying CM3leon's position as the leading model in this field.
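FID measures the distance between Gaussian fits to Inception-network features of real and generated images; lower is better. Here is a minimal sketch of the metric itself, assuming the feature statistics have already been extracted (the data below is synthetic, not MS-COCO):

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(mu_r, sigma_r, mu_g, sigma_g):
    """FID = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 * sqrtm(Sigma_r @ Sigma_g))."""
    diff = mu_r - mu_g
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)
    if np.iscomplexobj(covmean):  # drop tiny imaginary parts from numerical error
        covmean = covmean.real
    return diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean)

# Synthetic stand-ins for Inception features of real and generated images.
rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 64))
fake = rng.normal(loc=0.1, size=(1000, 64))
fid = frechet_inception_distance(
    real.mean(axis=0), np.cov(real, rowvar=False),
    fake.mean(axis=0), np.cov(fake, rowvar=False),
)
print(f"FID: {fid:.3f}")
```

In practice the statistics come from a pretrained Inception network run over tens of thousands of images; the closer the generated distribution matches the real one, the lower the score.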
Meta's CM3leon is a powerful multimodal model that excels in vision-language tasks and represents significant progress in the field of image generation.
Additionally, Meta emphasizes that CM3leon demonstrates outstanding performance in various vision-language tasks, such as visual question answering and long-form captioning. Despite being trained on a relatively modest dataset of just three billion text tokens, CM3leon's zero-shot performance compares favorably to larger models that were trained on more extensive datasets. This highlights the model's efficiency and capability in handling vision-language tasks.
According to Meta, CM3leon's impressive performance across diverse tasks signifies a step towards higher-fidelity image generation and improved understanding. The company envisions models like CM3leon as a means to enhance creativity and enable better applications in the metaverse. Meta expresses its enthusiasm for further exploring the potential of multimodal language models and says it plans to release more advanced models in the future.