Meta: The Real Open AI

PHIL Alive To AIGC #5

PHIL

Published in

十百千實驗室

3 min read · Nov 19, 2023

Segment Anything Model (SAM)

April 5, 2023

Introducing Segment Anything: Working toward the first foundation model for image segmentation

We're releasing the Segment Anything Model (SAM) - a step toward the first foundation model for image segmentation …

ai.meta.com

the first foundation model for image segmentation

DINOv2 (self-DIstillation with NO labels)

April 17, 2023

DINOv2: State-of-the-art computer vision models with self-supervised learning

Today, we are open-sourcing DINOv2, the first method for training computer vision models that uses self-supervised…

ai.meta.com

the first method for training computer vision models that uses self-supervised learning to achieve results that match or surpass the standard approach used in the field

ImageBind: Holistic AI learning across six modalities

May 9, 2023

ImageBind: Holistic AI learning across six modalities

ImageBind is the first AI model capable of binding information from six modalities.

ai.meta.com

the first AI model capable of binding information from six modalities

Massively Multilingual Speech (MMS)

May 22, 2023

Introducing speech-to-text, text-to-speech, and more for 1,100+ languages

We expanded speech technology from about 100 languages to over 1,000 by building a single multilingual speech…

ai.meta.com

In the Massively Multilingual Speech (MMS) project, we overcome some of these challenges by combining wav2vec 2.0, our pioneering work in self-supervised learning, and a new dataset that provides labeled data for over 1,100 languages and unlabeled data for nearly 4,000 languages.

Voicebox: In-context text-to-speech synthesis

June 16, 2023

Introducing Voicebox: The first generative AI model for speech to generalize across tasks with…

Voicebox is a state-of-the-art speech generative model based on a new method proposed by Meta AI called Flow Matching…

ai.meta.com

The first generative AI model for speech to generalize across tasks

CM3leon (pronounced like “chameleon”)

July 14, 2023

Introducing CM3leon, a more efficient, state-of-the-art generative model for text and images

Today, we're showcasing CM3leon (pronounced like "chameleon"), a single foundation model that does both text-to-image…

ai.meta.com

a single foundation model that does both text-to-image and image-to-text generation

Llama 2 (Large Language Model Meta AI)

July 18, 2023

Meta and Microsoft Introduce the Next Generation of Llama

We're introducing the availability of Llama 2, the next generation of our open source large language model.

ai.meta.com

free for research and commercial use

AudioCraft: Generative AI for audio made simple

August 2, 2023

AudioCraft: A simple one-stop shop for audio modeling

AudioCraft is a simple framework that generates high-quality, realistic audio and music from text-based user inputs…

ai.meta.com

AudioCraft consists of three models: MusicGen, AudioGen, and EnCodec

SeamlessM4T: a foundational multimodal model for speech translation

August 22, 2023

Bringing the world closer together with a foundational multimodal model for speech translation

SeamlessM4T provides high-quality translation, allowing people from different linguistic communities to communicate…

ai.meta.com

SeamlessM4T, a foundational multilingual and multitask model that seamlessly translates and transcribes across speech and text

Code Llama: large language model for coding

August 24, 2023

Introducing Code Llama, a state-of-the-art large language model for coding

Code Llama, which is built on top of Llama 2, is free for research and commercial use.

ai.meta.com

Code Llama outperformed state-of-the-art publicly available LLMs on code tasks

Stable Signature: watermarking images created by open source generative AI

October 6, 2023

Stable Signature: A new method for watermarking images created by open source generative AI

Invisible watermarking incorporates information into digital content. The watermark is invisible to the naked eye but…

ai.meta.com

Stable Signature closes the potential for removing the watermark by rooting it in the model with a watermark that can trace back to where the image was created.

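Summary of the above (generated with llama-2-70b-chat)

Segment Anything Model (SAM): a step toward the first foundation model for image segmentation
DINOv2 (self-distillation with no labels): the first method for training computer vision models with self-supervised learning that matches or surpasses the standard approach used in the field
ImageBind: the first AI model capable of binding information from six modalities
Massively Multilingual Speech (MMS): a project that expands speech technology from about 100 languages to over 1,000
Voicebox: the first generative AI model for speech to generalize across tasks
CM3leon: a single foundation model that does both text-to-image and image-to-text generation
Llama 2: the next generation of Meta's open source large language model, free for research and commercial use
AudioCraft: a simple framework that generates high-quality, realistic audio and music from text-based user inputs
SeamlessM4T: a foundational multimodal model that provides high-quality speech translation, letting people from different linguistic communities communicate with each other
Code Llama: a large language model built on top of Llama 2, free for research and commercial use, that performs strongly on code tasks
Stable Signature: a new method for watermarking images created by open source generative AI, with a watermark that can be traced back to where the image was created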

Written by PHIL