Segment Anything Model (SAM)
April 5, 2023
the first foundation model for image segmentation
DINO (self-DIstillation with NO labels) v2
April 17, 2023
the first method for training computer vision models that uses self-supervised learning to achieve results that match or surpass the standard approach used in the field
ImageBind: Holistic AI learning across six modalities
May 9, 2023
the first AI model capable of binding information from six modalities
Massively Multilingual Speech (MMS)
May 22, 2023
In the Massively Multilingual Speech (MMS) project, we overcome some of these challenges by combining wav2vec 2.0, our pioneering work in self-supervised learning, and a new dataset that provides labeled data for over 1,100 languages and unlabeled data for nearly 4,000 languages.
Voicebox: In-context text-to-speech synthesis
June 16, 2023
the first generative AI model for speech to generalize across tasks
CM3leon (pronounced like “chameleon”)
July 14, 2023
a single foundation model that does both text-to-image and image-to-text generation
Llama (Large Language Model Meta AI) 2
July 18, 2023
free for research and commercial use
SeamlessM4T: a foundational multimodal model for speech translation
August 22, 2023
SeamlessM4T, a foundational multilingual and multitask model that seamlessly translates and transcribes across speech and text
Code Llama: large language model for coding
August 24, 2023
Code Llama outperformed state-of-the-art publicly available LLMs on code tasks
Stable Signature: watermarking images created by open source generative AI
October 6, 2023
Stable Signature closes the potential for removing the watermark by rooting it in the model with a watermark that can trace back to where the image was created.
Summary of the above (generated with llama-2-70b-chat)
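- Segment Anything Model (SAM): the first foundation model for image segmentation
- DINO (self-DIstillation with NO labels) v2: the first self-supervised method for training computer vision models whose results match or surpass the standard approach
- ImageBind: the first AI model capable of binding information from six modalities
- Massively Multilingual Speech (MMS): a project to scale speech technology from 100 languages to 1,000 languages
- Voicebox: the first generative speech model that generalizes across tasks
- CM3leon: a foundation model that does both text-to-image and image-to-text generation
- Llama 2: the next-generation open source large language model, free for research and commercial use
- AudioCraft: a simple framework that generates high-quality, realistic audio and music from text inputs
- SeamlessM4T: a multimodal foundation model that provides high-quality speech translation, letting people who speak different languages communicate with each other
- Code Llama: a large language model built on Llama 2, free for research and commercial use, that performs well on coding tasks
- Stable Signature: a new method for watermarking images created by open source generative AI, so that an image can be traced back to where it was created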