SyncedReview

We produce professional, authoritative, and thought-provoking content relating to artificial intelligence, machine intelligence, emerging technologies and industrial insights.

DeepMind’s JetFormer: Unified Multimodal Models Without Modelling Constraints

Synced · Published in SyncedReview · 3 min read · Dec 26, 2024


Recent advancements in training large multimodal models have been driven by efforts to eliminate modeling constraints and unify architectures across domains. Despite these strides, many existing models still rely on separately trained components such as modality-specific encoders and decoders.

In a new paper, JetFormer: An Autoregressive Generative Model of Raw Images and Text, a Google DeepMind research team introduces JetFormer, a groundbreaking autoregressive, decoder-only Transformer designed to model raw data directly. The model maximizes the likelihood of raw data without depending on any pre-trained components, and it can both understand and generate text and images.

The team summarizes the key innovations in JetFormer as follows:

  1. Leveraging Normalizing Flows for Image Representation: The pivotal insight behind JetFormer is its use of a powerful normalizing flow — termed a “jet” — to encode images…
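To make the idea behind the "jet" concrete: because a normalizing flow is exactly invertible, the log-likelihood of an image decomposes, via the change-of-variables formula, into the autoregressive model's log-probability of the flow's latent "soft tokens" plus the flow's log-determinant term. The sketch below is a minimal illustration of that decomposition, not the authors' code; the single affine-coupling step, the function names, and the standard-normal stand-in for the Transformer's latent distribution are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def coupling(x, W_s, W_t):
    """One affine-coupling step: transform x2 conditioned on x1.

    Invertible by construction, so log|det Jacobian| is cheap to compute.
    """
    d = x.shape[0] // 2
    x1, x2 = x[:d], x[d:]
    log_scale = np.tanh(x1 @ W_s)          # bounded for numerical stability
    shift = x1 @ W_t
    z2 = x2 * np.exp(log_scale) + shift
    logdet = log_scale.sum()               # Jacobian is triangular
    return np.concatenate([x1, z2]), logdet

def coupling_inverse(z, W_s, W_t):
    """Exact inverse: recover x from z by undoing the affine transform."""
    d = z.shape[0] // 2
    z1, z2 = z[:d], z[d:]
    log_scale = np.tanh(z1 @ W_s)
    shift = z1 @ W_t
    x2 = (z2 - shift) * np.exp(-log_scale)
    return np.concatenate([z1, x2])

# Toy "image": a flattened patch. The flow's output plays the role of the
# soft tokens a decoder-only Transformer would model autoregressively.
d = 8
x = rng.normal(size=2 * d)
W_s, W_t = 0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d))

z, logdet = coupling(x, W_s, W_t)
assert np.allclose(coupling_inverse(z, W_s, W_t), x)  # exactly invertible

# Hypothetical stand-in for the Transformer's log-prob over latents:
# a standard normal density instead of a real autoregressive model.
ar_logprob = -0.5 * (z @ z) - z.size * 0.5 * np.log(2 * np.pi)

# Exact likelihood of the raw input via change of variables:
log_px = ar_logprob + logdet
print(f"log p(x) = {log_px:.3f}")
```

Because the flow is invertible, no information is discarded the way it is with a separately trained, lossy image tokenizer; in JetFormer, the flow and the decoder-only Transformer are trained jointly, so a single likelihood objective shapes both the image representation and the sequence model.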
