Google’s new Gemini AI model, announced December 2023

Brian Lee
Published in CodeAI
2 min read · Dec 23, 2023

Google has introduced Gemini, a new AI model that it describes as its most capable and versatile to date.

Image from Google Images

Performance:

Gemini is a multimodal model that can understand text, images, audio, and video. Gemini 1.0 comes optimized in three sizes: Gemini Ultra, Gemini Pro, and Gemini Nano. Gemini Ultra delivers the most striking results. On MMLU (Massive Multitask Language Understanding), a benchmark spanning 57 subjects such as math, physics, and history, it scores 90%, making it the first model to outperform human experts on this test of language understanding.

Benchmark score comparison for text: GPT vs. Gemini

Ultra also makes impressive advances in multimodal reasoning. On the MMMU (Massive Multi-discipline Multimodal Understanding and Reasoning) benchmark, for example, Ultra outperformed every previous model with a state-of-the-art score of 62.4%.

Benchmark scores for images, audio, and video: GPT vs. Gemini

Architecture:

Gemini models are designed to process a diverse range of inputs, combining text, audio, and visual content such as natural images, charts, screenshots, PDFs, and videos. Building on this, Gemini can generate outputs that mix text and images. Gemini’s visual encoding is inspired by Google’s earlier work on Flamingo, CoCa, and PaLI, with one key difference: Gemini is multimodal from the start and can natively produce images using discrete image tokens. Video understanding works by encoding a video as a sequence of frames within the model’s large context window. Those frames are treated like images and interleaved with text and audio in the model’s input.
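
To make the frame-interleaving idea more concrete, here is a minimal Python sketch. It is purely illustrative and not Gemini’s actual pipeline: Google has not published its input format, so the part types (`TextPart`, `ImagePart`), the one-frame-per-second sampling rate, the `max_frames` budget, and the text timestamp markers are all hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical part types; a real model would carry pixel data or image tokens.
@dataclass
class TextPart:
    text: str


@dataclass
class ImagePart:
    frame_index: int  # placeholder for the decoded video frame


Part = Union[TextPart, ImagePart]


def sample_frame_indices(num_frames: int, fps: float, sample_fps: float) -> List[int]:
    """Return frame indices taken roughly every 1/sample_fps seconds."""
    step = max(1, round(fps / sample_fps))
    return list(range(0, num_frames, step))


def build_interleaved_input(prompt: str, num_frames: int, fps: float,
                            sample_fps: float = 1.0,
                            max_frames: int = 16) -> List[Part]:
    """Turn a video into timestamped frames interleaved with text, then append the prompt."""
    indices = sample_frame_indices(num_frames, fps, sample_fps)
    # Assumed context budget: if there are too many frames, keep an evenly
    # spaced subset so the whole video is still covered.
    if len(indices) > max_frames:
        indices = [indices[k * len(indices) // max_frames] for k in range(max_frames)]
    parts: List[Part] = []
    for i in indices:
        parts.append(TextPart(f"[t={i / fps:.1f}s]"))  # hypothetical timestamp marker
        parts.append(ImagePart(i))                      # each frame is treated as an image
    parts.append(TextPart(prompt))
    return parts


if __name__ == "__main__":
    for part in build_interleaved_input("Describe what happens.", num_frames=300, fps=30.0)[:4]:
        print(part)
```

For a 10-second clip at 30 fps, `build_interleaved_input("Describe what happens.", num_frames=300, fps=30.0)` samples 10 frames and returns 21 parts: a timestamp marker and a frame for each sampled second, followed by the prompt. Spreading frames evenly across the budget, rather than truncating, keeps the whole video represented when a clip is long.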

Conclusion:

Gemini 1.0 is rolling out across Google products, starting with the Bard chatbot. Google also plans to bring it to other products such as Search, Ads, and the Chrome browser. At the time of writing, these Gemini-powered services are not available in Canada.
