Weekly AI and NLP News — April 15th 2024
Llama 3, Gemini 1.5 generally available, and Grok-1.5V
Here are your weekly articles, guides, and news about NLP and AI chosen for you by NLPlanet!
😎 News From The Web
- Meta’s open-source GPT-4 competitor Llama 3 is coming soon. Meta is set to release Llama 3, intended to outperform its predecessors and compete with OpenAI’s GPT-4. Two preliminary versions will debut first, ahead of a comprehensive multimodal release in the summer.
- Gemini 1.5 Pro Now Available in 180+ Countries; With Native Audio Understanding, System Instructions, JSON Mode and More. Gemini 1.5 Pro has launched globally with native audio understanding and new developer features, including a File API, system instructions, and a JSON mode, alongside expanded audio/video modalities such as video quiz capabilities. The update also introduces a highly performant text embedding model (an API sketch follows this list).
- GPT-4 Turbo has been upgraded and is out of preview. The new GPT-4 Turbo now includes vision capabilities, supports vision requests via JSON mode and function calling, and has knowledge updated through December 2023 (a request sketch also follows this list).
- x.AI Unveils its First Multimodal Model, Grok-1.5 Vision. x.AI, founded by Elon Musk, introduces Grok-1.5V, a multimodal AI model with enhanced capabilities for analyzing visual information, including text, charts, and images.
- TikTok may add AI avatars that can make ads. TikTok is investigating the integration of AI-powered avatars to deliver more personalized and engaging advertising experiences by aligning ad content with user interests.
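To make the Gemini 1.5 Pro developer features above concrete, here is a minimal sketch using the google-generativeai Python SDK; the model name, prompt, and JSON keys are illustrative assumptions rather than the announcement’s own example.

```python
# Minimal sketch: system instructions + JSON mode with the google-generativeai SDK.
# Assumes a valid API key; model name and schema are illustrative.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel(
    "gemini-1.5-pro-latest",
    system_instruction="You are a concise news summarizer.",
    generation_config={"response_mime_type": "application/json"},  # JSON mode
)

response = model.generate_content(
    'Summarize as JSON with keys "topic" and "summary": '
    "Gemini 1.5 Pro is now generally available in 180+ countries."
)
print(response.text)  # a JSON string rather than free-form prose
```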
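Similarly, a sketch of a GPT-4 Turbo vision request combined with JSON mode via the official openai Python SDK; the image URL and response keys are placeholders.

```python
# Minimal sketch: a vision request in JSON mode with the openai Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-turbo",
    response_format={"type": "json_object"},  # JSON mode
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": 'Describe this image as JSON with keys "objects" and "scene".'},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},  # placeholder
        ],
    }],
)
print(response.choices[0].message.content)
```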
📚 Guides From The Web
- Speech to Text Providers Leaderboard & Comparison. Artificial Analysis has benchmarked speech-to-text models and APIs from providers including OpenAI, Azure, Amazon Transcribe, and Google on metrics such as word error rate (WER), speed, and pricing (a WER sketch follows this list).
- Vision Language Models Explained. Vision language models (VLMs) are multimodal AI systems that interpret images together with text, used for tasks like image captioning and visual question answering. They show strong zero-shot ability and handle a variety of image formats. Examples include LLaVA 1.6 and Yi-VL-34B (a usage sketch follows this list).
- How To Use AI To Automate Document Processing. Document processing has evolved from traditional OCR and basic NLP to intelligent document processing (IDP) powered by large language models, which handle elaborate document layouts far more robustly (a pipeline sketch follows this list).
- Building reliable systems out of unreliable agents. The article presents methods for building dependable AI systems out of unreliable agents, walking through prompt engineering, performance optimization, evaluation systems, data-driven fine-tuning, and Retrieval-Augmented Generation (RAG); a notable strategy pairs complementary agents to boost overall reliability (sketched after this list).
- Measuring the Persuasiveness of Language Models. New research from Anthropic shows that its models grow more persuasive with each generation, with the latest model, Claude 3 Opus, producing arguments roughly as persuasive as human-written ones.
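For reference, the leaderboard’s headline metric, word error rate (WER), is just word-level edit distance normalized by reference length. A minimal sketch:

```python
# Word error rate: Levenshtein distance over words / reference word count.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 error / 6 words ≈ 0.167
```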
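To make the VLM item concrete, here is a sketch of image captioning with LLaVA 1.6 through Hugging Face transformers; the checkpoint name and prompt template follow the hosted llava-hf weights and should be treated as assumptions.

```python
# Sketch: single-image captioning with a LLaVA 1.6 checkpoint via transformers.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"  # assumed hosted checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(model_id)

image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)
prompt = "[INST] <image>\nDescribe this picture in one sentence. [/INST]"

inputs = processor(text=prompt, images=image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))
```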
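A minimal sketch of the OCR-plus-LLM pipeline the document-processing guide describes, assuming pytesseract for OCR and the openai SDK for structured extraction; the field names are illustrative.

```python
# Sketch: classic OCR feeds an LLM that imposes structure on the raw text.
from PIL import Image
import pytesseract
from openai import OpenAI

client = OpenAI()

def extract_invoice_fields(path: str) -> str:
    raw_text = pytesseract.image_to_string(Image.open(path))  # OCR step
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": 'Extract "vendor", "date", and "total" as JSON from:\n'
                       + raw_text,
        }],
    )
    return response.choices[0].message.content

print(extract_invoice_fields("invoice.png"))  # e.g. {"vendor": ..., ...}
```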
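And one of the reliability strategies, pairing an unreliable generator with a complementary validator and retrying, sketched with hypothetical stand-in agents:

```python
# Sketch: retry an unreliable agent until a second, complementary agent
# (here a simple JSON validator) accepts its output.
import json
from typing import Callable

def reliable_call(generate: Callable[[str], str],
                  validate: Callable[[str], bool],
                  prompt: str,
                  max_retries: int = 3) -> str:
    last = ""
    for _ in range(max_retries):
        last = generate(prompt)   # the unreliable agent (e.g. an LLM call)
        if validate(last):        # the complementary checker
            return last
    raise RuntimeError(f"No valid output after {max_retries} attempts: {last!r}")

def is_valid_json(output: str) -> bool:
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

# Usage: reliable_call(my_llm, is_valid_json, "Return the answer as JSON.")
```

The validator can itself be a model; the underlying bet is that checking an output is usually easier and more reliable than generating it.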
🔬 Interesting Papers and Repositories
- karpathy/llm.c: LLM training in simple, raw C/CUDA. Andrej Karpathy’s project implements minimalist GPT-2 training in plain C/CUDA, eliminating heavy dependencies like PyTorch and CPython. The goal is to reproduce the PyTorch reference implementation in roughly 1,000 lines of code while improving performance through direct CUDA integration and tailored CPU optimizations.
- Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs. Apple researchers have developed Ferret-UI, an advanced multimodal large language model (MLLM) specifically designed for improved interpretation and interaction with mobile user interface (UI) screens.
- RULER: What’s the Real Context Size of Your Long-Context Language Models?. The needle-in-a-haystack (NIAH) test has been used to assess long-context language models by measuring their ability to find specific information within extensive texts. Recognizing that NIAH probes retrieval rather than deep understanding, researchers have developed the RULER benchmark, which offers more intricate evaluations: customizable sequence lengths and task complexities, different needle types and quantities, and harder task categories such as multi-hop tracing and aggregation (a minimal NIAH sketch follows this list).
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention. The work presents a method for scaling LLMs to infinitely long inputs while keeping memory and compute bounded. It introduces Infini-attention, an attention mechanism that integrates a compressive memory with both local masked attention and long-term linear attention within a single Transformer block (a simplified sketch follows this list).
- Rho-1: Not All Tokens Are What You Need. The authors analyze token importance during language model training and uncover distinct loss patterns across tokens. Building on this, they develop RHO-1, a language model trained with Selective Language Modeling (SLM), which focuses training on the tokens that benefit the model most rather than weighting all tokens equally (a simplified SLM sketch follows this list).
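To ground the RULER item, here is a minimal sketch of the plain NIAH test it generalizes: hide a fact at a chosen depth in filler text and check whether the model recovers it. Here ask_model stands in for a hypothetical LLM call.

```python
# Sketch: build a needle-in-a-haystack prompt and (hypothetically) score it.
import random

def build_haystack(needle: str, filler: str, n_sentences: int, depth: float) -> str:
    sentences = [filler] * n_sentences
    sentences.insert(int(depth * n_sentences), needle)  # bury the needle
    return " ".join(sentences)

needle = "The magic number is 4812."
context = build_haystack(needle, "The sky was a pleasant shade of blue.",
                         n_sentences=2000, depth=random.random())
prompt = context + "\n\nWhat is the magic number?"
# passed = "4812" in ask_model(prompt)  # sweep depths and lengths, then average
```

RULER replaces this single retrieval probe with configurable needle types and counts plus harder tasks such as multi-hop tracing and aggregation.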
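A simplified NumPy sketch of the Infini-attention idea: each segment attends locally with causal softmax attention, retrieves long-range context from a compressive memory via linear attention, and the two streams are blended by a gate. The ELU+1 feature map follows the paper; training, multiple heads, and the delta-rule memory variant are omitted.

```python
# Simplified sketch of Infini-attention over a stream of segments.
import numpy as np

def elu1(x):
    # Feature map sigma(x) = ELU(x) + 1; keeps activations positive.
    return np.where(x > 0, x + 1.0, np.exp(x))

def infini_attention_segment(Q, K, V, M, z, beta=0.0):
    d = Q.shape[-1]
    # 1) Retrieve long-range context from the compressive memory (linear attention).
    sQ = elu1(Q)
    A_mem = (sQ @ M) / (sQ @ z + 1e-6)[:, None]
    # 2) Standard causal softmax attention within the local segment.
    scores = Q @ K.T / np.sqrt(d)
    scores += np.triu(np.full(scores.shape, -1e9), k=1)  # causal mask
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    A_local = (weights / weights.sum(-1, keepdims=True)) @ V
    # 3) Write this segment's keys/values into the memory.
    M = M + elu1(K).T @ V
    z = z + elu1(K).sum(0)
    # 4) Blend the two streams with a sigmoid gate (learned in the real model).
    g = 1.0 / (1.0 + np.exp(-beta))
    return g * A_mem + (1.0 - g) * A_local, M, z

d = 64
M, z = np.zeros((d, d)), np.zeros(d)
for _ in range(3):  # memory and cost stay fixed no matter how many segments
    seg = np.random.randn(16, d)
    out, M, z = infini_attention_segment(seg, seg, seg, M, z)
print(out.shape)  # (16, 64)
```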
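Finally, a minimal PyTorch sketch of Selective Language Modeling as described for RHO-1: score each token by its excess loss over a frozen reference model and backpropagate only through the top fraction. The keep ratio and tensor shapes are illustrative assumptions.

```python
# Sketch: compute per-token losses, keep only the tokens where the training
# model still lags the reference model (high excess loss), average over those.
import torch
import torch.nn.functional as F

def slm_loss(logits, ref_logits, targets, keep_ratio=0.6):
    # logits, ref_logits: (batch, seq, vocab); targets: (batch, seq)
    loss = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    with torch.no_grad():
        ref_loss = F.cross_entropy(ref_logits.transpose(1, 2), targets,
                                   reduction="none")
        excess = loss.detach() - ref_loss        # per-token usefulness score
        k = max(1, int(keep_ratio * excess.numel()))
        threshold = excess.flatten().topk(k).values.min()
        mask = (excess >= threshold).float()     # keep high-excess tokens only
    return (loss * mask).sum() / mask.sum()
```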
Thank you for reading! If you want to learn more about NLP, remember to follow NLPlanet. You can find us on LinkedIn, Twitter, Medium, and our Discord server!