Weekly AI and NLP News — April 15th 2024

Llama 3, Gemini 1.5 generally available, and Grok-1.5V

Fabio Chiusano
NLPlanet
4 min read · Apr 15, 2024

Image made with DALLE 3

Here are your weekly articles, guides, and news about NLP and AI chosen for you by NLPlanet!

😎 News From The Web

📚 Guides From The Web

  • Speech to Text Providers Leaderboard & Comparison. Artificial Analysis has evaluated multiple speech-to-text models and APIs from providers like OpenAI, Azure, Amazon Transcribe, and Google, focusing on metrics such as word error rate, speed, and pricing.
  • Vision Language Models Explained. Vision language models (VLMs) are multimodal AI systems that take both images and text as input and are used for tasks like image captioning and visual question answering. They show strong zero-shot capabilities and work with many types of images. Examples include LLaVA 1.6 and Yi-VL-34B.
  • How To Use AI To Automate Document Processing. Document automation has evolved from traditional OCR and basic NLP to intelligent document processing (IDP) and Large Language Models, which are better at interpreting and handling complex document layouts.
  • Building reliable systems out of unreliable agents. The article presents methods for building dependable AI systems from unreliable agents. It covers prompt engineering, performance optimization, evaluation systems, data-driven fine-tuning, and Retrieval Augmented Generation (RAG), and highlights the use of complementary agents to boost overall reliability.
  • Measuring the Persuasiveness of Language Models. New research from Anthropic finds that its models become more persuasive with each generation, with the latest model, Claude 3 Opus, producing arguments about as persuasive as human-written ones.
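
For readers unfamiliar with word error rate (WER), the metric mentioned in the speech-to-text leaderboard item above, here is a minimal sketch of how it is commonly computed: the word-level edit distance between a reference transcript and a model's transcript, divided by the number of reference words. The function name and the lowercase-and-split normalization are our own assumptions, and the leaderboard may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: simple lowercase + whitespace tokenization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Lower is better, and the value can exceed 1.0 when the hypothesis contains many extra words.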

🔬 Interesting Papers and Repositories

  • karpathy/llm.c: LLM training in simple, raw C/CUDA. Andrej Karpathy’s project is a minimalist GPT-2 training framework written in C/CUDA that avoids heavy dependencies like PyTorch or CPython. The goal is to reproduce the PyTorch model in approximately 1,000 lines of code while improving performance with direct CUDA integration and tailored CPU optimizations.
  • Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs. Apple researchers have developed Ferret-UI, an advanced multimodal large language model (MLLM) specifically designed for improved interpretation and interaction with mobile user interface (UI) screens.
  • RULER: What’s the Real Context Size of Your Long-Context Language Models? The needle-in-a-haystack (NIAH) test assesses long-context language models by measuring their ability to find a specific piece of information within a long text. Because NIAH says little about deeper understanding, researchers have developed the RULER benchmark. It offers more demanding evaluations by allowing customization of sequence lengths and task complexities, introducing different needle types and quantities, and adding harder task categories such as multi-hop tracing and aggregation.
  • Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention. The work presents a method for scaling LLMs to handle infinitely long inputs while maintaining bounded memory and computational requirements. It introduces Infini-attention, an attention mechanism integrating compressive memory with both local masked attention and long-term linear attention within a Transformer block.
  • Rho-1: Not All Tokens Are What You Need. The authors analyze token importance in language model training and find that loss patterns vary widely among tokens. This leads to Rho-1, a new language model that employs Selective Language Modeling (SLM) to focus training on the tokens that are most beneficial for the model, rather than treating all tokens with equal importance.
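
To make the Selective Language Modeling idea from the Rho-1 item above more concrete, here is a rough sketch under our own assumptions: each token is scored by its excess loss (the training model's loss minus a reference model's loss), and only the top-scoring fraction of tokens contributes to the training loss. The function name, the `keep_ratio` default, and the plain-list inputs are hypothetical and are not taken from the paper's code.

```python
def selective_lm_loss(train_losses, ref_losses, keep_ratio=0.6):
    """Average the training loss over the tokens with the highest excess loss.

    train_losses: per-token losses from the model being trained.
    ref_losses:   per-token losses from a reference model (assumed given).
    keep_ratio:   fraction of tokens to keep (hypothetical default).
    Returns (loss, mask), where mask[i] is True if token i was kept.
    """
    if not train_losses or len(train_losses) != len(ref_losses):
        raise ValueError("loss lists must be non-empty and of equal length")

    # Excess loss: how much worse the training model is than the reference.
    excess = [t - r for t, r in zip(train_losses, ref_losses)]

    # Keep at least one token and at most all of them.
    k = min(len(excess), max(1, round(keep_ratio * len(excess))))

    # Rank token positions by excess loss, highest first.
    ranked = sorted(range(len(excess)), key=lambda i: excess[i], reverse=True)
    kept = set(ranked[:k])

    mask = [i in kept for i in range(len(excess))]
    loss = sum(train_losses[i] for i in kept) / k
    return loss, mask


if __name__ == "__main__":
    loss, mask = selective_lm_loss(
        train_losses=[2.0, 0.5, 3.0, 1.0],
        ref_losses=[1.0, 0.4, 0.5, 1.2],
        keep_ratio=0.5,
    )
    print(loss, mask)
```

In a real training loop the mask would be applied to per-token cross-entropy inside the framework's loss computation; this standalone version only illustrates the selection step.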

Thank you for reading! If you want to learn more about NLP, remember to follow NLPlanet. You can find us on LinkedIn, Twitter, Medium, and our Discord server!

Fabio Chiusano
NLPlanet

Freelance data scientist — Top Medium writer in Artificial Intelligence