Weekly AI News — June 24th 2024

Claude 3.5 Sonnet beats GPT-4, Runway’s new Gen-3 Alpha video model, and OpenAI may become “for profit”

Fabio Chiusano
NLPlanet
4 min read · Jun 24, 2024


Solarpunk futuristic city — Image by DALL·E 3

Here are your weekly articles, guides, and news about NLP and AI chosen for you by NLPlanet!

😎 News From The Web

  • Introducing Claude 3.5 Sonnet. The new Claude 3.5 Sonnet offers improved intelligence, faster processing, and better efficiency at a competitive price, with notable gains in reasoning, coding, and vision. Additionally, the newly introduced ‘Artifacts’ feature lets users view, edit, and build on Claude-generated content such as code and documents in a dedicated panel alongside the conversation.
  • Introducing Gen-3 Alpha: A New Frontier for Video Generation. Runway has launched Gen-3 Alpha, a new model that generates video from text prompts and images. It ships with control modes for finer-grained manipulation and promises future improvements in structure, style, and motion control.
  • OpenAI CEO says company could become for-profit corporation. OpenAI is considering a transition to a “for-profit benefit corporation,” moving away from its nonprofit-controlled structure toward one similar to competitors such as Anthropic and xAI, according to CEO Sam Altman.
  • Ilya Sutskever, OpenAI’s former chief scientist, launches new AI company. Ilya Sutskever, alongside Daniel Gross and Daniel Levy, has established Safe Superintelligence Inc. (SSI), a new AI venture based in Palo Alto and Tel Aviv dedicated to creating superintelligent AI with a strong emphasis on safety. SSI is poised to integrate AI advancements with robust safety measures, prioritizing long-term security over immediate profits, and is anticipated to attract substantial investment due to its compelling objective and skilled founders.
  • NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models. NVIDIA has launched Nemotron-4 340B, an open suite of models designed to create synthetic data for training language models across diverse sectors. The suite, which includes base, instruct, and reward models, focuses on improving the quality and availability of training data, and is optimized for NVIDIA NeMo and TensorRT-LLM for efficient training and inference (a rough sketch of the general idea follows this list).
  • Indian election was awash in deepfakes — but AI was a net positive for democracy. India’s 2024 elections saw AI advancements in voter engagement through deepfake communication and real-time multi-language translation. Despite instances of AI-facilitated trolling, the technology predominantly boosted democratic participation and personalized voter outreach, even projecting virtual embodiments of past political figures.
  • Generating audio for video. DeepMind has created a V2A (Video-to-Audio) system using a diffusion-based AI model for generating synchronized audio for silent videos, guided by visual and textual cues to produce lifelike sound environments.
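
The Nemotron-4 340B item above describes a pipeline built from base, instruct, and reward models, where the reward model is meant to score and filter what the instruct model generates. As a rough illustration of that idea (not NVIDIA’s actual pipeline), here is a minimal Python sketch that asks an instruct model served behind an OpenAI-compatible endpoint to produce question-and-answer pairs; the endpoint URL and model identifier are placeholders assumed for the example.

```python
# Minimal sketch: generating synthetic training pairs with an instruct model
# behind an OpenAI-compatible endpoint. base_url, api_key, and the model name
# are hypothetical placeholders, not values from NVIDIA's documentation.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical self-hosted endpoint
    api_key="not-needed-locally",         # placeholder
)

TOPICS = ["unit testing in Python", "SQL window functions", "Dockerfile basics"]

def synth_pair(topic: str) -> dict:
    """Ask the instruct model to write a question about `topic`, then answer it."""
    question = client.chat.completions.create(
        model="nemotron-4-340b-instruct",  # assumed model identifier
        messages=[{"role": "user",
                   "content": f"Write one challenging question about {topic}."}],
    ).choices[0].message.content
    answer = client.chat.completions.create(
        model="nemotron-4-340b-instruct",
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content
    return {"prompt": question, "response": answer}

dataset = [synth_pair(t) for t in TOPICS]
```

In the full suite, the reward model would typically then score each generated pair so that only the highest-quality examples are kept for training.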

📚 Guides From The Web

  • Extracting Concepts from LLMs: Anthropic’s recent discoveries. Anthropic has advanced the interpretability of LLMs by training sparse autoencoders (SAEs) on models like Claude 3 Sonnet to extract human-interpretable features, including features that activate across multiple languages. OpenAI, meanwhile, cautions that relying too heavily on SAE-extracted features can hurt model performance. This research is substantial progress toward decoding LLMs, though full understanding remains elusive (a minimal SAE sketch follows this list).
  • Thoughts on LoRA Training. The article shares practical insights on training LoRAs, emphasizing dataset quality and accurate text captions for effective fine-tuning. It highlights common pitfalls, such as overcomplicating the setup, and offers tips like using diverse image styles and matching training duration to the dataset’s source (a minimal LoRA layer sketch also follows this list).
  • Sycophancy to subterfuge: Investigating reward tampering in language models. The article discusses how models trained with reinforcement learning can exhibit “specification gaming” and “reward tampering”: manipulative behaviors aimed at maximizing reward, ranging from deceitful tactics to directly modifying their own reward functions. The studies show that such behaviors can persist despite attempts to train them away.
  • Maintaining large-scale AI capacity at Meta. Meta manages a significant AI infrastructure projected to reach 600,000 GPUs, focusing on ensuring uptime and seamless updates through maintenance protocols while prioritizing system stability and efficient resource management.
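
For readers new to the technique behind the Anthropic interpretability item above, the sketch below shows the basic shape of a sparse autoencoder: an overcomplete encoder, a ReLU, a linear decoder, and a loss that trades reconstruction error against an L1 sparsity penalty on the feature activations. The dimensions and penalty weight here are illustrative assumptions, not the configuration Anthropic used on Claude 3 Sonnet.

```python
# Minimal sparse autoencoder (SAE) sketch in PyTorch: decompose model
# activations into an overcomplete set of sparsely active features.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 512, d_features: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)  # overcomplete: d_features >> d_model
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # sparse feature activations
        reconstruction = self.decoder(features)
        return reconstruction, features

sae = SparseAutoencoder()
acts = torch.randn(64, 512)  # stand-in for residual-stream activations from an LLM
recon, feats = sae(acts)
# Reconstruction error plus an L1 penalty that pushes most features to zero.
loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()
loss.backward()
```

Each learned feature direction in the decoder can then be inspected by looking at which inputs make it fire, which is the basis for the "interpretable features" described above.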
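As background for the LoRA training guide, here is a minimal sketch of the mechanism being tuned: a frozen pretrained linear layer plus a trainable low-rank update B @ A, scaled by alpha / r. The rank and scaling values are illustrative defaults I am assuming; the article’s actual advice concerns data quality and captions rather than this code.

```python
# Minimal LoRA sketch in PyTorch: freeze the base layer, train only the
# low-rank matrices A and B.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: training starts at the base model
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(4, 768))  # only A and B receive gradients during fine-tuning
```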

🔬 Interesting Papers and Repositories

  • deepseek-ai/DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence. DeepSeek-Coder-V2 is an open-source language model specialized in coding and mathematics, with reported performance surpassing proprietary models such as GPT-4 Turbo on code benchmarks. It supports 338 programming languages, offers a 128K-token context length, and comes in two sizes: 16 billion and 236 billion parameters. The code is MIT licensed and the models permit commercial use (a loading sketch follows this list).
  • Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models. A new benchmark called MultiModal Needle-in-a-haystack (MMNeedle) has been introduced to evaluate the long-context handling capabilities of Multimodal Large Language Models (MLLMs). This benchmark tests MLLMs by requiring them to identify specific components within multi-image inputs, serving as a measure of their visual context processing. Initial findings highlight GPT-4’s proficiency in long-context scenarios, despite occasional hallucinations and a noticeable performance gap between API-based and open-source models.
  • XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning. XLand-100B is a large-scale dataset for in-context reinforcement learning, featuring 100 billion transitions from 2.5 billion episodes across approximately 30,000 tasks. Built on the XLand-MiniGrid framework, it was created with 50,000 GPU hours to enhance research in the field.
  • HelpSteer2: Open-source dataset for training top-performing reward models. HelpSteer2 is a CC-BY-4.0-licensed open dataset designed to improve reward-model training for LLMs by aligning them with human preferences. A reward model trained on it reached 92.0% on RewardBench while using fewer preference pairs than competing datasets (a quick inspection sketch also follows this list).
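
Since the DeepSeek-Coder-V2 item above mentions easy integration, here is a minimal loading sketch using Hugging Face transformers. The repository id and generation settings are assumptions based on common naming conventions; check the model card before relying on them.

```python
# Minimal sketch: running the smaller DeepSeek-Coder-V2 instruct variant locally.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [{"role": "user",
             "content": "Write a Python function that checks if a number is prime."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```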
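And for HelpSteer2, a quick way to inspect the dataset with the Hugging Face datasets library. The repository id and the per-aspect rating column names are assumptions based on the dataset card and should be verified before building a reward-model pipeline on top of them.

```python
# Minimal sketch: loading and inspecting HelpSteer2.
from datasets import load_dataset

ds = load_dataset("nvidia/HelpSteer2", split="train")  # assumed repo id
print(ds.column_names)  # expected: prompt, response, plus per-aspect ratings

example = ds[0]
print(example["prompt"][:200])
# Guarded lookup, since the exact rating column names are an assumption here.
aspects = ("helpfulness", "correctness", "coherence", "complexity", "verbosity")
print({k: example[k] for k in aspects if k in example})
```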

Thank you for reading! If you want to learn more about NLP, remember to follow NLPlanet. You can find us on LinkedIn, Twitter, Medium, and our Discord server!

Fabio Chiusano
NLPlanet

Freelance data scientist — Top Medium writer in Artificial Intelligence