Weekly AI News — July 22nd 2024

OpenAI releases GPT-4o mini, Mistral releases NeMo and Codestral Mamba, and Llama 3 400B may be released this week

Fabio Chiusano
NLPlanet

Solarpunk city — Image by DALL·E 3

Here are your weekly articles, guides, and news about NLP and AI chosen for you by NLPlanet!

😎 News From The Web

  • GPT-4o mini: advancing cost-efficient intelligence. OpenAI has released GPT-4o mini, a cost-efficient model priced at $0.15 per million input tokens and $0.60 per million output tokens, offering stronger performance than GPT-3.5 Turbo at a lower price (a quick cost sketch follows this list).
  • Mistral NeMo. Mistral, in collaboration with NVIDIA, has launched the 12B parameter Mistral NeMo model, featuring a 128k token context window, FP8 inference compatibility, and a cutting-edge Tekken tokenizer. It is open-sourced under Apache 2.0, boasts enhanced multilingual capabilities, and outperforms the previous 7B version in instruction-following tasks.
  • Apple, Nvidia, Anthropic Used Thousands of Swiped YouTube Videos to Train AI. An investigation revealed that major AI firms, including Apple, Nvidia, and Anthropic, have trained their AI models using subtitles from over 173k YouTube videos, potentially breaching YouTube’s anti-data harvesting policy and raising issues on creators’ rights and compensation.
  • Codestral Mamba. Mistral has introduced Codestral Mamba, a coding-focused Mamba model that handles long sequences efficiently with linear-time inference and theoretical support for unlimited sequence lengths. It is competitive with state-of-the-art code models and is open source, available through the GitHub repository with integration options such as the mistral-inference SDK, TensorRT-LLM, and upcoming llama.cpp support.
  • Meta to drop Llama 3 400b next week — here’s why you should care. Meta plans to launch Llama 3 400B in July 2024, expanding the Llama 3 AI model series. This open-source model will offer improved features for chatbots and multilingual applications, aiming to provide wide access to the latest AI advancements.
  • Microsoft CTO Kevin Scott thinks LLM “scaling laws” will hold despite criticism. Microsoft CTO Kevin Scott expressed confidence in the growth potential of Large Language Models on a Sequoia Capital podcast, challenging the idea of an AI development peak and emphasizing the advantages of expanding model sizes and training capabilities.
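
As a quick sanity check on the GPT-4o mini pricing mentioned above, here is a rough cost estimate built only from the published per-million-token rates; the request count and token sizes are made-up numbers for illustration, not benchmarks.

```python
# Rough cost estimate for GPT-4o mini based on the published rates:
# $0.15 per 1M input tokens, $0.60 per 1M output tokens.
# The request count and token sizes below are illustrative assumptions.

INPUT_PRICE_PER_M = 0.15   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.60  # USD per 1M output tokens

def estimate_cost(n_requests: int, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for a batch of similarly sized requests."""
    total_input = n_requests * input_tokens
    total_output = n_requests * output_tokens
    return total_input / 1e6 * INPUT_PRICE_PER_M + total_output / 1e6 * OUTPUT_PRICE_PER_M

# Example: 100,000 requests with ~1,000 prompt tokens and ~300 completion tokens each.
print(f"${estimate_cost(100_000, 1_000, 300):,.2f}")  # -> $33.00
```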

📚 Guides From The Web

  • Tips for Effectively Training Your Machine Learning Models. The article gives a detailed walkthrough for training machine learning models: data preprocessing, feature engineering, addressing class imbalance, cross-validation and hyperparameter tuning for model selection, and ensemble methods to improve stability and prevent overfitting (a minimal scikit-learn sketch of this workflow follows this list).
  • AI Hallucinations: Where Artificial Intelligence Meets Artificial Imagination. The article examines the issue of “hallucinations” in LLMs, where coherent but factually inaccurate content is generated due to the AI’s reliance on pattern prediction rather than factual data retrieval.
  • Fine-tuning Llama-3 to get 90% of GPT-4’s performance at a fraction of the cost. The article details how fine-tuning Llama-3 on proprietary data with the Together AI platform brings it close to GPT-4-level performance. Trained on the Math Instruct dataset, the 8-billion-parameter Llama-3 model reached 65% accuracy, exceeding the larger 70-billion-parameter version and approaching GPT-4o’s 71.4% accuracy.
  • Docmatix — A huge dataset for Document Visual Question Answering. Docmatix, a large-scale dataset for Document Visual Question Answering, offers 2.4 million images and 9.5 million Q/A pairs drawn from 1.3 million PDFs, improving DocVQA performance by 20% with the Florence-2 model. It is available on the Hugging Face Hub to support Vision-Language Model research and applications (a short loading sketch follows this list).
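
To make the training checklist from the first guide above concrete, here is a minimal scikit-learn sketch that strings together preprocessing, class-imbalance handling, cross-validated hyperparameter search, and an ensemble model; the synthetic dataset and the parameter grid are illustrative choices, not code from the article.

```python
# Minimal sketch of the workflow described in the guide: preprocessing,
# class-imbalance handling, cross-validated model selection, and an ensemble
# estimator. The synthetic dataset and grid values are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Imbalanced toy dataset (90% / 10% classes).
X, y = make_classification(n_samples=2_000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),  # preprocessing
    ("model", RandomForestClassifier(class_weight="balanced", random_state=0)),  # ensemble + imbalance
])

search = GridSearchCV(
    pipeline,
    param_grid={"model__n_estimators": [100, 300], "model__max_depth": [None, 10]},
    scoring="f1",  # F1 is more informative than accuracy on imbalanced data
    cv=5,          # 5-fold cross-validation
)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```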
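
Since Docmatix is hosted on the Hugging Face Hub, pulling a few examples should only take the datasets library; the repository id HuggingFaceM4/Docmatix and the exact column layout are assumptions to verify on the dataset card.

```python
# Stream a few Docmatix records from the Hugging Face Hub without a full download.
# The dataset id (HuggingFaceM4/Docmatix) and column names are assumptions;
# check the dataset card before relying on them.
from datasets import load_dataset

ds = load_dataset("HuggingFaceM4/Docmatix", split="train", streaming=True)

for example in ds.take(3):
    # Each record is expected to pair one or more page images with Q/A annotations.
    print(list(example.keys()))
```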

🔬 Interesting Papers and Repositories

  • Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies. A recent study highlights the role of vocabulary size in large language model performance: using an IsoFLOPs analysis on models with up to 3 billion parameters, the authors recommend larger vocabularies than are typically used, with empirical results such as higher ARC-Challenge scores when vocabularies are sized optimally (a back-of-the-envelope look at the embedding-size trade-off follows this list).
  • Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models. The Spectra study presents a suite of 54 language models, including ternary models (TriLMs), quantized models (QuantLMs), and floating-point models (FloatLMs), spanning sizes up to 3.9 billion parameters and trained on 300 billion tokens. Notably, TriLMs outperform existing ternary counterparts and achieve results on par with half-precision (FP16) models while using less memory.
  • Qwen2 Technical Report. The Qwen2 Technical Report showcases the Qwen2 series of language models with 0.5 to 72 billion parameters, surpassing the Qwen1.5 series in benchmarks, multilingualism, and instruction tuning, with the Qwen2-72B model demonstrating notable performance across diverse evaluations.
  • Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models. RECE is a new method for quickly removing inappropriate content from text-to-image diffusion models using a closed-form solution that iteratively realigns target embeddings with inoffensive concepts, thus maintaining the model’s generative performance without the need for further fine-tuning.
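
As a back-of-the-envelope companion to the vocabulary-scaling paper above, the snippet below shows how much of a fixed parameter budget a tied embedding matrix consumes as the vocabulary grows; the hidden size, budget, and vocabulary sizes are hypothetical numbers, not the paper's configurations.

```python
# Back-of-the-envelope view of the vocabulary/parameter trade-off:
# with tied input/output embeddings, the embedding table holds V * d parameters,
# so a larger vocabulary shifts budget away from the transformer layers.
# The hidden size, budget, and vocabulary sizes below are hypothetical examples.

d_model = 2048                # hidden size (assumed)
total_params = 3_000_000_000  # overall parameter budget (assumed, ~3B)

for vocab_size in (32_000, 64_000, 128_000, 256_000):
    embed_params = vocab_size * d_model
    share = embed_params / total_params
    print(f"V={vocab_size:>7,}: embedding params = {embed_params / 1e6:6.1f}M "
          f"({share:.1%} of a 3B budget)")
```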

Thank you for reading! If you want to learn more about NLP, remember to follow NLPlanet. You can find us on LinkedIn, Twitter, Medium, and our Discord server!

