Weekly AI and NLP News — June 3rd 2024
Gemini 1.5 Pro in 2nd place on the LMSYS leaderboard behind GPT-4o, xAI’s $6B funding, and China’s $47B chip fund
Here are your weekly articles, guides, and news about NLP and AI chosen for you by NLPlanet!
😎 News From The Web
- Gemini 1.5 Pro/Advanced at #2 on the LMSYS leaderboard, right behind GPT-4o. The latest LMSYS Chatbot Arena leaderboard places Gemini 1.5 Pro/Advanced second, just behind GPT-4o, while Gemini 1.5 Flash holds ninth place, surpassing Llama-3-70B and closely trailing GPT-4-0125.
- Anthropic hires former OpenAI safety lead to head up new team. Jan Leike has moved from OpenAI to Anthropic to head a new AI safety team dedicated to “superalignment,” focusing on enhancing scalable oversight and large-scale AI alignment research.
- xAI announces $6 billion Series B funding round. xAI has raised $6 billion in a Series B round to expand deployment of its AI technology, including the Grok model series, and to develop new products, building on a year of rapid progress and the open-source release of Grok-1.
- Mistral releases Codestral. Codestral is Mistral AI’s new generative model dedicated to code, trained on more than 80 programming languages and equipped with a 32k-token context window, which underpins its strong results on coding benchmarks (a hedged API sketch follows this list).
- China invests $47 billion in largest-ever chip fund. China allocated $47.48 billion to a new chip fund aimed at advancing domestic semiconductor production, a critical step toward self-sufficiency and competitiveness in technology sectors, including AI.
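Since Codestral is served through Mistral’s API, here is a minimal sketch of querying it with the mistralai Python client as it existed at release. The model name codestral-latest comes from Mistral’s announcement; the environment variable and the prompt are illustrative assumptions.

```python
# Minimal sketch: asking Codestral for code via the mistralai client
# (0.x-era API; verify imports against the client version you install).
import os

from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

client = MistralClient(api_key=os.environ["MISTRAL_API_KEY"])  # assumed env var

response = client.chat(
    model="codestral-latest",  # model name from Mistral's announcement
    messages=[
        ChatMessage(
            role="user",
            content="Write a Python function that checks whether a string is a palindrome.",
        )
    ],
)
print(response.choices[0].message.content)
```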
📚 Guides From The Web
- Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20. Karpathy has published a guide to replicating GPT-2 (124M) with the C/CUDA-based llm.c implementation, covering both single- and multi-GPU setups. Training takes about 90 minutes and roughly $20 of compute, using a 10-billion-token sample of the FineWeb dataset. The guide walks through installation and dataset preparation, aims to match or exceed the original GPT-2’s performance, and notes possible future improvements.
- Training and Finetuning Embedding Models with Sentence Transformers v3. The article covers the release of Sentence Transformers v3.0, which overhauls training and finetuning of embedding models for better task-specific performance, and walks through the updated components: datasets, loss functions, evaluators, and an improved trainer (a minimal training sketch follows this list).
- LLMs are not suitable for (advanced) brainstorming. The article argues that current LLMs are ineffective for advanced brainstorming because they mimic patterns in their training data and gravitate toward consensus ideas, and concludes that training processes must evolve before LLMs can foster genuine creativity.
- Media Companies Are Making a Huge Mistake With AI. The author underscores the pitfalls facing media companies entering AI partnerships that may undermine journalism’s value and sustainability. She advocates for a focus on producing quality journalism rather than seeking immediate financial relief through potentially undervalued licensing agreements with AI entities.
- Mergoo: Efficiently Build Your Own MoE LLM. Mergoo is a library that streamlines merging and training multiple LLMs into a single unified model, using methods such as mixture-of-experts, mixture-of-adapters, and layer-wise merging (a hedged configuration sketch also follows this list).
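Picking up the Sentence Transformers v3 article above, here is a minimal training sketch using the new SentenceTransformerTrainer API. The base model and the AllNLI pair dataset are illustrative choices, not prescriptions from the article.

```python
# Minimal sketch: finetuning an embedding model with Sentence Transformers v3.
# MultipleNegativesRankingLoss uses other in-batch positives as negatives,
# so plain (anchor, positive) text pairs are all the supervision needed.
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("microsoft/mpnet-base")  # illustrative base model

# (anchor, positive) pairs; a small slice keeps the example quick to run.
train_dataset = load_dataset("sentence-transformers/all-nli", "pair", split="train[:10000]")

trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()
model.save("models/mpnet-base-allnli")
```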
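For Mergoo, below is a hedged configuration sketch composing a mixture-of-experts checkpoint from two finetuned Mistral models. The identifiers (ComposeExperts, the config keys, the expert model IDs) are recalled from the project’s README and may not match the current release; treat it as an illustration of the workflow rather than a verified API.

```python
# Hedged sketch: composing an MoE-style model from existing experts with mergoo.
# Config keys and class names follow the project's README as remembered.
import torch
from mergoo.compose_experts import ComposeExperts

config = {
    "model_type": "mistral",
    "num_experts_per_tok": 2,  # experts routed per token
    "experts": [
        {"expert_name": "base_expert", "model_id": "mistralai/Mistral-7B-v0.1"},
        {"expert_name": "math_expert", "model_id": "meta-math/MetaMath-Mistral-7B"},
    ],
    "router_layers": ["gate_proj", "up_proj", "down_proj"],  # layers to gate
}

merger = ComposeExperts(config, torch_dtype=torch.float16)
merger.compose()  # merge weights and insert routers
merger.save_checkpoint("checkpoints/mistral_moe")  # then finetune the routers
```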
🔬 Interesting Papers and Repositories
- llmware-ai/llmware: Unified framework for building enterprise RAG pipelines with small, specialized models. llmware provides a comprehensive framework for constructing enterprise-grade Retrieval-Augmented Generation (RAG) pipelines, offering an integrated RAG pipeline and access to over 50 specialized models for functions such as question answering and summarization. It enables rapid development of knowledge-driven AI applications, works with open-source models, and eliminates the need for GPU server infrastructure (a hedged usage sketch follows this list).
- Transformers Can Do Arithmetic with the Right Embeddings. The paper shows that adding embeddings that encode each digit’s position within its number significantly enhances transformers’ ability to perform arithmetic, achieving up to 99% accuracy on adding 100-digit numbers and boosting performance on other reasoning tasks (a conceptual sketch follows this list).
- lavague-ai/LaVague: Large Action Model framework to develop AI Web Agents. LaVague is an open-source AI framework for building web agents. It leverages a World Model to turn a goal and the current website state into instructions, which an Action Engine executes through tools such as Selenium or Playwright (a quickstart-style sketch follows this list).
- An Introduction to Vision-Language Modeling. Aimed at readers new to this area of AI research, this paper provides an overview of Vision-Language Models (VLMs): their fundamentals, how they work, how they are trained, and how they are evaluated. It also addresses challenges arising from the complex nature of visual data and from extending VLMs to video.
- Matryoshka Multimodal Models. The paper presents Matryoshka Multimodal Models (M3), which improve the efficiency of Large Multimodal Models (LMMs) such as LLaVA by representing images with nested sets of visual tokens, so that visual token granularity can be adjusted at inference time to match image complexity (a pooling sketch follows this list).
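A hedged usage sketch for llmware: running one of its small, specialized models over an in-context passage through the library’s Prompt interface. The class, method, and model names follow the project’s examples as best recalled; check the repo for the current API.

```python
# Hedged sketch: in-context question answering with a small llmware model.
# Model name and response keys follow the project's examples and may differ.
from llmware.prompts import Prompt

contract_text = (
    "This agreement is entered into on June 1, 2024. "
    "The total contract value is $120,000, payable in twelve monthly installments."
)

prompter = Prompt().load_model("llmware/bling-1b-0.1")  # small CPU-friendly model
response = prompter.prompt_main(
    "What is the total contract value?",
    context=contract_text,
)
print(response["llm_response"])
```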
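To make the arithmetic-embeddings result concrete, here is a conceptual PyTorch sketch: each digit token receives an extra embedding indexed by its offset within its own number, which helps attention line up digits of equal significance. This is a simplification of the paper’s idea (the authors also reverse digit order during training), not their code.

```python
# Conceptual sketch: digit-position embeddings for arithmetic.
# Each digit is embedded by its offset within its number; non-digit
# tokens reset the counter. Tokenization here is a toy assumption.
import torch
import torch.nn as nn

class DigitPositionEmbedding(nn.Module):
    def __init__(self, d_model: int, max_digits: int = 120):
        super().__init__()
        self.embed = nn.Embedding(max_digits, d_model)

    def forward(self, tokens: list[str], token_emb: torch.Tensor) -> torch.Tensor:
        offsets, run = [], 0
        for tok in tokens:
            if tok.isdigit():
                offsets.append(run)  # position of this digit in its number
                run += 1
            else:
                offsets.append(0)    # operators/separators reset the counter
                run = 0
        return token_emb + self.embed(torch.tensor(offsets))

tokens = ["1", "2", "3", "+", "4", "5", "6"]
out = DigitPositionEmbedding(d_model=64)(tokens, torch.zeros(len(tokens), 64))
```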
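For LaVague, a quickstart-style sketch based on the project’s README from this period; module paths and class names may have moved in later releases.

```python
# Hedged quickstart sketch for LaVague: the World Model plans the next
# instruction from the goal and page state; the Action Engine executes
# it through a Selenium-driven browser.
from lavague.core import ActionEngine, WorldModel
from lavague.core.agents import WebAgent
from lavague.drivers.selenium import SeleniumDriver

driver = SeleniumDriver(headless=True)
agent = WebAgent(WorldModel(), ActionEngine(driver))

agent.get("https://huggingface.co/docs")
agent.run("Go on the quicktour of PEFT")
```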
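Finally, a small sketch of the mechanism behind M3’s adjustable granularity as described in the paper: coarser token sets are produced by spatially average-pooling the full visual-token grid, letting inference trade detail for speed. Shapes and the pooling schedule are illustrative, not the authors’ code.

```python
# Sketch: nested ("matryoshka") visual token sets via average pooling.
# A 24x24 grid (576 tokens) is pooled down to 144, 36, 9, and 1 tokens;
# inference picks the level matching the image's complexity budget.
import torch
import torch.nn.functional as F

def matryoshka_pool(tokens: torch.Tensor, grid: int, scales=(24, 12, 6, 3, 1)):
    """tokens: (batch, grid*grid, dim) visual tokens from the vision encoder."""
    b, n, d = tokens.shape
    x = tokens.transpose(1, 2).reshape(b, d, grid, grid)
    levels = {}
    for s in scales:
        pooled = F.adaptive_avg_pool2d(x, s)                # (b, d, s, s)
        levels[s * s] = pooled.flatten(2).transpose(1, 2)   # (b, s*s, d)
    return levels

visual_tokens = torch.randn(1, 24 * 24, 1024)  # e.g. a ViT-L/14 at 336px
levels = matryoshka_pool(visual_tokens, grid=24)  # levels[9] = cheap pass
```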
Thank you for reading! If you want to learn more about NLP, remember to follow NLPlanet. You can find us on LinkedIn, Twitter, Medium, and our Discord server!