Weekly AI and NLP News — March 11th 2024

Anthropic releases Claude 3, Inflection improves Pi, and firewalls for AI

Fabio Chiusano
NLPlanet
4 min read · Mar 11, 2024


Image generated with DALL·E 3

Here are your weekly articles, guides, and news about NLP and AI chosen for you by NLPlanet!

😎 News From The Web

  • Introducing the next generation of Claude. Anthropic has launched Claude 3, a model family it claims outperforms GPT-4, with three models: Opus, Sonnet, and Haiku. Each supports a 200k-token context window, vision capabilities, and multiple languages, with Opus touted as the top performer. Sonnet is already available through Amazon Bedrock and Google Cloud’s Vertex AI, with Opus and Haiku to follow, along with upcoming features like function calling and a REPL. A minimal API sketch follows this list.
  • Inflection-2.5: meet the world’s best personal AI. Inflection has launched Inflection-2.5, an upgraded model powering its assistant Pi that approaches the performance of leading models like GPT-4, particularly on coding and math, while, per Inflection, requiring substantially less training compute. Pi can now also run real-time web searches to surface up-to-date news and information.
  • Looks like we may now know which OpenAI execs flagged concerns about Sam Altman before his ouster. Sam Altman, CEO of OpenAI, was briefly ousted in November after concerns were raised by two executives, one of whom was CTO Mira Murati, as reported by The New York Times. The full circumstances remain unclear, though he was reinstated as CEO within a week.
  • Cloudflare announces Firewall for AI. Cloudflare is developing ‘Firewall for AI’, a Web Application Firewall layer designed to protect applications built on Large Language Models by identifying abuse and attacks before they reach the model.
  • Google is tackling spammy, low-quality content on Search. Google is updating its Search algorithm to demote low-quality, automated content and elevate more valuable, trustworthy websites in search rankings, focusing on delivering a high-quality content experience.
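
As a concrete illustration of the Claude 3 item above, here is a minimal sketch of calling the new models through Anthropic’s official `anthropic` Python SDK. The prompt is illustrative, and an API key is assumed to be configured.

```python
# Minimal sketch: querying Claude 3 via the official `anthropic` SDK
# (assumes `pip install anthropic` and the ANTHROPIC_API_KEY environment
# variable are already set).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-opus-20240229",  # also available: claude-3-sonnet-20240229
    max_tokens=512,
    messages=[
        {"role": "user", "content": "Summarize this week's AI news in one sentence."}
    ],
)
print(response.content[0].text)
```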

📚 Guides From The Web

  • A Practical Guide to RAG Pipeline Evaluation (part 1). An analysis of LLMs such as GPT-4 as judges for retrieval systems shows that they determine binary context relevance reasonably well (79% accuracy) but struggle with low recall and with complex queries that have multiple relevant contexts, leaving clear room for improvement on precision and recall. A sketch of the LLM-as-judge setup follows this list.
  • Training great LLMs entirely from ground up in the wilderness as a startup. For an AI startup, training large language models hinges not only on expertise but also on the careful selection of hardware infrastructure: subpar or inconsistent GPU performance caused by quality differences across clusters can significantly hinder training. A simple throughput check is sketched after this list.
  • Gemma on Android and iPhone and more local LLM updates from MLC. The Gemma 2B language model now runs on mobile platforms, including Android and iPhone, entirely offline. Compiled with MLC’s SLM flow, the 2-billion-parameter model reaches about 20 tokens per second on phones such as the Samsung S23, with model quantization providing further optimization.
  • You can now train a 70b language model at home. Answer.ai has released an open-source system combining FSDP and QLoRA that enables training a 70-billion-parameter language model on just two 24GB GPUs. The QLoRA half of the recipe is sketched after this list.
  • Captain’s log: the irreducible weirdness of prompting AIs. Effective prompting techniques, such as adding rich context, custom examples, and a “Chain of Thought” cue, significantly improve the output of models like Meta’s Llama 2 or GPT-4; a minimal prompt-builder sketch follows this list.
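
For the RAG evaluation guide above, here is a minimal sketch of the LLM-as-judge idea it analyzes: asking a model for a binary relevance verdict on a single retrieved chunk. It assumes the `openai` Python SDK and an API key; the prompt wording is illustrative, not the guide’s own.

```python
# Minimal sketch of binary context-relevance judging with an LLM
# (assumes `pip install openai` and an OPENAI_API_KEY in the environment).
from openai import OpenAI

client = OpenAI()

def is_context_relevant(question: str, context: str) -> bool:
    """Ask the model for a YES/NO relevance verdict on one retrieved chunk."""
    prompt = (
        f"Question:\n{question}\n\nRetrieved context:\n{context}\n\n"
        "Does the context help answer the question? Reply with only YES or NO."
    )
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return reply.choices[0].message.content.strip().upper().startswith("YES")

print(is_context_relevant(
    "When was Claude 3 announced?",
    "Anthropic introduced the Claude 3 model family in March 2024.",
))
```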
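
For the “training in the wilderness” piece above, a simple straggler check in PyTorch: time a large half-precision matmul on every visible GPU and compare throughput. The matrix size and iteration count are illustrative; real burn-in suites are far more thorough.

```python
# Toy throughput check: flag GPUs in a cluster node that underperform.
import time
import torch

N = 8192  # each matmul costs roughly 2 * N**3 FLOPs

for i in range(torch.cuda.device_count()):
    with torch.cuda.device(i):
        a = torch.randn(N, N, device="cuda", dtype=torch.float16)
        b = torch.randn(N, N, device="cuda", dtype=torch.float16)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(10):
            a @ b
        torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
        tflops = 10 * 2 * N**3 / elapsed / 1e12
        print(f"GPU {i} ({torch.cuda.get_device_name(i)}): {tflops:.1f} TFLOPS")
```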
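
For the Answer.ai item above, a minimal sketch of the QLoRA half of the recipe with Hugging Face `transformers`, `peft`, and `bitsandbytes`: load a base model in 4-bit NF4 and attach small trainable LoRA adapters. Answer.ai’s system additionally shards the quantized model across GPUs with FSDP, which is omitted here; the model name and hyperparameters are illustrative.

```python
# Minimal QLoRA setup: 4-bit NF4 base model plus LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NF4 quantization, as in QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",          # illustrative 70B base model
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapters train
```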
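
And for the prompting piece, a minimal sketch combining the techniques it mentions (rich context, a custom worked example, and a “Chain of Thought” cue) into one prompt. The wording is illustrative; send the resulting string to any chat model.

```python
# Build a prompt with context, a few-shot worked example, and a CoT cue.
def build_cot_prompt(question: str) -> str:
    context = "You are a careful math tutor. Show your work."
    example = (
        "Q: A train travels 60 km in 1.5 hours. What is its average speed?\n"
        "A: Let's think step by step. Speed = distance / time = 60 / 1.5 "
        "= 40 km/h. Answer: 40 km/h."
    )
    return f"{context}\n\n{example}\n\nQ: {question}\nA: Let's think step by step."

print(build_cot_prompt("If 3 pens cost $4.50, how much do 7 pens cost?"))
```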

🔬 Interesting Papers and Repositories

  • Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference. Chatbot Arena is an open platform that evaluates LLMs through crowdsourced pairwise comparisons: users vote on which of two anonymous models answered better. Drawing on over 240,000 user votes, the paper analyzes question diversity and vote quality and shows strong agreement with expert judgments, supporting the trustworthiness of the resulting rankings. A toy rating sketch follows this list.
  • Resonance RoPE: Improving Context Length Generalization of Large Language Models. The paper presents Resonance RoPE, a technique that helps Transformers with Rotary Position Embedding (RoPE) handle sequences longer than those seen during training (train-short-test-long scenarios). It refines RoPE for out-of-distribution positions to improve performance on long sequences while incurring no extra computational cost at inference time. Standard RoPE is sketched after this list.
  • The Unreasonable Effectiveness of Eccentric Automatic Prompts. This study investigates the impact of “positive thinking” prompts on the performance of different LLMs across a dataset of math questions (GSM8K). It concludes that the effectiveness of hand-tuned prompts is not consistent across models, and suggests that systematic, automated prompt optimization is the superior approach for achieving high-quality results from LLMs.
  • ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs. Recent research has identified a vulnerability in LLMs, where ASCII art can be used to conduct jailbreak attacks by exploiting their weaknesses in interpreting non-semantic prompts. The ViTC benchmark has been developed to test LLMs’ abilities against these challenges, revealing that even advanced models such as GPT-3.5, GPT-4, Gemini, Claude, and Llama 2 are susceptible.
  • Yi: Open Foundation Models by 01.AI. The Yi model series builds on pretrained language models of 6B and 34B parameters, extending them for chat, 200K-token contexts, and vision-language tasks. Trained on a high-performance computing infrastructure with standard transformer designs, the Yi models owe much of their strength to high-quality training data produced through rigorous deduplication and filtering, plus a small finetuning dataset polished iteratively with direct input from machine learning engineers. A toy deduplication sketch follows this list.
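
For the Chatbot Arena paper above, a toy sketch of one common way to turn pairwise human votes into model ratings: online Elo updates. The paper’s own statistical methodology differs in its details, and the votes and constants here are made up.

```python
# Toy Elo ratings from pairwise "which answer was better?" votes.
from collections import defaultdict

K = 32          # update step size
BASE = 1000.0   # initial rating for every model

def expected(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

ratings = defaultdict(lambda: BASE)
votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]

for winner, loser in votes:  # each vote: the user preferred `winner`
    e = expected(ratings[winner], ratings[loser])
    ratings[winner] += K * (1 - e)
    ratings[loser] -= K * (1 - e)

for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.0f}")
```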
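
For the Resonance RoPE paper, a NumPy sketch of standard Rotary Position Embedding, the mechanism the paper refines for out-of-distribution positions. This follows the common split-half formulation, not the paper’s own code.

```python
# Standard RoPE: encode positions by rotating pairs of feature dimensions.
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply RoPE to x of shape (seq_len, dim); dim must be even."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)     # per-pair rotation speed
    angles = np.outer(np.arange(seq_len), freqs)  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(8, 64)  # 8 positions, 64-dim attention head
print(rope(q).shape)        # (8, 64): positions are now encoded by rotation
```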
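
And for the Yi paper, a toy sketch of exact deduplication, one ingredient of pretraining data curation; Yi’s actual pipeline is far more elaborate and pairs deduplication with quality filters. The normalization step is illustrative.

```python
# Toy exact deduplication: hash lightly normalized documents, keep first seen.
import hashlib

def doc_key(text: str) -> str:
    """Hash a whitespace/case-normalized document for duplicate detection."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

corpus = [
    "The cat sat on the mat.",
    "The  cat sat on the mat.",   # duplicate up to whitespace
    "A completely different document.",
]

seen, deduped = set(), []
for doc in corpus:
    key = doc_key(doc)
    if key not in seen:
        seen.add(key)
        deduped.append(doc)

print(len(deduped), "unique documents")  # 2
```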

Thank you for reading! If you want to learn more about NLP, remember to follow NLPlanet. You can find us on LinkedIn, Twitter, Medium, and our Discord server!
