Weekly AI and NLP News — June 3rd 2024

Gemini Pro at 2nd position in LMSYS leaderboard behind GPT-4o, xAI’s $6B funding, and China’s $47B chip fund

Fabio Chiusano
NLPlanet
4 min read · Jun 3, 2024

Image by DALL·E 3

Here are your weekly articles, guides, and news about NLP and AI chosen for you by NLPlanet!

😎 News From The Web

📚 Guides From The Web

  • Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20. Karpathy has written a guide to reproducing GPT-2 (124M) with llm.c, a C/CUDA implementation that supports both single- and multi-GPU setups. Training costs about $20, takes about 90 minutes, and uses 10 billion tokens from the FineWeb dataset. The guide covers installation and dataset preparation, and discusses possible future improvements over the original GPT-2's performance.
  • Training and Finetuning Embedding Models with Sentence Transformers v3. The article discusses the release of Sentence Transformers v3.0, highlighting enhanced capabilities for training and finetuning embedding models to boost task-specific performance, and showcases the updated components including datasets, loss functions, evaluators, and an improved trainer.
  • LLMs are not suitable for (advanced) brainstorming. The article argues that current LLMs are ineffective for advanced brainstorming because they mimic patterns in existing data and gravitate towards consensus ideas, and proposes that their training processes must change before they can show genuine creativity.
  • Media Companies Are Making a Huge Mistake With AI. The author underscores the pitfalls facing media companies entering AI partnerships that may undermine journalism’s value and sustainability. She advocates for a focus on producing quality journalism rather than seeking immediate financial relief through potentially undervalued licensing agreements with AI entities.
  • Mergoo: Efficiently Build Your Own MoE LLM. Mergoo is a library designed to streamline the merging and training of various LLMs into a unified model by employing methods such as mixture-of-experts, mixture-of-adapters, and layer-wise merging.

🔬 Interesting Papers and Repositories

  • llmware-ai/llmware: Unified framework for building enterprise RAG pipelines with small, specialized models. Llmware provides a framework for building enterprise-grade Retrieval-Augmented Generation (RAG) pipelines, offering an integrated RAG pipeline and access to over 50 specialized models for functions such as QA and summarization. It speeds up development of knowledge-driven AI applications, is compatible with open-source models, and does not require GPU server infrastructure.
  • Transformers Can Do Arithmetic with the Right Embeddings. The paper shows that adding an embedding that encodes each digit's position relative to the start of its number significantly improves transformers' ability to perform arithmetic, reaching up to 99% accuracy on adding 100-digit numbers and improving performance on other reasoning tasks.
  • lavague-ai/LaVague: Large Action Model framework to develop AI Web Agents. LaVague is an open-source AI framework designed for building Web Agents. It leverages a World Model to transform website data and goals into commands, which are carried out by an Action Engine compatible with tools such as Selenium or Playwright.
  • An Introduction to Vision-Language Modeling. Aimed at readers new to this area of AI research, this paper gives an overview of Vision-Language Models (VLMs), covering their fundamentals, how they work, how they are trained, and how they are evaluated. It also addresses the challenges posed by the complexity of visual data and the extension of VLMs to video.
  • Matryoshka Multimodal Models. The paper presents Matryoshka Multimodal Models (M3), which improve the efficiency of Large Multimodal Models (LMMs) such as LLaVA by offering adjustable visual token granularity to match the complexity of images during inference.
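
To make the idea behind "Transformers Can Do Arithmetic with the Right Embeddings" more concrete, here is a minimal Python sketch of per-digit position indices: each digit gets an index that restarts at the beginning of every number, so digits of the same significance can share an index across operands. This is an illustration only, not the paper's code. The function name `digit_position_ids` and the convention of assigning 0 to non-digit characters are assumptions made for this example.

```python
DIGITS = "0123456789"


def digit_position_ids(text: str) -> list[int]:
    """Return one index per character of `text`.

    Digits are numbered 1, 2, 3, ... from the start of each number;
    every non-digit character gets 0 and resets the counter.
    (Hypothetical helper, named for this sketch only.)
    """
    ids = []
    run = 0  # how many digits of the current number we have seen
    for ch in text:
        if ch in DIGITS:
            run += 1
            ids.append(run)
        else:
            run = 0
            ids.append(0)
    return ids


if __name__ == "__main__":
    print(digit_position_ids("123+45="))
```

In a real model, indices like these would be used to look up a learned embedding that is added to the token embedding; that part is omitted here.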

Thank you for reading! If you want to learn more about NLP, remember to follow NLPlanet. You can find us on LinkedIn, Twitter, Medium, and our Discord server!
