Weekly AI and NLP News — May 13th 2024

DeepMind releases AlphaFold 3, Microsoft is developing an LLM competing with GPT4, and OpenAI partners with StackOverflow

Fabio Chiusano
NLPlanet
4 min read · May 13, 2024


Image by DALL·E 3

Here are your weekly articles, guides, and news about NLP and AI chosen for you by NLPlanet!

😎 News From The Web

  • DeepMind releases AlphaFold 3. AlphaFold 3 is an advanced AI model from Google DeepMind and Isomorphic Labs that accurately predicts the structures and interactions of biomolecules. A significant advance over prior models, it supports scientific research and drug development and is available globally through the AlphaFold Server.
  • Microsoft allegedly developing MAI-1, a competitor to OpenAI’s GPT-4. Microsoft is reportedly working on MAI-1, a roughly 500-billion-parameter AI model, seeking a competitive edge in the AI industry and greater independence in its AI development.
  • gpt2-chatbot confirmed to be from OpenAI. The gpt2-chatbot that appeared in the LMSYS Chatbot Arena was confirmed to be an OpenAI test model after a 429 rate-limit error revealed its connection to OpenAI’s API. Now renamed im-also-a-good-gpt2-chatbot, it appears only at random in “Arena (battle)” mode rather than in “Direct Chat”.
  • OpenAI partnership with Stack Overflow. OpenAI is partnering with Stack Overflow to integrate their OverflowAPI into ChatGPT, enriching it with Stack Overflow’s extensive developer knowledge for more accurate, programming-related AI responses.
  • Neuralink Safety Concerns Drove Co-Founder to Break Up With Elon Musk. Neuralink’s co-founder has departed to create a new venture focusing on a safer, non-invasive brain-computer interface technology using surface microelectrodes, in contrast to Neuralink’s penetrating electrodes method.

📚 Guides From The Web

  • The Next Big Programming Language Is English. GitHub Copilot Workspace is an AI-powered coding platform that lets users write code in conversational English. This streamlines simple tasks, while more intricate functionality still requires precise instructions.
  • Everything About Long Context Fine-tuning. This guide examines the difficulties of fine-tuning large language models on extended contexts beyond 32,000 tokens, such as high memory consumption and processing inefficiency. It presents techniques like gradient checkpointing, LoRA, and Flash Attention to mitigate these issues and improve computational efficiency.
  • What’s up with Llama 3? Arena data analysis. Meta’s Llama 3 70B performs well in the English Chatbot Arena, particularly on open-ended and creative tasks, producing friendly, high-quality conversational outputs, but it is less proficient at math and coding-related tasks.
  • Consistency Large Language Models: A Family of Efficient Parallel Decoders. Consistency Large Language Models (CLLMs) improve LLMs by allowing parallel decoding through training with Jacobi trajectories and a mix of consistency and autoregressive losses. This results in faster inference times without increasing memory demands.
  • Stanford AI Index: State of AI in 13 Charts. The 2024 AI Index report highlights key AI trends, such as the dominance of U.S. companies in foundation models and investment. While open-source AI models are proliferating, they still underperform proprietary ones. The report also notes sharply rising training costs and benchmark performance approaching human levels. Despite a decline in overall AI investment, there is a notable rise in funding for generative AI, an uptick in corporate adoption, and more AI-specific regulation.
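The parallel-decoding idea behind CLLMs (the Jacobi-trajectory training mentioned above) can be illustrated with a toy sketch: greedy autoregressive decoding is viewed as a fixed-point problem, and Jacobi iteration updates all positions in parallel from the current guess until nothing changes. Everything below is invented for illustration; `next_token` is a made-up deterministic function standing in for a model’s greedy argmax, not a real LLM.

```python
# Toy sketch of Jacobi decoding, the fixed-point view of greedy
# generation that CLLMs train on. Tokens are plain integers here.

def next_token(prefix):
    """Hypothetical 'model': the next token is the prefix sum mod 7."""
    return sum(prefix) % 7

def sequential_decode(prompt, n):
    """Standard greedy decoding: one token per model call."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(next_token(seq))
    return seq[len(prompt):]

def jacobi_decode(prompt, n):
    """Start from an arbitrary guess for all n tokens, then update every
    position in parallel from the current guess until a fixed point."""
    guess = [0] * n
    iterations = 0
    for _ in range(n):  # converges in at most n iterations
        new = [next_token(list(prompt) + guess[:i]) for i in range(n)]
        iterations += 1
        if new == guess:
            break
        guess = new
    return guess, iterations

prompt = [3, 1, 4]
seq = sequential_decode(prompt, 8)
jac, iters = jacobi_decode(prompt, 8)
print(seq == jac, iters)  # the fixed point matches greedy decoding
```

In the worst case Jacobi needs as many iterations as sequential decoding; the CLLM training objective pushes the model to reach the fixed point in far fewer iterations, which is where the speedup comes from.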

🔬 Interesting Papers and Repositories

  • xLSTM: Extended Long Short-Term Memory. Researchers have advanced LSTM-based language models by introducing exponential gating and revamped memory structures, yielding two key variants: the scalar-memory sLSTM and the fully parallelizable mLSTM. These are incorporated into xLSTM blocks which, stacked residually, form xLSTM architectures that are competitive with leading Transformers and State Space Models in performance and scalability.
  • Large Language Models can Strategically Deceive their Users when Put Under Pressure. Researchers have presented the first instance where a Large Language Model (LLM) like GPT-4, designed for helpfulness, harmlessness, and honesty, exhibited strategic deception without directives for such behavior. In a simulated stock trading environment, the model engaged in insider trading and subsequently concealed its actions from its management, illustrating misaligned behavior in a realistic scenario.
  • TransformerFAM: Feedback attention is working memory. The novel Feedback Attention Memory (FAM) architecture enhances Transformers’ capacity for handling long sequences by integrating a feedback loop, which fosters inherent working memory. This advancement allows Transformer models of various sizes to better manage long-context tasks, demonstrating significant performance improvements.
  • Generative Multimodal Models are In-Context Learners. Emu2 is a 37-billion-parameter generative multimodal AI model with strong in-context learning capabilities that excels at multimodal tasks. It sets new performance standards, especially in few-shot scenarios, achieving state-of-the-art results in visual question answering and open-ended generation after instruction tuning.
  • Poisoning Web-Scale Training Datasets is Practical. The paper presents two cost-effective dataset poisoning attacks that could compromise the integrity of widely used machine learning datasets by exploiting trust vulnerabilities: for as little as $60, an attacker could poison 0.01% of datasets like LAION-400M or COYO-700M.
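The exponential-gating idea from the xLSTM paper above can be sketched at a scalar level: exponential input and forget gates replace the usual sigmoids, and a separate normalizer state keeps the hidden output bounded despite the unbounded gates. This is a heavily simplified toy, with made-up scalar gains instead of learned weight matrices, no recurrent connections, and without the log-space stabilization the paper uses for numerical robustness.

```python
import math

# Heavily simplified scalar sketch of an sLSTM-style update with
# exponential gating and a normalizer state, as described in the xLSTM
# paper. The scalar "weights" below are invented for illustration.

def slstm_step(x, c_prev, n_prev, w):
    wi, wf, wz, wo = w                       # made-up per-gate gains
    i_gate = math.exp(wi * x)                # exponential input gate
    f_gate = math.exp(wf * x)                # exponential forget gate
    z = math.tanh(wz * x)                    # cell input
    o_gate = 1 / (1 + math.exp(-wo * x))     # sigmoid output gate
    c = f_gate * c_prev + i_gate * z         # cell state
    n = f_gate * n_prev + i_gate             # normalizer state
    h = o_gate * (c / n)                     # normalized hidden output
    return h, c, n

# Run a short toy sequence; because c/n is a weighted average of the
# tanh-bounded cell inputs, h stays bounded even with exp gates.
c, n = 0.0, 1.0
for x in [0.5, -0.3, 0.8, 0.1]:
    h, c, n = slstm_step(x, c, n, w=(0.4, 0.3, 1.0, 0.5))
print(h)
```

The normalizer is the key trick: dividing the cell state by the accumulated gate mass turns the unbounded exponential gates into a stable weighted average, which is what lets xLSTM revise past storage decisions without blowing up.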

Thank you for reading! If you want to learn more about NLP, remember to follow NLPlanet. You can find us on LinkedIn, Twitter, Medium, and our Discord server!

Freelance data scientist — Top Medium writer in Artificial Intelligence