Weekly AI and NLP News — May 6th 2024

Mysterious “gpt2-chatbot” appears, the world’s first music video powered by Sora, and memory capabilities for ChatGPT

Fabio Chiusano
NLPlanet
4 min read · May 6, 2024

--

Image by DALL·E 3

Here are your weekly articles, guides, and news about NLP and AI chosen for you by NLPlanet!

😎 News From The Web

  • Mysterious “gpt2-chatbot” AI model appears suddenly, confuses experts. The “gpt2-chatbot” recently emerged on the LMSYS Chatbot Arena, generating discussions within the AI community about its potential relation to new OpenAI models. While demonstrating strong performance, analyses indicate it does not outperform GPT-4, and its exact origin and details are still uncertain.
  • GitHub Copilot Workspace. GitHub has launched Copilot Workspace, a developer environment that supports the entire coding process (planning, coding, testing, and deployment) through natural language commands, giving developers an integrated way to streamline their workflows.
  • OpenAI CEO Sam Altman says GPT-4 is the dumbest AI model you’ll ever have to use again. OpenAI’s Sam Altman considers GPT-4 the most rudimentary AI that users will encounter as the company progresses towards more sophisticated models like GPT-5, which is expected to feature enhanced abilities such as video generation. He foresees AI developing into highly efficient assistants, performing tasks and providing solutions effortlessly.
  • Pro music video powered by OpenAI’s Sora released in a world-first. Paul Trillo directed the official music video for Washed Out’s “The Hardest Part” using OpenAI’s Sora, a text-to-video AI, producing 700 clips of which 55 were used. The project has stirred ethical discussions within the AI industry.
  • A ChatGPT search engine is rumored to be coming next week. OpenAI is rumored to be launching a ChatGPT-based search engine, potentially at “search.chatgpt.com,” aiming to rival Google by integrating a chatbot feature with traditional search results. This reflects the industry trend of AI potentially revolutionizing standard web search methods.
  • Memory and new controls for ChatGPT. OpenAI is testing a new memory feature for ChatGPT to improve interaction continuity, offering user-managed options for adding, reviewing, and deleting retained information or disabling the feature.

📚 Guides From The Web

  • Advancing AI’s Cognitive Horizons: 8 Significant Research Papers on LLM Reasoning. Recent research has focused on augmenting the reasoning capabilities of LLMs. A variety of strategies have been explored to improve their performance, including chain-of-thought prompting, strategic and knowledge enhancements, and integration with computational engines. A key open challenge is the ability of LLMs to self-correct, which remains dependent on external feedback.
  • Improving Prompt Consistency with Structured Generations. The Hugging Face Leaderboards and Evals team has conducted research highlighting the impact of prompt format on model evaluation consistency. They suggest structured generation as a means to standardize outputs, leading to more reliable and comparable performance metrics, with initial findings indicating a reduction in evaluation variance.
  • Comparison of Llama-3 and Phi-3 using RAG. This guide outlines the creation of a self-hosted “Chat with your Docs” application that integrates Meta AI’s Llama3 and Microsoft’s Phi3 language models into a Retrieval Augmented Generation (RAG) system. It describes a Streamlit-based user interface that allows direct performance evaluation of the models, utilizing a sophisticated setup that includes custom knowledge bases, document chunking strategies, embeddings, and vector databases to improve user interactions with documents.
  • SeeMoE: Implementing a MoE Vision Language Model from scratch. This guide discusses “seeMoE,” a PyTorch-based vision language model combining an image encoder, vision-language projection, and an MoE decoder. It utilizes character-level autoregressive language modeling and features innovative noisy top-k gating for dynamic expert selection.
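For readers curious about the retrieval step behind a “Chat with your Docs” app like the one in the RAG guide above, here is a minimal, self-contained sketch. The chunk sizes, the toy character-count embedder (standing in for a real embedding model such as a sentence-transformer), and all function names are illustrative assumptions, not the guide’s actual code:

```python
import numpy as np

def chunk(text, size=200, overlap=50):
    """Split a document into overlapping character chunks
    (a common RAG chunking strategy; sizes are illustrative)."""
    step = size - overlap
    return [text[i : i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(texts):
    """Toy bag-of-characters embedding, standing in for a real
    embedding model; for illustration only."""
    vecs = np.zeros((len(texts), 256))
    for i, t in enumerate(texts):
        for ch in t.lower():
            vecs[i, ord(ch) % 256] += 1
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-9)

def retrieve(query, chunks, chunk_vecs, top_k=3):
    """Return the top_k chunks most similar to the query (cosine similarity),
    which would then be passed to Llama3 or Phi3 as context."""
    q = embed([query])[0]
    scores = chunk_vecs @ q
    return [chunks[i] for i in np.argsort(scores)[::-1][:top_k]]
```

In the guide’s setup, the retrieved chunks are injected into the prompt of each model so their answers can be compared side by side in the Streamlit UI.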
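The noisy top-k gating mentioned in the seeMoE guide can be sketched in a few lines of PyTorch. This is an illustrative reimplementation of the general mechanism (learned noise added to router logits, then only the top-k experts kept per token), not seeMoE’s actual code; all names and sizes are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyTopKGate(nn.Module):
    """Noisy top-k gating: add learned, input-dependent noise to the
    router logits, keep the top-k experts, softmax over the rest."""
    def __init__(self, dim, num_experts, k=2):
        super().__init__()
        self.k = k
        self.w_gate = nn.Linear(dim, num_experts, bias=False)
        self.w_noise = nn.Linear(dim, num_experts, bias=False)

    def forward(self, x):
        clean = self.w_gate(x)
        # noise scale is learned per expert, kept positive via softplus
        noise = torch.randn_like(clean) * F.softplus(self.w_noise(x))
        logits = clean + noise
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)
        # mask out everything outside the top-k before the softmax
        mask = torch.full_like(logits, float("-inf"))
        mask.scatter_(-1, topk_idx, topk_vals)
        return F.softmax(mask, dim=-1)  # sparse weights over experts

gate = NoisyTopKGate(dim=64, num_experts=8, k=2)
weights = gate(torch.randn(4, 64))  # at most k nonzero weights per row
```

The noise encourages exploration during training so that routing does not collapse onto a few experts.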

🔬 Interesting Papers and Repositories

  • abi/secret-llama. “Secret Llama” is a private, browser-based chatbot leveraging Llama 3 and Mistral models, designed to run independently without server dependencies thanks to WebGPU support. Prioritizing user privacy, it operates fully offline without any data leaving the local device. The platform is user-friendly and can handle AI models up to 4.3GB.
  • Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models. Prometheus 2 is an open-source language model evaluator that improves upon earlier models by offering a broad array of assessment capabilities, including direct assessments, pairwise rankings, and custom evaluation criteria. It aims to provide evaluation results that better match human judgment and can be tailored to assess both standard and proprietary language models like GPT-4.
  • Better & Faster Large Language Models via Multi-token Prediction. An improved training method for large language models that predicts multiple future tokens simultaneously demonstrates increased sample efficiency and performance in code and natural language tasks. This multi-token prediction method achieves faster inference speeds, up to three times quicker, without increasing training time.
  • PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning. PLLaVA is a parameter-free method for extending image models to video models, designed to overcome issues like performance saturation and prompt sensitivity when fine-tuning image models for video tasks. It utilizes a pooling strategy to balance feature distribution over time, leading to improved results such as a 3.48 score on the Video ChatGPT benchmark and a 58.1% accuracy on the MVBench, setting a new state-of-the-art performance.
  • StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation. StarCoder2-15B-Instruct-v0.1, a transparent and permissively licensed code LLM, is fine-tuned on content produced by its own self-alignment pipeline, achieving a HumanEval score of 72.6. It demonstrates the viability of self-alignment for high-quality code generation without relying on external data sources.
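To make the multi-token prediction idea concrete, here is a hedged PyTorch sketch: a shared trunk feeds several independent output heads, each predicting the token one step further ahead, and the losses are averaged. This is illustrative only; the paper’s actual architecture and loss differ in detail, and all names and sizes here are assumptions:

```python
import torch
import torch.nn as nn

class MultiTokenHead(nn.Module):
    """n_future independent heads over a shared hidden state;
    head i predicts the token (i + 1) steps ahead."""
    def __init__(self, dim, vocab, n_future=4):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(dim, vocab) for _ in range(n_future))

    def forward(self, hidden):  # hidden: (batch, seq, dim) from the trunk
        return [head(hidden) for head in self.heads]

def multi_token_loss(logits_list, targets):
    """Average cross-entropy over the heads, shifting the targets
    by each head's lookahead offset. targets: next tokens for head 0."""
    loss = 0.0
    for i, logits in enumerate(logits_list):
        shifted = targets[:, i:]              # head i's targets, offset by i
        preds = logits[:, : shifted.size(1)]  # drop positions with no target
        loss = loss + nn.functional.cross_entropy(
            preds.reshape(-1, preds.size(-1)), shifted.reshape(-1)
        )
    return loss / len(logits_list)

heads = MultiTokenHead(dim=32, vocab=100, n_future=4)
logits = heads(torch.randn(2, 10, 32))
loss = multi_token_loss(logits, torch.randint(0, 100, (2, 10)))
```

At inference time, the extra heads can be dropped (keeping standard next-token decoding) or used for speculative self-decoding, which is where the reported speedups come from.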

Thank you for reading! If you want to learn more about NLP, remember to follow NLPlanet. You can find us on LinkedIn, Twitter, Medium, and our Discord server!

--

Fabio Chiusano
NLPlanet

Freelance data scientist — Top Medium writer in Artificial Intelligence