Weekly AI and NLP News — May 27th 2024

NVIDIA’s projected revenue exceeds expectations, Mistral updates its 7B model, and OpenAI addresses the Scarlett Johansson voice controversy in ChatGPT

Fabio Chiusano
NLPlanet
5 min read · May 27, 2024


Image by DALL·E 3

Here are your weekly articles, guides, and news about NLP and AI chosen for you by NLPlanet!

😎 News From The Web

  • Nvidia Stock Surges as Sales Forecast Delivers on AI Hopes. Nvidia’s stock surged 9.3% after a promising sales forecast, pointing to robust demand for AI technologies. The projected Q2 revenue of $28 billion exceeds expectations, highlighting the company’s strong position in the AI market, buoyed by its new Blackwell chips and significant data-center revenue.
  • Microsoft introduces Phi-Silica, a 3.3B parameter model made for Copilot+ PC NPUs. Microsoft has unveiled Phi-Silica, a compact language model with 3.3 billion parameters, tailored for Copilot+ PCs equipped with NPUs. The model is engineered for fast, power-efficient on-device inference, improving productivity and accessibility for Windows users. Phi-Silica is Microsoft’s first locally run language model, with a release slated for June.
  • mistralai/Mistral-7B-Instruct-v0.3. Mistral has released version 0.3 of its 7B model in base (“Mistral-7B-v0.3”) and instruction-tuned (“Mistral-7B-Instruct-v0.3”) variants. Enhancements include an expanded vocabulary of 32,768 tokens, support for the v3 Tokenizer, and new function-calling capabilities (see the usage sketch after this list).
  • OpenAI reportedly didn’t intend to copy Scarlett Johansson’s voice. OpenAI’s selection of a voice for its Sky assistant, chosen for warmth and charisma, sparked controversy when Scarlett Johansson noted a strong resemblance to her own voice, leading to public and legal pushback. OpenAI denied deliberately imitating Johansson and halted the use of Sky’s voice after her objections. The dispute followed unsuccessful discussions between Johansson and OpenAI’s Sam Altman about her providing her voice for ChatGPT.
  • OpenAI sends internal memo releasing former employees from controversial exit agreements. OpenAI reversed a decision that would have required former employees to agree to a perpetual non-disparagement clause in order to retain their vested equity. The company confirmed in an internal memo, seen by CNBC, that it will not cancel any vested units regardless of whether the agreement was signed.
  • Amazon plans to give Alexa an AI overhaul — and a monthly subscription price. Amazon is updating Alexa with advanced generative AI capabilities and launching an additional subscription service, separate from Prime, in an effort to stay competitive with chatbots from Google and OpenAI, reflecting the company’s strategic emphasis on AI amid internal and leadership changes.
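As a rough idea of how the new Mistral instruct model can be used, here is a minimal sketch based on the Hugging Face transformers library. The tool schema and prompt are invented for illustration, and passing tools through the chat template assumes a recent transformers version.

```python
# Minimal sketch: loading Mistral-7B-Instruct-v0.3 with Hugging Face transformers.
# The tool schema and prompt below are illustrative, not from Mistral's docs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A hypothetical tool definition in JSON-schema style; recent transformers
# versions forward these to the model's chat template via the `tools` argument.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

messages = [{"role": "user", "content": "What's the weather like in Paris?"}]
input_ids = tokenizer.apply_chat_template(
    messages, tools=[get_weather], add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```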

📚 Guides From The Web

  • Mapping the Mind of a Large Language Model. Anthropic has made strides in AI interpretability by analyzing Claude Sonnet, a large language model, to associate neuron activations with a vast array of concepts. This work promotes safer AI through improved monitoring, debiasing, and the ability to manipulate features to guide model behavior.
  • What I’ve Learned Building Interactive Embedding Visualizations. The author shares lessons from building interactive visualizations of embeddings from various datasets using tools like PyMDE and Emblaze. The work covers data collection, embedding computation, and visualization rendering, with iterative enhancements for better exploratory analysis in AI applications (a minimal embedding sketch follows this list).
  • Living documents as an AI UX pattern. The author examines the application of LLMs in generating dynamic, AI-assisted “living documents” to streamline scientific literature reviews. The system employs semantic analysis to structure data into modifiable tables, focusing on overcoming obstacles such as complex AI management, maintaining user-friendly interfaces, and minimizing operational expenses.
  • GPU Poor Savior: Revolutionizing Low-Bit Open Source LLMs and Cost-Effective Edge Computing. The article explores progress in developing low-bit quantized large language models optimized for edge computing, highlighting the creation of over 200 models that can run on consumer GPUs such as the RTX 3090. These models achieve notable resource efficiency via advanced quantization methods, aided by new tools like Bitorch Engine and green-bit-llm for streamlined training and deployment (a generic low-bit loading sketch appears after this list).
  • Train custom AI models with the trainer API and adapt them to Hugging Face. The article provides a guide to using the Hugging Face Trainer API to streamline the adaptation, training, and integration of AI models with minimal coding effort. It covers setting up dependencies, data preprocessing, model adjustments, and distributed training, culminating in a tutorial on sharing models via the Hugging Face Hub (see the fine-tuning sketch after this list).
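Related to the embedding-visualization write-up above, here is a minimal sketch of the embedding-computation step with PyMDE; the input matrix is random stand-in data rather than the author’s datasets, and the interactive Emblaze front end is omitted.

```python
# Minimal sketch of the dimensionality-reduction step, using PyMDE.
# The input features are random stand-in data, not the article's datasets.
import numpy as np
import pymde

features = np.random.randn(1000, 128).astype(np.float32)  # e.g. precomputed text/image embeddings

# Reduce to 2D while preserving local neighborhood structure.
mde = pymde.preserve_neighbors(features, embedding_dim=2)
coords = mde.embed()  # (1000, 2) coordinates ready for plotting

print(coords.shape)
```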
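For the low-bit LLM article, the sketch below uses 4-bit loading with bitsandbytes via transformers rather than the article’s Bitorch Engine and green-bit-llm tools, simply to show the kind of setup that fits a ~7B model on a single consumer GPU.

```python
# Generic 4-bit loading sketch (bitsandbytes via transformers), shown instead of
# the article's Bitorch Engine / green-bit-llm tooling.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.3"  # any ~7B model works as an example

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
```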
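And for the Trainer API guide, here is a minimal fine-tuning sketch; the dataset and model are small public defaults chosen for illustration rather than the ones used in the article.

```python
# Minimal fine-tuning sketch with the Hugging Face Trainer API.
# IMDB and DistilBERT are small public defaults used only for illustration.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=16,
    num_train_epochs=1,
    push_to_hub=False,  # set True (after `huggingface-cli login`) to share on the Hub
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=dataset["test"].select(range(500)),
)
trainer.train()
```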

🔬 Interesting Papers and Repositories

  • Chain-of-Thought Reasoning Without Prompting. The study investigates whether Chain-of-Thought reasoning is already present in pre-trained large language models by altering the decoding process to consider multiple candidates for the first token. It finds that this approach can uncover intrinsic reasoning paths, improving understanding of the models’ capabilities and linking reasoning to greater output confidence, as demonstrated across different reasoning benchmarks (a simplified decoding sketch follows this list).
  • Not All Language Model Features Are Linear. A recent study disputes the linear representation hypothesis in language models by revealing multi-dimensional representations through sparse autoencoders, notably circular representations for time concepts in GPT-2 and Mistral 7B. These representations prove useful for modular arithmetic tasks, and intervention experiments on Mistral 7B and Llama 3 8B underscore their significance in language model computations (a toy sparse autoencoder sketch appears after this list).
  • Thermodynamic Natural Gradient Descent. The paper presents a novel hybrid digital-analog algorithm that imitates natural gradient descent for neural network training, promising the faster convergence of second-order methods while maintaining computational efficiency akin to first-order methods. By exploiting the properties of a thermodynamic analog system, this approach circumvents the expensive computations typical of current digital techniques (the underlying update is shown after this list).
  • Your Transformer is Secretly Linear. Recent research suggests that transformer decoders in models such as GPT, LLaMA, OPT, and BLOOM exhibit an unexpected near-linear relationship between consecutive layers. Experiments indicate that omitting or simplifying the most linear blocks does not substantially impact loss or performance, calling into question current assumptions about the complexity of transformer operations (a rough linearity check is sketched after this list).
  • Diffusion for World Modeling: Visual Details Matter in Atari. DIAMOND is a novel reinforcement learning agent that uses a diffusion-based world model to capture fine visual details that discrete latent models typically miss. It demonstrates superior performance, as shown by setting a new human normalized score record on the Atari 100k benchmark. The authors have made their code and models publicly available for future research.
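A simplified sketch of the decoding idea from “Chain-of-Thought Reasoning Without Prompting”: branch on the top-k candidates for the first answer token, decode each branch greedily, and compare how confident the model is in each continuation. The small model and the confidence proxy below are simplifications of the paper’s setup.

```python
# Simplified CoT-decoding sketch: branch on top-k first tokens, decode each
# branch greedily, and rank continuations by a crude confidence proxy.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # small stand-in model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

prompt = "Q: I have 3 apples and buy 2 more. How many apples do I have?\nA:"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # distribution over the first answer token
top_k = torch.topk(logits, k=5).indices      # branch on 5 alternative first tokens

for token_id in top_k:
    branch = torch.cat([inputs.input_ids, token_id.view(1, 1)], dim=-1)
    out = model.generate(
        branch,
        max_new_tokens=40,
        do_sample=False,                     # greedy continuation of each branch
        output_scores=True,
        return_dict_in_generate=True,
    )
    # Crude confidence proxy: average top-1 probability over generated tokens
    # (the paper uses the gap between the top two answer-token probabilities).
    probs = [torch.softmax(s[0], dim=-1).max().item() for s in out.scores]
    text = tokenizer.decode(out.sequences[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
    print(f"confidence={sum(probs) / len(probs):.3f} | {text!r}")
```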
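For “Not All Language Model Features Are Linear”, here is a toy sparse autoencoder of the kind used to extract features from model activations: an overcomplete dictionary trained with an L1 sparsity penalty. The sizes and the random activations are placeholders, not the paper’s setup.

```python
# Toy sparse autoencoder on (stand-in) residual-stream activations:
# reconstruct activations through an overcomplete, L1-sparse feature layer.
import torch
import torch.nn as nn

d_model, d_dict = 768, 8192                 # activation width vs. dictionary size
activations = torch.randn(10_000, d_model)  # placeholder for cached model activations

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model, d_dict):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x):
        features = torch.relu(self.encoder(x))  # sparse, non-negative feature activations
        return self.decoder(features), features

sae = SparseAutoencoder(d_model, d_dict)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3

for step in range(200):
    batch = activations[torch.randint(0, len(activations), (256,))]
    recon, feats = sae(batch)
    loss = ((recon - batch) ** 2).mean() + l1_coeff * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```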
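For the thermodynamic natural gradient paper, the update the analog hardware is meant to approximate is the standard natural gradient step, whose curvature-matrix inverse is exactly what makes it expensive on digital hardware:

```latex
% Natural gradient step: the gradient is preconditioned by the inverse
% Fisher information matrix F(\theta), which is costly to compute digitally.
\theta_{t+1} = \theta_t - \eta \, F(\theta_t)^{-1} \nabla_\theta \mathcal{L}(\theta_t)
```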
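Finally, for “Your Transformer is Secretly Linear”, here is a rough version of the underlying check: fit a linear map from each layer’s hidden states to the next layer’s and measure how much variance it explains. GPT-2 and a plain R² metric stand in for the paper’s models and its exact similarity score.

```python
# Rough layer-to-layer linearity check: regress each layer's hidden states onto
# the next layer's and report the explained variance (R^2).
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "gpt2"  # small stand-in decoder for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, output_hidden_states=True)
model.eval()

# Use more tokens than hidden dimensions (768) so the linear fit is not trivial.
text = "The quick brown fox jumps over the lazy dog. " * 120
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
with torch.no_grad():
    hidden = model(**inputs).hidden_states  # tuple of (1, seq_len, d_model) per layer

for i in range(len(hidden) - 1):
    X = hidden[i][0].numpy()
    Y = hidden[i + 1][0].numpy()
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # best linear map X @ W ≈ Y
    resid = Y - X @ W
    r2 = 1 - (resid ** 2).sum() / ((Y - Y.mean(0)) ** 2).sum()
    print(f"layer {i} -> {i + 1}: linear fit R^2 = {r2:.3f}")
```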

Thank you for reading! If you want to learn more about NLP, remember to follow NLPlanet. You can find us on LinkedIn, Twitter, Medium, and our Discord server!


Fabio Chiusano
NLPlanet

Freelance data scientist — Top Medium writer in Artificial Intelligence