Machine Learns — Newsletter #19

Eren Gölge — Machine Learns — Apr 10, 2024

🤖 AI: Latest News, Research, and Open-Source

Welcome to the 19th edition of the Machine Learns newsletter! In the last two weeks, we have seen a ton of model releases. The AI community is buzzing with new ideas and implementations.

Happy reading!

Thanks for reading Machine Learns! Subscribe to receive new posts and support my work.

Bookmarks

📌 Managing Up: How to Meet The Unspoken Needs of Your Manager — Blog

📌 503 days working full-time on FOSS: lessons learned — Blog

📌 Welcome to 2034: A Designer’s Profession in 10 Years — Blog

🖌️ Analyzing UX & UI decisions in classic racing games — Medium

🤖 Apple Introduces MobileCLIP, a State-of-the-Art Image-Text Model for Mobile Devices — Website

🤖 Mistral released a new model, Mixtral 8x22B — Twitter

📰 Amazon’s Just Walk Out Actually Uses 1,000 People in India — Site

🤖 Llama-Bitnet | Training a 1.58 bit LLM — Medium

🤖 In One Key A.I. Metric, China Pulls Ahead of the U.S.: Talent — NYT

🤖 Microsoft & OpenAI planning a $100 billion "Stargate" AI supercomputer — Site

🤖 We’re Focusing on the Wrong Kind of AI Apocalypse — Time

👨‍💻 Inside the failed attempt to backdoor SSH globally — that got caught by chance — Medium

📰 How Stability AI’s Founder Tanked His Billion-Dollar Startup — Forbes

📌 Plentiful, high-paying jobs in the age of AI — Substack

🤖 AI escape velocity: A conversation with Ray Kurzweil — Bessemer Venture Partners

🤖 Towards 1-bit Machine Learning Models — mobiusml

🤖 Jamba: A Groundbreaking SSM-Transformer Open Model — AI21. Debuting the first production-grade Mamba-based model, delivering best-in-class quality and performance.

Papers

🔗 arxiv — RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

RALL-E is a text-to-speech system that uses chain-of-thought prompting to improve the robustness of large language model-based TTS by predicting prosody features and guiding self-attention weights in the Transformer model. Compared to VALL-E, RALL-E significantly reduces the Word Error Rate of zero-shot TTS and correctly synthesizes challenging sentences.
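
To illustrate the second ingredient, here is a hedged sketch of the duration-guided attention idea: predicted phoneme durations are turned into a window mask so each speech frame only attends to nearby phonemes. The window size, shapes, and function name are illustrative choices of mine, not the paper's exact setup.

```python
# Sketch: build an attention window mask from predicted phoneme durations
# (illustrative only; not RALL-E's reference implementation).
import torch

def duration_window_mask(durations, window: int = 1) -> torch.Tensor:
    """durations: frames per phoneme. Returns a (T_speech, N_phonemes) bool mask."""
    centers = []
    for i, d in enumerate(durations):
        centers += [i] * d                       # phoneme index responsible for each frame
    T, N = len(centers), len(durations)
    mask = torch.zeros(T, N, dtype=torch.bool)
    for t, c in enumerate(centers):
        lo, hi = max(0, c - window), min(N, c + window + 1)
        mask[t, lo:hi] = True                    # attend only to phonemes near this frame
    return mask

print(duration_window_mask([2, 3, 1], window=1))
```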

🔗 arxiv, 👨‍💻 implementation — Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

This paper takes the routing idea behind MoE and applies it to depth: for each token, a learned router decides which layers actually process it, so the model dynamically allocates compute across the input sequence. This accelerates training and inference by up to 50%.
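
To get an intuition for the routing trick, here is a minimal PyTorch sketch: a learned router scores the tokens, only the top fraction go through the block, and the rest skip it via the residual path. The block here is a stand-in MLP and the capacity value is arbitrary; this is not the paper's code.

```python
# Minimal Mixture-of-Depths-style routing sketch (illustrative, not the paper's code).
import torch
import torch.nn as nn

class MoDLayer(nn.Module):
    def __init__(self, d_model: int, capacity: float = 0.5):
        super().__init__()
        # Stand-in for a full transformer block (attention + MLP would go here).
        self.block = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                   nn.Linear(4 * d_model, d_model))
        self.router = nn.Linear(d_model, 1)
        self.capacity = capacity  # fraction of tokens that receive full compute

    def forward(self, x):  # x: (batch, seq, d_model)
        scores = self.router(x).squeeze(-1)                    # (batch, seq)
        k = max(1, int(self.capacity * x.size(1)))
        topk = scores.topk(k, dim=-1).indices                  # routed tokens per sequence
        mask = torch.zeros_like(scores, dtype=torch.bool)
        mask.scatter_(1, topk, torch.ones_like(topk, dtype=torch.bool))

        out = x.clone()
        routed = x[mask]                                       # (num_routed, d_model)
        # Scale the update by the router score so the router receives gradients.
        out[mask] = routed + self.block(routed) * torch.sigmoid(scores[mask]).unsqueeze(-1)
        return out                                             # unrouted tokens pass through unchanged

x = torch.randn(2, 16, 64)
print(MoDLayer(64)(x).shape)  # torch.Size([2, 16, 64])
```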

🔗 arxiv, 🔗 blog — Jamba: A Hybrid Transformer-Mamba Language Model

Jamba stacks 2 MoE Mamba blocks and a Transformer block to create a single Jamba block. They explain that the Transformer layers were necessary for training stability. It can run on a single 80GB GPU with a 140K context length and significantly better throughput. They released a model with 12B active parameters and 52B total parameters.
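
If you want to poke at the released checkpoint yourself, here is a minimal loading sketch. It assumes the Hugging Face checkpoint id is ai21labs/Jamba-v0.1 and that transformers plus the Mamba kernels (mamba-ssm, causal-conv1d) are installed; quantizing further would be needed to fit the full 52B weights on a single 80GB GPU.

```python
# Sketch: load and sample from the released Jamba checkpoint
# (checkpoint id and generation settings are assumptions, not from the report).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"          # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,            # bf16 weights; only 12B params are active per token
    device_map="auto",
    trust_remote_code=True,                # custom Jamba architecture code at release time
)

inputs = tokenizer("In a shocking finding,", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```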

🔗 arxiv — Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Technical report for Eagle (RWKV-5) and Finch (RWKV-6). The models cover 100+ languages and are trained on 1.12 trillion tokens. The new architectures introduce matrix-valued states and dynamic recurrence to improve performance. “Compared to the baseline RWKV-4, Eagle adds matrix-valued attention states, LayerNorm over the attention heads, SiLU attention gating, and improved initialization. It also removes the Sigmoid activation of receptance. Finch further applies data-dependence to the decay schedule and token-shift.”
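
To make "matrix-valued states" concrete, here is a simplified single-head recurrence: the state is an N×N matrix that decays and accumulates a rank-1 outer-product update per token, and in Finch the decay is data-dependent (per token). This is an illustration of the mechanism under those assumptions, not the official RWKV kernels.

```python
# Simplified matrix-valued state recurrence (illustrative, not the reference RWKV code).
import torch

def matrix_state_head(r, k, v, w, u):
    """One head. r, k, v: (T, N); w: (T, N) per-token decay (Finch-style); u: (N,) bonus."""
    T, N = r.shape
    S = torch.zeros(N, N)                                # matrix-valued state
    outs = []
    for t in range(T):
        kv = torch.outer(k[t], v[t])                     # rank-1 update from this token
        outs.append(r[t] @ (torch.diag(u) @ kv + S))     # read: current-token bonus + past state
        S = torch.diag(w[t]) @ S + kv                    # decay the past, accumulate the present
    return torch.stack(outs)                             # (T, N)

T, N = 8, 16
r, k, v = (torch.randn(T, N) for _ in range(3))
w, u = torch.rand(T, N), torch.rand(N)                   # decays/bonus in [0, 1) for the demo
print(matrix_state_head(r, k, v, w, u).shape)            # torch.Size([8, 16])
```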

🔗 arxiv — Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models

This paper introduces techniques to assess whether an LLM has seen a tabular dataset during training, revealing that LLMs have memorized many popular tabular datasets verbatim, leading to overfitting. However, LLMs show non-trivial performance on novel datasets and are surprisingly robust to data transformations, with much of the few-shot performance on novel datasets being attributed to the LLM’s world knowledge rather than in-context statistical learning abilities.
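
A quick way to get a feel for these tests: a row-completion probe in the spirit of the paper, where the model sees the header and first rows of a CSV verbatim and we check whether it reproduces the next row exactly. The prompt wording is mine, and query_llm is a placeholder for whatever LLM client you use.

```python
# Sketch of a row-completion memorization probe (illustrative; not the paper's exact protocol).

def row_completion_probe(csv_text: str, n_context_rows: int = 10) -> bool:
    """Show the header and first rows verbatim, then check if the model emits the next row."""
    rows = csv_text.strip().splitlines()
    prompt = (
        "Complete the next row of this dataset exactly:\n"
        + "\n".join(rows[: n_context_rows + 1])    # header + first n_context_rows rows, verbatim
        + "\n"
    )
    completion = query_llm(prompt).strip().splitlines()[0]   # placeholder LLM call
    return completion == rows[n_context_rows + 1]            # verbatim match suggests memorization

def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your own LLM client here")
```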

Open-Source

BlinkDL/rwkv-6-world — HF
RWKV-6 trained on 100+ world languages (70% English, 15% multilingual, 15% code).

KhoomeiK/interrupting-cow — Github
The first AI voice assistant that interrupts you.

KdaiP/StableTTS — Github
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3.

apple/pytorch-speech-features — Github
PyTorch-based feature extraction that mimics the Kaldi Speech Recognition Toolkit's feature extraction.

wandb/openui — Github
OpenUI lets you describe UI using your imagination, and then see it rendered live.

princeton-nlp/SWE-agent — Github
An open-source Devin that turns LMs (e.g. GPT-4) into software engineering agents that can fix bugs and issues in real GitHub repositories.
