Machine Learns — Newsletter #18

Eren Gölge · Machine Learns · 4 min read · Mar 27, 2024

🤖 AI: Latest News, Research, and Open-Source

Hey Everyone 👋

We are hiring at Cantina.ai!

If you are interested in joining our Generative AI teams for TTS, image generation, or data engineering, feel free to reach out to me.

You can also check LinkedIn for different roles.

(No Recruiters pls!)

Let’s dive in!


Bookmarks

🤖 DeepMind — AI agents that can follow natural-language instructions in video games — Google

🖌️ Microsoft’s guide to inclusive design for mental health — Microsoft

🖌️ Early Apple designer on what makes an interface great — Fast Company

👩‍💼 Mark Zuckerberg is writing personal emails to AI researchers at Google’s DeepMind to recruit them — Verge

📌 Defensible visual design — Blog

📌 AI and the Future of Work — Blog

🤖 Man Uses AI to Talk to 5,000 Women on Tinder, Finds Wife — Futurism

🤖 World’s first major act to regulate AI passed by European lawmakers — Info

👨‍🔬 First pig kidney transplant in a person: what it means for the future — Nature

🤖 NVIDIA is using AI to turn game characters into chatbots — Verge

👨‍🔬 NASA’s Webb, Hubble Telescopes Affirm Universe Expansion Rate Varies — NASA

🤖 Scientists use Stable Diffusion-like models to design new antibiotics — Nature

📌 UN’s The World Happiness Report 2024 — UN

👩‍💼 16 Changes to the Way Enterprises Are Building and Buying Generative AI — a16z

Papers

🔗arxiv Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Quiet-STaR is a training method that teaches a language model to generate an internal rationale before predicting each token, yielding text with less redundancy and more coherence. The extra thinking adds inference overhead, but it significantly improves output quality and coherence.
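
Here is a toy, heavily simplified sketch of the idea — an assumption on my part, not the paper's implementation, which generates rationales at every position in parallel and learns the mixing weight with a small head. It assumes an HF-style causal LM and hypothetical `<|startofthought|>`/`<|endofthought|>` token ids:

```python
import torch

@torch.no_grad()
def think_then_predict(model, ids, start_id, end_id, thought_len=8, mix_weight=0.5):
    # 1) next-token logits without thinking
    base_logits = model(ids).logits[:, -1]

    # 2) generate a short rationale between the thought delimiters (greedy,
    #    for brevity); start_id / end_id are hypothetical special-token ids
    tag = lambda t: torch.full((ids.size(0), 1), t, device=ids.device)
    thought = torch.cat([ids, tag(start_id)], dim=1)
    for _ in range(thought_len):
        nxt = model(thought).logits[:, -1].argmax(-1, keepdim=True)
        thought = torch.cat([thought, nxt], dim=1)
    thought = torch.cat([thought, tag(end_id)], dim=1)

    # 3) mix thinking and non-thinking predictions (the paper learns this weight)
    thought_logits = model(thought).logits[:, -1]
    return mix_weight * thought_logits + (1 - mix_weight) * base_logits
```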

🔗arxiv EMO: Emote Portrait Alive — Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

EMO is a talking-head model that generates expressive videos from audio input. It uses multiple latent representations to produce nuanced, expressive facial movements. Generation takes about 15 seconds per 12 frames, and the model was trained on 250 hours of YouTube videos plus two public datasets.

🔗arxiv Scalable Diffusion Models with State Space Backbone

It introduces a diffusion architecture built on a state space model very similar to Mamba. They show that the model converges faster and achieves better results with fewer parameters. Architecturally, they extend Mamba with a backward pass for bidirectional processing of the input sequence.
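
The bidirectional part is easy to sketch. Below is a minimal, assumed simplification (not the paper's exact block): run one causal sequence layer left-to-right, another right-to-left, and fuse the two views. `ssm_forward` and `ssm_backward` are placeholders for any causal SSM layer, e.g. Mamba blocks.

```python
import torch
import torch.nn as nn

class BidirectionalSSMBlock(nn.Module):
    def __init__(self, ssm_forward, ssm_backward, dim):
        super().__init__()
        self.fwd = ssm_forward   # causal left-to-right SSM (placeholder)
        self.bwd = ssm_backward  # second SSM for the reversed sequence
        self.proj = nn.Linear(2 * dim, dim)  # fuse the two directions

    def forward(self, x):                    # x: (batch, seq, dim)
        h_fwd = self.fwd(x)                  # left-to-right pass
        h_bwd = self.bwd(x.flip(1)).flip(1)  # right-to-left pass, realigned
        return self.proj(torch.cat([h_fwd, h_bwd], dim=-1))
```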

🔗arxiv DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging

DenseFormer performs a weighted average over the current and all past block outputs after every Transformer block. They report perplexity improvements with fewer parameters and layers.
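
The mechanism fits in a few lines. Here is a minimal sketch of the depth-weighted-averaging module, assuming it sits after each block and receives every representation computed so far; the weights start as the identity on the current output, so training begins close to a vanilla Transformer:

```python
import torch
import torch.nn as nn

class DepthWeightedAverage(nn.Module):
    def __init__(self, n_states):
        super().__init__()
        # one learned weight per representation (embeddings + block outputs);
        # initialized to pass the current block's output through unchanged
        init = torch.zeros(n_states)
        init[-1] = 1.0
        self.alpha = nn.Parameter(init)

    def forward(self, states):
        # states: [x_emb, x_1, ..., x_current], each of shape (batch, seq, dim)
        stacked = torch.stack(states)  # (n_states, batch, seq, dim)
        return torch.einsum("d,dbsh->bsh", self.alpha, stacked)
```

In the full model, each Transformer block would append its output to `states` and feed the averaged tensor to the next block.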

🔗arxiv Evolutionary Optimization of Model Merging Recipes

This paper introduces an evolutionary approach to discovering model-merging recipes. They consider both merging layer weights and stacking layers from different models, and show that the search finds better recipes than manual tuning.
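
For flavor, here is a toy evolutionary loop over layer-wise interpolation weights for merging two models. Everything here is a simplification and an assumption (the paper uses stronger evolutionary algorithms and also searches layer-stacking recipes); `fitness` is a user-supplied callback that merges the models with the candidate weights and returns a validation score.

```python
import copy
import random

def evolve_merge_recipe(fitness, n_layers, pop_size=16, generations=30, sigma=0.1):
    # population of candidate recipes: one interpolation weight per layer, in [0, 1]
    pop = [[random.random() for _ in range(n_layers)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 4]            # keep the best quarter
        pop = [copy.copy(p) for p in parents]
        while len(pop) < pop_size:                   # refill with mutated children
            parent = random.choice(parents)
            child = [min(1.0, max(0.0, a + random.gauss(0, sigma))) for a in parent]
            pop.append(child)
    return max(pop, key=fitness)                     # best recipe found
```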

🔗arxiv Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

“The Branch-Train-MiX (BTX) method has three steps: 1) branch from a pre-trained seed LLM by making multiple copies of it; 2) train those copies separately on different subsets of data to obtain expert LLMs; 3) mix those expert LLMs by combining them into a single LLM using mixture-of-experts feedforward (FF) layers, and finetuning the overall unified model.”
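
Step 3 is the interesting one. A minimal sketch of combining several expert feedforward blocks into one mixture-of-experts layer with a learned token-level router might look like this (a naive dense version for clarity; real MoE layers dispatch tokens to experts sparsely):

```python
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    def __init__(self, expert_ffns, dim, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(expert_ffns)  # FF blocks copied from each expert LLM
        self.router = nn.Linear(dim, len(expert_ffns), bias=False)
        self.top_k = top_k

    def forward(self, x):                                  # x: (batch, seq, dim)
        gate = self.router(x).softmax(dim=-1)              # per-token expert scores
        weights, idx = gate.topk(self.top_k, dim=-1)       # route to top-k experts
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize the top-k
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            y = expert(x)                                  # dense compute, not sparse
            for k in range(self.top_k):
                mask = (idx[..., k] == e).unsqueeze(-1)    # tokens routed to expert e
                out = out + mask * weights[..., k:k + 1] * y
        return out
```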

🔗arxiv ShortGPT: Layers in Large Language Models are More Redundant Than You Expect

ShortGPT uses a Block Influence metric to detect redundant layers in a model and prunes them. They report better results than other pruning methods, and the technique can be combined with further pruning and quantization methods.
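
Block Influence is simple enough to sketch: it measures how much a layer actually transforms its input, so layers whose output is nearly identical to their input score low and become pruning candidates. The sketch below assumes you have per-layer hidden states from a calibration batch, e.g. via a Hugging Face model run with `output_hidden_states=True`.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def block_influence(h_in, h_out):
    # BI = 1 - mean cosine similarity between a layer's input and output
    # hidden states; a low score means the layer barely changes anything
    cos = F.cosine_similarity(h_in.flatten(0, 1), h_out.flatten(0, 1), dim=-1)
    return (1.0 - cos).mean().item()

def layers_to_prune(hidden_states, n_prune):
    # hidden_states: list of (batch, seq, dim) tensors, one per layer boundary
    scores = [block_influence(hidden_states[i], hidden_states[i + 1])
              for i in range(len(hidden_states) - 1)]
    return sorted(range(len(scores)), key=scores.__getitem__)[:n_prune]
```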

Open Source

microsoft/LLMLingua Github

LLMLingua utilizes a compact, well-trained language model (e.g., GPT2-small, LLaMA-7B) to identify and remove non-essential tokens in prompts. This approach enables efficient inference with large language models (LLMs), achieving up to 20x compression with minimal performance loss.
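
The underlying idea can be sketched without the library — this is not LLMLingua's actual API, just a bare-bones illustration: score each prompt token's surprisal under a small LM and drop the most predictable ones.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def compress_prompt(prompt, keep_ratio=0.5, model_name="gpt2"):
    tok = AutoTokenizer.from_pretrained(model_name)
    lm = AutoModelForCausalLM.from_pretrained(model_name).eval()
    ids = tok(prompt, return_tensors="pt").input_ids         # (1, seq)
    logits = lm(ids).logits
    # surprisal of each token given its prefix (the first token is always kept)
    logp = logits[:, :-1].log_softmax(-1)
    surprisal = -logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)[0]
    n_keep = max(1, int(surprisal.numel() * keep_ratio))
    keep = surprisal.topk(n_keep).indices.sort().values + 1  # +1: offset past token 0
    return tok.decode(torch.cat([ids[0, :1], ids[0, keep]]))
```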

sony/BigVSAN Github

Implementation of the BigVSAN audio vocoder — BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network

punica-ai/punica Github

Serve multiple LoRA-finetuned LLMs as one, using specialized CUDA kernels.
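
The core idea can be sketched in plain PyTorch — a naive version of what punica's fused kernels do efficiently, with all names below illustrative: one shared base matmul for the whole batch, plus each request's own low-rank delta gathered by adapter id.

```python
import torch

def batched_multi_lora(x, W, A, B, adapter_ids):
    # x: (batch, d_in) inputs, one request per row
    # W: (d_in, d_out) shared base weight; A: (n_adapters, d_in, r) and
    # B: (n_adapters, r, d_out) stacked LoRA factors; adapter_ids: (batch,)
    base = x @ W                                   # one matmul for the whole batch
    A_sel = A[adapter_ids]                         # each request's own LoRA factors
    B_sel = B[adapter_ids]
    delta = torch.bmm(torch.bmm(x.unsqueeze(1), A_sel), B_sel).squeeze(1)
    return base + delta                            # base output + per-request delta
```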

distil-whisper/distil-large-v3 HF

A new version of Distil-Whisper. It is 6.3x faster than Whisper with comparable accuracy.
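
It drops into the usual transformers pipeline; a minimal usage sketch (the file name is a placeholder, and see the model card for chunking and batching options):

```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition",
               model="distil-whisper/distil-large-v3")
print(asr("speech.wav")["text"])  # "speech.wav" is a placeholder audio file
```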

OpenInterpreter/01 Github

“The 01 Project is building an open-source ecosystem for AI devices. Our flagship operating system can power conversational devices like the Rabbit R1, Humane Pin, or Star Trek computer. We intend to become the GNU/Linux of this space by staying open, modular, and free”

Future-Scholars/paperlib Github

Open source library for managing academic papers and references.

Skyvern-AI/skyvern Github

Automate browser tasks with LLMs and computer vision.

facefusion/facefusion Github

Face swapping and enhancement editor

lavague-ai/LaVague Github

Open-source Large Action Model framework for building web automation agents, in the spirit of devices like the Rabbit R1.
