Machine Learns — Newsletter #13

Eren Gölge · Machine Learns
Jan 17, 2024

🤖 AI: Latest News, Research, and Open-Source

👋 Hi Everyone!!

Welcome to Machine Learns #13, where I share some of the most interesting things that happened in the field of AI and tech in the last 2 weeks. Let’s dive in…


Bookmarks

🤖 After Voice Clones And Deepfake Videos, AI Can Now Mimic Handwriting 🔗Link

🤖 New solid-state battery charges in minutes and lasts for thousands of cycles 🔗Link

🤖 Tesla Optimus robot can’t build cars yet, but it is folding clothes 🔗Link

🤖 AI Completed a Famous Unfinished Painting and It Caused Discourse 🔗Link

🤖 OpenAI Launched the GPT Store — a marketplace for custom GPTs — 🔗Link

👩‍💻 Compressing Text into Images 🔗Link

📌 Structure of a Good LinkedIn Post 🔗Link

📌 9-Step Guide for Better Storytelling 🔗Link

📌 Techno-optimism for 2024 🔗Link

📌 Engineers Say the Job Market Is Getting Much Worse 🔗Link

🪐 Life Beyond Our Solar System: NASA Finds Icy Exoplanets May Have Habitable Oceans and Geysers 🔗Link

👩‍🔬 Mutated Cells to Eat Cancer Cells for Cancer Treatment 🔗Link

👩‍💼 Grading a YC Application 🔗Link

👩‍💼 Can a Customer Be a Number? 🔗Link

👩‍💼 Inside Rewind’s path to 170 Series A offers 🔗Link

👩‍💼 Will US companies hire fewer engineers due to Section 174 — something to consider if you are a startup in the US — 🔗Link

⏺️ NVIDIA’s CEO on Leading Through the A.I. Revolution 🔗Link

📰 Google is dropping some features of the Google Assistant 🔗Link

👨‍💻 China developed a nuclear battery producing power for 50 years without charging 🔗Link

Papers

You can see all the papers and more on my Notion page.

Towards the Law of Capacity Gap in Distilling Language Models

📄Paper 👩‍💻Code

The paper proposes a new concept called the “law of capacity gap”, turning the previously understood “curse of capacity gap” into a more predictable and manageable phenomenon. This law guides the optimal selection of teacher model size relative to the student model.

The authors distill a 3B student LM (MiniMA) from a 7B teacher LM (LLaMA2-7B), demonstrating that this approach yields superior compute-performance efficiency on standard benchmarks compared to existing methods. Additionally, an instruction-tuned version, MiniChat, shows competitive performance against larger models.
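
To make the teacher-student setup concrete, below is a minimal sketch of the standard logit-distillation objective such a pipeline relies on. This is my illustration, not the paper's code; the temperature, loss weighting, and tensor sizes are placeholder assumptions.

```python
# Minimal sketch of teacher -> student logit distillation (illustrative, not the
# paper's exact recipe): the student is trained to match the teacher's softened
# output distribution in addition to the usual language-modeling loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Combine a soft-target KL loss with the hard-label cross-entropy loss."""
    # Soften both distributions with the temperature, then match them with KL.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kl = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2

    # Standard next-token cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                         labels.view(-1))
    return alpha * kl + (1.0 - alpha) * ce

# Toy usage with random tensors standing in for teacher and student outputs.
vocab, batch, seq = 32000, 2, 8
teacher_logits = torch.randn(batch, seq, vocab)
student_logits = torch.randn(batch, seq, vocab, requires_grad=True)
labels = torch.randint(0, vocab, (batch, seq))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```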

Transformers are Multi-State RNNs

📄 Paper

The paper introduces a new perspective by conceptualizing decoder-only transformers as infinite multi-state recurrent neural networks (RNNs). This view offers a novel way to understand and optimize transformer models.

The paper demonstrates that transformers can be viewed as RNNs with an unlimited hidden state size. It proposes a method called TOVA (Token Omission Via Attention), which simplifies transformers into finite multi-state RNNs by selectively retaining token states based on attention scores.

This framework enhances computational efficiency and memory utilization of transformer models, especially in handling long-range dependencies and large contexts.
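
Below is a rough sketch of how a TOVA-style eviction policy could look. It is my reading of the idea, not the authors' implementation; the tensor shapes and the `max_states` budget are illustrative assumptions.

```python
# Hedged sketch of a TOVA-style cache policy: keep the KV cache at a fixed size
# by dropping, at each step, the cached tokens that received the lowest
# attention weight from the current query.
import torch

def tova_evict(keys, values, attn_weights, max_states):
    """keys/values: (num_cached, dim); attn_weights: (num_cached,) attention
    scores from the current query. Returns a cache of at most max_states entries."""
    if keys.size(0) <= max_states:
        return keys, values
    # Indices of the highest-scoring tokens; everything else is omitted.
    keep = torch.topk(attn_weights, k=max_states).indices.sort().values
    return keys[keep], values[keep]

# Toy usage: a cache of 10 token states trimmed down to 8 multi-states.
keys = torch.randn(10, 64)
values = torch.randn(10, 64)
attn = torch.softmax(torch.randn(10), dim=0)
keys, values = tova_evict(keys, values, attn, max_states=8)
print(keys.shape)  # torch.Size([8, 64])
```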

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

📄 Paper

The paper introduces “MoE-Mamba”, a model that combines the Mamba model with a Mixture of Experts (MoE) layer. This hybrid aims to leverage the efficiency gains of both State Space Models (SSMs) and MoE.

MoE-Mamba integrates the efficiency of Mamba’s selective state space approach with the selective activation of MoE, enabling it to achieve high performance with fewer training steps than vanilla Mamba and Transformer-MoE models.
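
A toy sketch of this interleaving pattern is below. It is not the authors' code: a plain per-token layer stands in for the actual Mamba (selective SSM) block, and the top-1 router, expert count, and dimensions are illustrative assumptions.

```python
# Hedged sketch of the interleaving idea: each block applies a sequence-mixing
# layer (a placeholder here; a real model would use a Mamba SSM block) followed
# by a sparse Mixture-of-Experts feed-forward layer with top-1 routing.
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    def __init__(self, dim, num_experts=4, hidden=256):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):                                 # x: (batch, seq, dim)
        flat = x.reshape(-1, x.size(-1))
        expert_idx = self.router(flat).argmax(dim=-1)     # top-1 routing
        out = torch.zeros_like(flat)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i                        # tokens routed to expert i
            if mask.any():
                out[mask] = expert(flat[mask])
        return out.reshape_as(x)

class MoEMambaBlock(nn.Module):
    """One mixing layer followed by one MoE layer, both with residuals."""
    def __init__(self, dim):
        super().__init__()
        # Placeholder for the selective state-space (Mamba) layer; a real model
        # would mix information across the sequence here.
        self.mixer = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim))
        self.moe = MoEFeedForward(dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        x = x + self.mixer(x)
        return x + self.moe(self.norm(x))

x = torch.randn(2, 16, 128)
print(MoEMambaBlock(128)(x).shape)  # torch.Size([2, 16, 128])
```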

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

📄 Paper

The paper proposes “Self-Extend”, a method for extending the context window of Large Language Models (LLMs) that use RoPE, without additional fine-tuning. It enhances LLMs’ capability to handle longer contexts naturally.

Self-Extend utilizes a bi-level attention mechanism, incorporating both group-level and neighbor-level attention. This method relies on the existing self-attention mechanism of LLMs and requires minimal code modification, allowing easy integration with existing models.
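
Below is a small sketch of the position remapping I understand to be behind this bi-level attention. It is illustrative, not the authors' code; the group size and neighbor-window values are placeholder assumptions.

```python
# Hedged sketch of Self-Extend-style position remapping: relative positions
# inside a local window keep their exact values (neighbor attention), while
# positions beyond it are mapped to coarser "group" positions via floor
# division, so RoPE never sees a relative distance larger than it was trained on.
import torch

def self_extend_positions(seq_len, group_size=4, neighbor_window=8):
    q_pos = torch.arange(seq_len).unsqueeze(1)   # query positions
    k_pos = torch.arange(seq_len).unsqueeze(0)   # key positions
    rel = q_pos - k_pos                          # standard relative distances

    # Grouped distances: coarse positions from floor division, shifted so they
    # line up with the edge of the neighbor window.
    grouped = rel // group_size + neighbor_window - neighbor_window // group_size

    # Exact distances for nearby tokens, grouped distances for distant ones.
    return torch.where(rel < neighbor_window, rel, grouped)

pos = self_extend_positions(seq_len=16)
print(pos[15, :6])  # distant keys get grouped distances instead of 15, 14, ...
```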

Open-Source

Surya — OCR in any language

👩‍💻Github

Surya is a multilingual document OCR toolkit. It performs accurate line-level text detection, text recognition, and table and chart detection, and it works on a wide range of documents and languages.

AI toolkit — Header-only C++ library for creating NPCs

👩‍💻Github

AI Toolkit is a header-only C++ library that provides tools for building the brains of your game’s NPCs, including finite state machines, behavior trees, utility AI, and goal-oriented action planning.

WhiteRabbitNeo — A 33B parameter LLM for Cyber Security

👩‍💻HuggingFace
🔗Website

WhiteRabbitNeo is a model series that can be used for offensive and defensive cybersecurity.

Moore — AnimateAnyone

👩‍💻Github
🚀Demo

This repository contains the code and pre-trained models for reproducing the results of the AnimateAnyone paper. It enables the generation of animated avatars from a single input image.
