Machine Learns — Newsletter #21

Eren Gölge — Machine Learns — May 8, 2024

🤖 AI: Latest News, Research, and Open-Source

Hey all 👋,

In the last couple of days, two important papers were released proposing alternative architectures to transformers. I don’t know if they are game changers, but let’s go over them first.

“KAN: Kolmogorov–Arnold Networks” is based on the Kolmogorov–Arnold representation theorem. The cool thing about KANs is that you can train them by incrementally expanding or pruning the model. KANs are also interpretable, so one can even convert the whole network to a symbolic formula under certain constraints.

xLSTM came out yesterday from the same research lab that created the original LSTM. xLSTM addresses the main shortcomings of the LSTM relative to Transformers. It introduces parallelizable memory for faster training and multi-folded memory mixing, similar to the heads in multi-head attention, for better representation learning.

Both models are quite new. I believe we’ll see the community experimenting heavily with them. Until then, I’ll probably stick with my good old GPT-2 for my work. In the end, ML is about data, not the model.

It’s a crowded issue this time. Take your time; we have two weeks until the next one. See you!


Bookmarks

📌 Generative A.I. Arrives in the Gene Editing World of CRISPR — article
Now, new A.I. technology is generating blueprints for microscopic biological mechanisms that can edit your DNA.

📌 Chinese startup releases Vidu, a Sora-competitor video generation model — article

📌 Extropic’s new computing paradigm with thermodynamics — video

📌 Neoplant’s bioengineered plant to purify the air at home — link

📌 Siri for iOS 18 to gain massive AI upgrade via Apple’s Ajax LLM — appleinsider

📌 AI engineers report burnout as the ‘rat race’ to stay competitive hits the tech industry — cnbc

News

📰 Google fired the Python team — HN

📰 Bill Gates Is Still Pulling Strings at Microsoft, Overseeing AI Ideas — article

📰 Samsung Electronics’ operating profit jumps 933% in first quarter, beats expectations — article

📰 Amazon Reports $143.3 Billion in Revenue for First Quarter of 2024 — article The company also reported that profit more than tripled, to $10.4 billion, topping Wall Street expectations.

📰 France must curb child, teen use of smartphones, social media, says the panel organized by Macron — article

📰 Apple poaches AI experts from Google, creates secretive European AI lab — article

📰 Tesla is preparing to launch its own in-car voice assistant — article

📰 OpenAI may launch a search engine — article

📰 World’s 1st ‘tooth regrowth medicine’ to be tested in Japan from Sept. 2024 — mainichi

Model releases

🤖 Snowflake released a 480B-parameter Dense-MoE Hybrid (Mixture-of-Experts) model — tweet

🤖 Sakana.ai released a Japanese image generation model with “Evolutionary model merging” — blog

🤖 Gigax released LLMs for NPCs in video games — HF

🤖 Google released its new CodeGemma models — HF

Tutorials

💬 Unsloth.ai explains how they speed up LLMs — video

✏️ Realtime Video Stream Analysis with Computer Vision — blog

💬 MAMBA from Scratch: Neural Nets Better and Faster than Transformers — YouTube — video

✏️ Overview of decentralized training methods — primeintellect

Blogs

📌 How the meaning of colour varies across cultures — uxdesign

📌 Simplicity is An Advantage but Sadly Complexity Sells Better — blog

Papers

KAN: Kolmogorov–Arnold Networks — arxiv | code

Kolmogorov-Arnold Networks (KANs) are alternatives to Multi-Layer Perceptrons (MLPs), where the activation functions are learnable and placed on edges instead of nodes. The authors show that KANs outperform MLPs in terms of accuracy, interpretability, and neural scaling laws. The architecture allows continual training by pruning or expanding layers, and the final model can be expressed as a symbolic formula, which makes KANs interpretable.
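
Below is a minimal sketch of the core idea: a learnable univariate function on every edge instead of a fixed activation at every node. For brevity it parameterizes each edge function with Gaussian radial basis functions rather than the B-spline-plus-SiLU parameterization the paper uses, so treat it as an illustration of the layer structure, not a reproduction of the official code.

```python
import torch
import torch.nn as nn

class KANLayer(nn.Module):
    """Sketch of a Kolmogorov-Arnold layer: every edge (i, j) carries its own
    learnable 1-D function, here a linear combination of fixed Gaussian RBFs
    (a simplification of the paper's spline parameterization)."""
    def __init__(self, in_dim, out_dim, num_basis=8, grid_range=(-2.0, 2.0)):
        super().__init__()
        self.register_buffer("centers", torch.linspace(*grid_range, num_basis))
        self.width = (grid_range[1] - grid_range[0]) / num_basis
        # one coefficient vector per edge: (out_dim, in_dim, num_basis)
        self.coef = nn.Parameter(torch.randn(out_dim, in_dim, num_basis) * 0.1)

    def forward(self, x):                                   # x: (batch, in_dim)
        # evaluate every basis function on every input coordinate
        phi = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.width) ** 2)
        # each edge applies its own function; results are summed over inputs
        return torch.einsum("bik,oik->bo", phi, self.coef)

# toy regression with a two-layer KAN
model = nn.Sequential(KANLayer(2, 5), KANLayer(5, 1))
x = torch.rand(256, 2) * 2 - 1
y = torch.sin(torch.pi * x[:, :1]) * x[:, 1:]
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = ((model(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
```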

xLSTM: Extended Long Short-Term Memory — arxiv

xLSTM improves on the original LSTM by introducing exponential gating, memory mixing with separate heads (sLSTM), and a matrix memory with a covariance update rule (mLSTM). Stacked sLSTM and mLSTM blocks with residual connections constitute the xLSTM model. xLSTM differs from RWKV and SSM models through its memory-mixing mechanism, which is claimed to handle context passing over longer sequences. Two limitations are highlighted: sLSTM modules are not parallelizable and therefore remain relatively slow even with optimized CUDA kernels, and mLSTM blocks operate on matrix-valued memories, introducing memory overhead in return for a fast, parallelizable forward pass.
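
As a rough illustration of the exponential gating and the normalizer state mentioned above, here is a toy sLSTM-style cell in PyTorch. The stabilizer state and the z/i/f/o gate layout follow the update equations summarized in the paper, but the shapes, names, and initialization are my own simplification, not the authors' code.

```python
import torch
import torch.nn as nn

class SLSTMCell(nn.Module):
    """Toy sLSTM-style cell: exponential input/forget gates with a stabilizer
    state m and a normalizer state n (illustrative simplification)."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.W = nn.Linear(input_size, 4 * hidden_size)          # z, i, f, o
        self.R = nn.Linear(hidden_size, 4 * hidden_size, bias=False)

    def forward(self, x, state):
        h, c, n, m = state
        z, i_pre, f_pre, o_pre = (self.W(x) + self.R(h)).chunk(4, dim=-1)
        z = torch.tanh(z)
        o = torch.sigmoid(o_pre)
        # stabilizer m keeps the exponential gates numerically safe
        m_new = torch.maximum(f_pre + m, i_pre)
        i = torch.exp(i_pre - m_new)
        f = torch.exp(f_pre + m - m_new)
        c_new = f * c + i * z            # cell state
        n_new = f * n + i                # normalizer state
        h_new = o * (c_new / n_new)      # normalized hidden state
        return h_new, (h_new, c_new, n_new, m_new)

# run the cell over a toy sequence of shape (time, batch, features)
cell = SLSTMCell(input_size=16, hidden_size=32)
h = c = n = m = torch.zeros(4, 32)
for x_t in torch.randn(10, 4, 16):
    out, (h, c, n, m) = cell(x_t, (h, c, n, m))
```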

TransformerFAM: Feedback attention is working memory — arxiv

TransformerFAM introduces a way for the model to attend to its own hidden representation, thereby providing a working memory that allows it to process longer sequences. It processes the input sequence in chunks. Each chunk has special FAM tokens that are used to pass context between chunks. In the paper, they compare with sliding window attention and show that TransformerFAM performs better by alleviating the limited historical look-back of the sliding window models with the memory tokens.
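
A rough sketch of the chunk-plus-memory-token data flow, using a stock `nn.TransformerEncoderLayer`: a few learned memory tokens are concatenated to each chunk and their updated states are carried into the next chunk. The paper's actual feedback-attention mechanism is more involved; this only shows how context can be passed between chunks.

```python
import torch
import torch.nn as nn

class ChunkedMemoryEncoder(nn.Module):
    """Process a long sequence in chunks, carrying a small set of memory
    tokens between chunks as a working memory (illustrative sketch)."""
    def __init__(self, d_model=64, nhead=4, num_mem_tokens=4, chunk_len=32):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.mem_init = nn.Parameter(torch.randn(1, num_mem_tokens, d_model))
        self.num_mem_tokens = num_mem_tokens
        self.chunk_len = chunk_len

    def forward(self, x):                                # x: (batch, seq, d_model)
        mem = self.mem_init.expand(x.size(0), -1, -1)    # working memory tokens
        outputs = []
        for start in range(0, x.size(1), self.chunk_len):
            chunk = x[:, start:start + self.chunk_len]
            combined = self.layer(torch.cat([mem, chunk], dim=1))
            mem = combined[:, :self.num_mem_tokens]      # updated memory feeds forward
            outputs.append(combined[:, self.num_mem_tokens:])
        return torch.cat(outputs, dim=1)

model = ChunkedMemoryEncoder()
out = model(torch.randn(2, 128, 64))                     # out: (2, 128, 64)
```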

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding — arxiv

LayerSkip speeds up the inference of large language models (LLMs) by applying layer dropout during training, with higher dropout rates for later layers and an early exit loss shared by all transformer layers. The proposed self-speculative decoding approach exits at early layers and verifies and corrects with remaining layers, resulting in speedups of up to 2.16× on summarization, 1.82× on coding, and 2.0× on semantic parsing tasks, while having a lower memory footprint compared to other speculative decoding approaches.
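
Here is a toy sketch of the training recipe only (depth-increasing layer dropout plus a shared early-exit head); the self-speculative decoding stage is omitted, and the dropout schedule and module names are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class EarlyExitTransformer(nn.Module):
    """Toy LayerSkip-style training setup: layer dropout whose rate grows with
    depth, plus one shared head so every layer's hidden state can be decoded."""
    def __init__(self, vocab=1000, d_model=64, nhead=4, num_layers=6, max_drop=0.2):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_layers)
        )
        # dropout rate increases linearly for deeper layers
        self.drop_rates = [max_drop * i / (num_layers - 1) for i in range(num_layers)]
        self.exit_head = nn.Linear(d_model, vocab)        # shared early-exit head

    def forward(self, tokens):
        h = self.embed(tokens)
        exit_logits = []
        for layer, p in zip(self.layers, self.drop_rates):
            if not (self.training and torch.rand(()) < p):
                h = layer(h)                              # layer kept (not dropped)
            exit_logits.append(self.exit_head(h))         # early-exit logits per depth
        return exit_logits                                # last element = usual output

model = EarlyExitTransformer()
logits_per_layer = model(torch.randint(0, 1000, (2, 16)))
# training would average a language-modeling loss over logits_per_layer
```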

DeLighT: Deep and Light-weight Transformer — arxiv

DeLighT is a deep and lightweight transformer model that achieves similar or better performance than standard transformer models with fewer parameters. It efficiently allocates parameters within each transformer block using the DeLighT transformation and across blocks using block-wise scaling. DeLighT networks are 2.5 to 4 times deeper than standard transformers but have fewer parameters and operations. Experimental results show that DeLighT matches or improves the performance of baseline transformers with 2 to 3 times fewer parameters on average.
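
As a small illustration of block-wise scaling, the helper below assigns shallower DeLighT transformations to blocks near the input and deeper ones near the output via linear interpolation; treat the exact formula and default depths as assumptions for the sake of the example.

```python
# Illustrative block-wise scaling: earlier blocks get shallower transformations,
# later blocks deeper ones (sketch of the allocation idea, not the paper's code).
def blockwise_depths(num_blocks: int, d_min: int = 4, d_max: int = 8) -> list[int]:
    """Assign each transformer block a depth for its DeLighT transformation."""
    if num_blocks == 1:
        return [d_max]
    return [round(d_min + (d_max - d_min) * b / (num_blocks - 1))
            for b in range(num_blocks)]

print(blockwise_depths(6))   # [4, 5, 6, 6, 7, 8]
```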

Open-source

You can check my Notion page for more open-source…

prometheus-eval/prometheus-eval — github

Evaluate your LLM’s response with Prometheus 💯

Open-Roleplay-AI/openroleplay.ai — github

openroleplay.ai is an open source character.ai alternative.

AdityaNG/kan-gpt — github

The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling

BatsResearch/bonito — github

A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.

reorproject/reor — github

Private & offline AI personal knowledge management app.

kadirnar/whisper-plus — github

WhisperPlus is a tool for many tasks, such as summarization, RAG-based chats with videos, diarization, etc.

kingjulio8238/memary — github

Longterm Memory for Autonomous Agents.

ltzCrazyKns/Perplexica — github

Perplexica is an AI-powered search engine and an open-source alternative to Perplexity AI.

Anima/air_llm — github

Run large LLMs with lower memory usage.

pytorch/torchtitan — github

A native PyTorch Library for large model training

cohere-ai/cohere-toolkit — github

Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications.
