Machine Learns — Newsletter #20

Eren Gölge · Published in Machine Learns · Apr 24, 2024

🤖 AI: Latest News, Research, and Open-Source

👋 Everyone!

The AI world has been buzzing with new model releases over the past 2 weeks!

First off, Llama 3. It topped the benchmarks for open-source models and is giving commercial LLMs a run for their money. Meta released 8B and 70B parameter models, and according to the team, a 400B model is still in training and will be released later.

Meanwhile, Microsoft dropped Phi-3, a mobile-friendly LLM that packs a punch: despite its small size, it’s been quite competitive on the benchmarks.

But here’s the thing: I’ve got a sneaking suspicion that some people might be gaming the system. It seems like they could be training their models on the benchmark datasets to boost their leaderboard rankings. Maybe?

Anyway, that’s the tea for now. Let’s 🤖 delve into 🤖 the newsletter…

Bookmarks

🔖 Perplexity raised $62.7M at a $1.04B valuation — Blog

🔖 Adobe Previews Breakthrough AI Innovations to Advance Professional Video Workflows Within Adobe Premiere Pro — Adobe

🔖 US Air Force confirms first successful AI dogfight — The Verge

🔖 Meta to Make Its Meta Horizon OS Open Source — Meta Quest Blog

🔖 Artificial Intelligence Index Report 2024 — Stanford

🔖 OpenAI Japan — Blog

🔖 New York Stock Exchange mulls 24-hour trading — Yahoo

🔖 Portugal’s New Logo Controversy — Blog

Tutorials

👩‍🎓 Building reliable systems out of unreliable agents — Rainforest QA

👩‍🎓 Practitioners Guide to Triton — YouTube

👩‍🎓 Creating a Transformer From Scratch — Tutorial

👩‍🎓 Diffusion Models for Video Generation — Lil’Log

Models

🤖 Pile-T5: T5 trained on the Pile dataset — Blog

🤖 Meta released Llama 3 — Meta

🤖 Microsoft released Phi-3, a mobile-size language model — Blog

🤖 LINGO-2: a vision-language-action model driving cars with natural language — Blog

Blogs

📌 AI leads a service-as-software paradigm shift — Foundation Capital

📌 Status Update: AI UX-Design Tools Are Not Ready for Primetime — Blog

📌 A Bite of History: Apple Logo Evolution Since 1976 — Blog

📌 I asked 100 devs why they aren’t shipping faster. Here’s what I learned — Greptile

Papers

LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? — arxiv | code

This work revisits diffusion models for image captioning, highlighting their advantages in holistic context modeling and parallel decoding compared to Auto-Regressive (AR) models. The authors introduce LaDiC, a novel architecture that creates a dedicated latent space for captions, integrates a regularization module, and employs a diffuser and Back&Refine technique, achieving state-of-the-art performance for diffusion-based methods on the MS COCO dataset without pre-training or ancillary modules.
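
To make the contrast concrete, here is a minimal toy sketch (my illustration, not the LaDiC code; all module names and sizes are made up) of the decoding difference: an autoregressive captioner emits one token per forward pass, while a diffusion captioner refines every caption position in parallel over a fixed number of denoising steps.

```python
import torch

vocab, seq_len, dim, steps = 1000, 16, 64, 8
embed = torch.nn.Embedding(vocab, dim)    # stand-in token embeddings
denoiser = torch.nn.Linear(dim, dim)      # stand-in denoising network
head = torch.nn.Linear(dim, vocab)        # stand-in output head

# Autoregressive decoding: one token per forward pass, seq_len sequential steps.
tokens = [0]
for _ in range(seq_len):
    context = embed(torch.tensor(tokens)).mean(dim=0)
    tokens.append(int(head(context).argmax()))

# Diffusion-style decoding: every caption position is refined jointly
# for a fixed number of denoising steps, then read out at once.
x = torch.randn(seq_len, dim)
for _ in range(steps):
    x = x - 0.1 * denoiser(x)
caption = head(x).argmax(dim=-1)
```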

Multi-Head Mixture-of-Experts — arxiv

MH-MoE addresses the issues of low expert activation and lack of fine-grained analytical capabilities in Sparse Mixtures of Experts (SMoE) by employing a multi-head mechanism to split tokens into sub-tokens. These sub-tokens are processed by diverse experts in parallel, enabling the model to attend to information from various representation spaces, enhance expert activation, deepen context understanding, and alleviate overfitting.
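
A minimal sketch of the core mechanism as described above (hypothetical shapes, a toy top-1 router, and placeholder experts; not the paper’s code): each token embedding is split into sub-tokens, each sub-token is routed to an expert independently, and the expert outputs are merged back into a single token representation.

```python
import torch
import torch.nn as nn

dim, heads, n_experts, batch, seq = 64, 4, 8, 2, 10
sub_dim = dim // heads

experts = nn.ModuleList(nn.Linear(sub_dim, sub_dim) for _ in range(n_experts))
router = nn.Linear(sub_dim, n_experts)

x = torch.randn(batch, seq, dim)
subs = x.view(batch, seq * heads, sub_dim)   # split each token into `heads` sub-tokens

gate = router(subs)                          # per-sub-token routing scores
top1 = gate.argmax(dim=-1)                   # top-1 routing, simplified

out = torch.zeros_like(subs)
for idx, expert in enumerate(experts):
    mask = top1 == idx
    if mask.any():
        out[mask] = expert(subs[mask])       # each expert sees only its sub-tokens

merged = out.view(batch, seq, dim)           # merge sub-tokens back into token positions
```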

decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points — arxiv | code

decoupleQ is a novel quantization scheme that aims to compress large models for efficient deployment in real-time applications. By decoupling the model parameters into integer and floating-point parts, decoupleQ transforms the quantization problem into a traditional mathematical optimization problem with constraints, which can be solved using off-the-shelf optimization methods, resulting in a substantial increase in model accuracy, especially at very low bits.
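
Conceptually, the decoupling looks something like the toy sketch below (my illustration, not the decoupleQ solver): the weight matrix is expressed as a floating-point scale and zero-point applied to a 2-bit integer part, and the floating-point part is then what gets optimized to reduce reconstruction error.

```python
import torch

W = torch.randn(256, 256)   # a full-precision weight matrix
levels = 4                  # 2-bit uniform quantization -> 4 integer levels

# Floating-point part: per-row scale and zero-point.
w_min = W.min(dim=1, keepdim=True).values
w_max = W.max(dim=1, keepdim=True).values
scale = (w_max - w_min) / (levels - 1)
zero = w_min

# Integer part: values constrained to {0, 1, 2, 3}.
W_int = torch.clamp(torch.round((W - zero) / scale), 0, levels - 1)

# Reconstruction. decoupleQ goes further and optimizes the floating-point
# part (scale, zero) as a constrained optimization problem with W_int fixed.
W_hat = W_int * scale + zero
print("mean abs reconstruction error:", (W - W_hat).abs().mean().item())
```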

The Illusion of State in State-Space Models — arxiv

State-space models (SSMs) have been proposed as an alternative to transformer architectures for building large language models, with the aim of addressing the transformer’s limitations in expressing sequential computation and state tracking. However, formal analysis reveals that SSMs have similar expressive power limitations to transformers, being unable to solve simple state-tracking problems or express computation outside the complexity class TC⁰, and experiments confirm that SSMs struggle with state tracking despite their recurrent formulation.
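
As a concrete example of the kind of state-tracking problem at stake, composing a sequence of permutations requires carrying an exact running state across the whole input; the snippet below (my illustration, not from the paper) generates such a task.

```python
import random
import itertools

# State tracking as permutation composition: given a sequence of permutations
# of five elements, output the overall composed permutation. Solving this
# exactly requires maintaining a running state over the entire sequence.
perms = list(itertools.permutations(range(5)))

def compose(p, q):
    """Apply permutation p after permutation q."""
    return tuple(p[q[i]] for i in range(5))

sequence = [random.choice(perms) for _ in range(20)]
state = tuple(range(5))          # identity permutation
for p in sequence:
    state = compose(p, state)
print("sequence length:", len(sequence), "final state:", state)
```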

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study — arxiv

This paper compares the performance of two Reinforcement Learning from Human Feedback (RLHF) methods, namely Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO), in aligning large language models (LLMs) with human preferences. It conducts theoretical and empirical analyses of both methods, revealing the limitations of DPO and the key factors for PPO’s optimal performance, and demonstrates that PPO surpasses other alignment methods across various RLHF testbeds, achieving state-of-the-art results in challenging code competitions.
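
For reference, the DPO side of the comparison reduces to a single loss on preference pairs; here is a minimal sketch with per-sequence log-probabilities as inputs (tensor names and the beta value are illustrative):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss over summed per-sequence log-probabilities."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # Push the policy's log-ratio for the preferred response above the rejected one.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage with random log-probabilities for a batch of 4 preference pairs.
logps = [torch.randn(4) for _ in range(4)]
print(dpo_loss(*logps).item())
```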

Long-form music generation with latent diffusion — arxiv | code

This paper describes the Stable Audio model, which generates high-quality, prompt-aligned music up to 4m 45s long, as demonstrated by subjective tests and state-of-the-art metrics. It is a Diffusion Transformer combined with cross-attention for text conditioning, with text embeddings produced by the text encoder of a pre-trained CLAP model. It operates in a compressed latent space so that long-form music can be generated faster and more efficiently.
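
A schematic sketch of that conditioning path (placeholder modules and shapes, not the Stable Audio implementation): a frozen text encoder produces the prompt embeddings once, and the diffusion transformer cross-attends to them at every denoising step while refining the compressed audio latent.

```python
import torch
import torch.nn as nn

latent_len, latent_dim, text_len = 256, 64, 32

text_emb = torch.randn(1, text_len, latent_dim)        # stand-in for CLAP text embeddings
cross_attn = nn.MultiheadAttention(latent_dim, num_heads=4, batch_first=True)
block = nn.Linear(latent_dim, latent_dim)              # stand-in transformer block

x = torch.randn(1, latent_len, latent_dim)             # noisy compressed audio latent
for _ in range(10):                                     # toy denoising loop
    ctx, _ = cross_attn(x, text_emb, text_emb)          # condition on the text prompt
    x = x - 0.1 * block(x + ctx)                        # toy step toward the clean latent
# The final latent would then be decoded back to a waveform by the autoencoder.
```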

Open-Source

arcee-ai/PruneMe — Github
Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models

Arize-ai/openinference — Github
Auto-Instrumentation for AI Observability.

TempleX98/MoVA — Github
MoVA: Adapting Mixture of Vision Experts to Multimodal Context

phidatahq/phidata — Github
Phidata is a framework for building AI Assistants with memory, knowledge and tools.

HeyPuter/puter — Github
Puter is an advanced, open-source internet operating system designed to be feature-rich, exceptionally fast, and highly extensible.

BobMcDear/attorch — Github
A subset of PyTorch’s neural network modules, written in Python using OpenAI’s Triton.

jafioti/luminal — Github
Luminal is a deep learning library that uses composable compilers to achieve high performance.

dingo-actual/infini-transformer — Github
PyTorch implementation of Infini-Transformer from “Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention”.

miurla/morphic — Github
An AI-powered answer engine with a generative UI.
