Machine Learns — Newsletter #16

Eren Gölge · Machine Learns · Feb 29, 2024

🤖 AI: Latest News, Research, and Open-Source

Bookmarks

📌 How to build and run behavioral interviews 🔗Link.
Advice on how to build and run behavioral interviews for hiring. A good read for anyone involved in the hiring process.

📌 In Defense of Thin Wrappers 🔗Link.
A post about important things for creating a business around LLM APIs based on past experience.


💼 Requests for Startups | Y Combinator 🔗Link
Y Combinator has a list of requests for startups. It’s a good place to find ideas for new projects.

💼 Microsoft strikes deal with Mistral in push beyond OpenAI. 🔗Link

📌 Analysis: How Nvidia Surpassed Intel In Annual Revenue And Won The AI Crown 🔗Link.
“$60.9 billion in revenue, up 126 percent or more than double from the previous year” — How did Nvidia get there?

💼 Mistral Large — New LLM released by Mistral 🔗Link

📌 Pop Culture Has Become an Oligopoly 🔗link
oligopoly: a state of limited competition, in which a market is shared by a small number of producers or sellers.

📌 Tesler’s Law and Design 🔗link

👩‍💼 Startup Resources Toolkit 🔗link
A collection of resources for startups, including tools, templates, and guides.

🤖 USPTO says AI models can’t hold patents 🔗link

Papers

👀 My paper list for more papers

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

📎 Paper

The paper addresses two main challenges: achieving high training efficiency and maintaining stability at scale. The authors employ a full-stack approach, integrating algorithmic and system optimizations like computation and communication overlapping, operator optimization, and network performance tuning.

MegaScale achieves 55.2% Model FLOPs Utilization (MFU) while training a 175B-parameter LLM on 12,288 GPUs, a 1.34× improvement over Megatron-LM.

Key to its success are diagnostic tools for in-depth system monitoring, fault tolerance strategies for mitigating failures, and techniques for overcoming stragglers.
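To make the MFU figure concrete, here is a back-of-the-envelope sketch (my own illustration, not from the paper) using the common approximation of ~6 FLOPs per parameter per training token; the throughput number is hypothetical, and 312 TFLOPS is an A100-class BF16 peak:

```python
def model_flops_utilization(n_params, tokens_per_sec, n_gpus, peak_flops_per_gpu):
    """MFU = achieved model FLOPs per second / aggregate peak hardware FLOPs."""
    achieved = 6 * n_params * tokens_per_sec  # ~6 FLOPs/param/token (fwd + bwd)
    return achieved / (n_gpus * peak_flops_per_gpu)

# Illustrative numbers: 175B params, 12,288 GPUs, 312 TFLOPS peak each,
# and an assumed throughput of 2.0M tokens/s across the cluster.
mfu = model_flops_utilization(175e9, 2.0e6, 12288, 312e12)
print(f"{mfu:.1%}")  # lands near the ~55% range the paper reports
```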

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

🔗Paper
🔗BitNet

This paper builds on BitNet to construct Large Language Models (LLMs) whose parameters take only the ternary values -1, 0, and 1 (about 1.58 bits each). This significantly reduces computational cost, since matrix multiplications largely collapse into additions. The core innovation lies in the efficient use of ternary parameters.

BitNet trains the model with quantized weights and quantized activations while retaining full-precision optimizer states and gradients, replacing only the linear layers with quantized weights. It also performs significantly better than post-training quantization methods, underscoring the importance of quantization-aware training.
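A minimal sketch of the absmean-style weight quantization described in the paper (my reading of it; the function name is mine): scale the weight tensor by its mean absolute value, then round and clip to the ternary set {-1, 0, 1}.

```python
import numpy as np

def absmean_ternary(w, eps=1e-6):
    # gamma: mean absolute value of the full weight tensor
    gamma = np.abs(w).mean()
    # round-and-clip the scaled weights to {-1, 0, 1}
    w_q = np.clip(np.round(w / (gamma + eps)), -1, 1)
    return w_q, gamma  # gamma is kept to rescale outputs at inference

w = np.array([[0.9, -0.05, -1.3],
              [0.4,  0.02, -0.6]])
w_q, gamma = absmean_ternary(w)
print(w_q)  # every entry is -1, 0, or 1
```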

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

🔗Paper
🔗VideoLDM 🔗ImagenVideo 🔗CascadeDiffusion

This is a 37-page technical report about the latest video generation model of OpenAI, written based on public information and reverse engineering.

For the model architecture, they propose that Sora uses a cascade diffusion model (CDM) for up-sampling and a VAE encoder for latent-space compression. Based on the references to Imagen Video and Video LDM in the Sora technical report, they argue that the CDM operates over both spatial and temporal dimensions and that a space-time VAE is trained from scratch on a large video dataset.

They also report that Sora might use instruction tuning similar to DALL·E 3, where a descriptive caption model is trained and then used to caption additional data for fine-tuning Sora.

Open-Source

Lightning-AI/litdata

🔗Github

“Blazingly fast, distributed streaming of training data from cloud storage for training models”

Specifically crafted for multi-GPU and multi-node distributed training of large models (with DDP, FSDP, etc.), it enhances accuracy, performance, and user-friendliness. Training efficiently is now possible regardless of where the data lives: simply stream in the required data when needed.

Set-of-Mark Visual Prompting for GPT-4V — Microsoft

🔗Github

“Set-of-Mark (SoM) prompting, simply overlaying several spatial and speakable marks on the images, to unleash the visual grounding abilities in the strongest LMM — GPT-4V. Let’s use visual prompting for vision!”

Flashinfer

🔗Github

“FlashInfer is a library for Large Language Models that provides a high-performance implementation of LLM GPU kernels such as FlashAttention, PagedAttention, and LoRA. FlashInfer focuses on LLM serving and inference, and delivers state-of-the-art performance across diverse scenarios.”
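PagedAttention-style serving stores the KV cache in fixed-size pages so sequences can grow without large contiguous allocations. A toy sketch of the bookkeeping involved (class name, page size, and method names are mine, not FlashInfer's API):

```python
import numpy as np

PAGE = 4  # tokens per physical page

class PagedKVCache:
    """Maps each sequence's logical token positions to physical pages."""
    def __init__(self, n_pages, head_dim):
        self.k = np.zeros((n_pages, PAGE, head_dim))
        self.v = np.zeros((n_pages, PAGE, head_dim))
        self.free = list(range(n_pages))  # pool of unused pages
        self.tables = {}                  # seq_id -> list of page ids
        self.lengths = {}                 # seq_id -> tokens written so far

    def append(self, seq_id, k_tok, v_tok):
        table = self.tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % PAGE == 0:                 # current page full: grab a fresh one
            table.append(self.free.pop())
        page, slot = table[-1], n % PAGE
        self.k[page, slot] = k_tok
        self.v[page, slot] = v_tok
        self.lengths[seq_id] = n + 1

    def gather(self, seq_id):
        # Reassemble the logical K/V tensors by following the page table.
        n = self.lengths[seq_id]
        k = np.concatenate([self.k[p] for p in self.tables[seq_id]])[:n]
        v = np.concatenate([self.v[p] for p in self.tables[seq_id]])[:n]
        return k, v

cache = PagedKVCache(n_pages=8, head_dim=2)
for i in range(6):  # 6 tokens span two 4-token pages
    cache.append(0, [i, i], [i, -i])
k, v = cache.gather(0)
print(k.shape, len(cache.tables[0]))  # (6, 2) across 2 pages
```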

Adeus

🔗Github

“Adeus is a wearable device that captures what you say and hear in the real world and then transcribes and stores it on your server. You can then chat with Adeus using the app, and it will have all the right context about what you want or need to talk about — a truly personalized, personal AI.”

Large World Model

🔗Github

“Large World Model (LWM) is a general-purpose large-context multimodal autoregressive model. It is trained on a large dataset of diverse long videos and books using RingAttention, and can perform language, image, and video understanding and generation.”
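RingAttention lets each device hold only one chunk of the keys and values and pass chunks around a ring, combining partial attention results with an online softmax. A single-process numpy sketch of that accumulation step (my illustration; the real implementation is distributed and blockwise):

```python
import numpy as np

def chunked_attention(q, k_chunks, v_chunks):
    """Attention over K/V chunks using online-softmax accumulation,
    as each ring participant would do while chunks rotate past it."""
    m = np.full(q.shape[0], -np.inf)              # running max per query
    l = np.zeros(q.shape[0])                      # running softmax normalizer
    o = np.zeros((q.shape[0], v_chunks[0].shape[1]))  # running weighted sum
    for k, v in zip(k_chunks, v_chunks):
        s = q @ k.T                               # scores for this chunk
        m_new = np.maximum(m, s.max(axis=1))
        p = np.exp(s - m_new[:, None])
        scale = np.exp(m - m_new)                 # rescale old accumulators
        l = l * scale + p.sum(axis=1)
        o = o * scale[:, None] + p @ v
        m = m_new
    return o / l[:, None]

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
K = rng.normal(size=(16, 8))
V = rng.normal(size=(16, 8))
out = chunked_attention(q, np.split(K, 4), np.split(V, 4))
```

The result is numerically identical to ordinary softmax attention over the full K/V, which is what lets the context length scale with the number of devices in the ring.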

Thanks for reading Machine Learns Substack! Subscribe for free to receive new posts and support my work.
