Machine Learns — Newsletter #12

Eren Gölge · Jan 3, 2024

AI: Latest News, Research, and Open-Source

Happy New Year to All!

We’ve journeyed through another year without succumbing to AI domination. Best wishes for 2024, and stay vigilant around AI this year :).

Bookmarks

👩‍💼 OpenAI’s annual revenue reaches $1.6B 🔗LINK

👩‍💼 “How Leading AI Startup Investors Approached Artificial Intelligence In 2023” 🔗LINK

📰 “A case for AI alignment being difficult” 🔗LINK

👩‍🔬 “AI is not Conscious” 🔗LINK

📰 Disney’s Mickey Mouse is now in the Public Domain 🔗LINK

👨‍💻 SpaceX launched satellites connecting consumer smartphones to the internet 🔗LINK

👨‍💻 Humanoid robots are joining the workforce 🔗LINK

👩‍🔬 “Images altered to trick machine vision can influence humans too” 🔗LINK

🔍 “How game design works with our brains to create beauty and meaning” 🔗LINK

🔍 “How Promotions and Ratings Work” 🔗LINK

🔍 “The hardest part of building software is not coding, it’s requirements” 🔗LINK

Papers

👀 For more papers, see my read list.

GAIA: Zero-shot Talking Avatar Generation

📄 Paper

GAIA is a framework for generating talking avatars. Unlike previous methods that rely on domain-specific heuristics such as warping-based motion representations and 3D Morphable Models, GAIA eliminates these domain priors, improving the naturalness and diversity of the generated avatars.

GAIA operates in two stages. First, it disentangles each video frame into separate motion and appearance representations. The appearance remains constant, while the motion varies with speech. Second, it generates motion sequences based on the speech input and a reference portrait image. This process involves a Variational AutoEncoder (VAE) for disentanglement and a diffusion model for speech-to-motion generation. The model is trained on a large-scale, high-quality dataset, demonstrating its superiority in naturalness, diversity, lip-sync quality, and visual quality over previous methods.
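To make the two-stage pipeline concrete, here is a deliberately minimal PyTorch sketch of the idea: an autoencoder that splits a frame into appearance and motion latents, and a denoiser that predicts motion noise from speech features and a reference identity. All module names, dimensions, and the tiny MLPs are my own illustrative assumptions, not the paper’s architecture, and the variational and diffusion training machinery is omitted.

```python
import torch
import torch.nn as nn

LATENT = 128  # assumed latent width for both appearance and motion

class FrameAutoencoder(nn.Module):
    """Stage 1: disentangle each frame into appearance and motion latents.
    (The variational sampling and losses of the real VAE are omitted.)"""
    def __init__(self, frame_pixels=3 * 64 * 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(frame_pixels, 256), nn.ReLU())
        self.to_appearance = nn.Linear(256, LATENT)  # stays constant per identity
        self.to_motion = nn.Linear(256, LATENT)      # varies with speech
        self.decoder = nn.Linear(2 * LATENT, frame_pixels)

    def forward(self, frame):
        # frame: (B, 3, 64, 64)
        h = self.encoder(frame)
        appearance, motion = self.to_appearance(h), self.to_motion(h)
        recon = self.decoder(torch.cat([appearance, motion], dim=-1))
        return recon.view_as(frame), appearance, motion

class MotionDenoiser(nn.Module):
    """Stage 2: a diffusion-style network that predicts the noise in a motion
    latent, conditioned on speech features and a reference identity."""
    def __init__(self, speech_dim=80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT + speech_dim + LATENT, 256), nn.ReLU(),
            nn.Linear(256, LATENT),
        )

    def forward(self, noisy_motion, speech_feats, ref_appearance):
        cond = torch.cat([noisy_motion, speech_feats, ref_appearance], dim=-1)
        return self.net(cond)  # predicted noise; a sampler would iterate this
```

At inference, a diffusion sampler would iterate the denoiser to produce a motion sequence, and the decoder would render each frame from the fixed appearance latent plus the generated motion.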

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

📄 Paper

SPIN introduces a self-play mechanism that fine-tunes LLMs without requiring additional human-annotated data, progressively improving the model from one iteration to the next.

SPIN starts from a supervised fine-tuned LLM and improves it by having the previous iteration of the model generate training data. The model is then fine-tuned to distinguish these self-generated responses from human-annotated ones.

Experimental results show that SPIN improves LLM performance on a variety of tasks as the iterations progress.
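Conceptually, SPIN’s training objective resembles DPO, with the “rejected” responses coming from the model’s own previous iteration rather than from a reward model. The sketch below is my own simplified rendering of that idea; the function name, the beta scale, and the plain logistic loss are assumptions.

```python
import torch
import torch.nn.functional as F

def spin_loss(policy_logp_human, policy_logp_synth,
              opponent_logp_human, opponent_logp_synth, beta=0.1):
    """One SPIN step: push the current policy toward human responses and
    away from responses generated by the previous (frozen) iteration."""
    # Log-ratios of the current policy vs. the frozen previous iteration.
    human_margin = policy_logp_human - opponent_logp_human
    synth_margin = policy_logp_synth - opponent_logp_synth
    # Logistic loss over the margin gap, analogous to DPO where the
    # "rejected" responses come from the model's own previous iteration.
    return -F.logsigmoid(beta * (human_margin - synth_margin)).mean()
```

Each round, the frozen iteration-t model generates responses to the training prompts; minimizing this loss over human vs. self-generated pairs yields iteration t+1, and the loop repeats.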

Ferret: Refer and Ground Anything Anywhere at Any Granularity (Apple)

📄 Paper | 👩‍💻 Code

Ferret employs a novel hybrid region representation that combines discrete coordinates with continuous features, allowing it to process diverse region inputs such as points, bounding boxes, and free-form shapes.

Ferret’s architecture combines an image encoder with a spatial-aware visual sampler that handles regions of varying shape and sparsity. It is trained on the GRIT dataset, which contains 1.1M samples with hierarchical spatial knowledge plus 95K hard negative samples for model robustness.
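A rough sketch of what such a hybrid representation could look like in PyTorch is given below. The class name, the dimensions, and the mean-pooling stand-in for the spatial-aware visual sampler are all assumptions for illustration, not the paper’s implementation.

```python
import torch
import torch.nn as nn

class HybridRegionEmbed(nn.Module):
    """Embed a region as discrete coordinate tokens plus a continuous
    feature pooled from inside the region."""
    def __init__(self, dim=512, bins=100):
        super().__init__()
        self.bins = bins
        self.coord_embed = nn.Embedding(bins, dim)  # discretized x/y coordinates
        self.feature_proj = nn.Linear(dim, dim)     # continuous region feature

    def forward(self, feature_map, box):
        # feature_map: (dim, H, W); box: (x1, y1, x2, y2) normalized to [0, 1].
        coords = (torch.tensor(box) * (self.bins - 1)).long()
        coord_tokens = self.coord_embed(coords).mean(dim=0)
        # Stand-in for the spatial-aware visual sampler: mean-pool features
        # inside the box (the real sampler also handles free-form shapes).
        H, W = feature_map.shape[1], feature_map.shape[2]
        x1, y1 = int(box[0] * W), int(box[1] * H)
        x2, y2 = int(box[2] * W) + 1, int(box[3] * H) + 1
        region_feat = feature_map[:, y1:y2, x1:x2].mean(dim=(1, 2))
        return coord_tokens + self.feature_proj(region_feat)

# Example: one region token for a bounding box on a mock 24x24 feature map.
embed = HybridRegionEmbed()
fmap = torch.randn(512, 24, 24)
region_token = embed(fmap, (0.1, 0.2, 0.5, 0.6))  # shape: (512,)
```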

Ferret demonstrates superior performance in classical referring and grounding tasks and can be integrated into daily conversations, enhancing multimodal communication and AI interaction capabilities.

PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU

📄 Paper | 👨‍💻 Code

PowerInfer addresses the challenge of running memory-intensive LLMs on consumer-grade GPUs by exploiting the high locality in LLM inference. It uses a novel approach that preloads frequently activated neurons (hot neurons) onto the GPU, while less frequently activated ones (cold neurons) are processed on the CPU.

The system features a hybrid GPU-CPU inference engine, adaptive predictors for neuron activation, and neuron-aware sparse operators. It optimizes the utilization of GPU and CPU resources by effectively managing neuron placement and computation. The inference engine adapts to varying activation patterns and maintains efficient computation across different LLMs.
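The sketch below illustrates the hot/cold split on a single weight matrix. The real system relies on adaptive activation predictors and neuron-aware sparse kernels; this mock simply partitions rows by a measured activation frequency, and the 20% hot fraction is an arbitrary assumption.

```python
import torch

def split_neurons(weight, activation_freq, hot_fraction=0.2):
    """Place the most frequently activated rows ("hot" neurons) on the GPU
    and keep the rest ("cold" neurons) on the CPU."""
    n_hot = max(1, int(weight.shape[0] * hot_fraction))
    hot_idx = torch.topk(activation_freq, n_hot).indices
    cold_mask = torch.ones(weight.shape[0], dtype=torch.bool)
    cold_mask[hot_idx] = False
    cold_idx = cold_mask.nonzero().squeeze(1)
    gpu = "cuda" if torch.cuda.is_available() else "cpu"  # CPU fallback for the demo
    return weight[hot_idx].to(gpu), hot_idx, weight[cold_idx], cold_idx

def hybrid_matvec(x, hot_w, hot_idx, cold_w, cold_idx, out_dim):
    """y = W @ x computed in two halves: hot rows on the GPU, cold rows on
    the CPU, then merged into one output vector."""
    y = torch.empty(out_dim)
    y[hot_idx] = (hot_w @ x.to(hot_w.device)).cpu()
    y[cold_idx] = cold_w @ x  # cold neurons stay on the CPU
    return y

# Demo with a mock 4096x4096 layer and synthetic activation statistics.
W = torch.randn(4096, 4096)
freq = torch.rand(4096)  # measured activation frequency per neuron
hot_w, hot_idx, cold_w, cold_idx = split_neurons(W, freq)
y = hybrid_matvec(torch.randn(4096), hot_w, hot_idx, cold_w, cold_idx, 4096)
```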

PowerInfer significantly enhances the feasibility of running large LLMs on more accessible, consumer-grade hardware. It presents a scalable and cost-effective solution for LLM deployment, potentially broadening the accessibility and application of advanced AI models in various fields without the need for expensive, server-grade hardware.

Open-Source

Jan

👨‍💻 Code | 🔗 Webpage

An open-source ChatGPT alternative that runs offline on your computer. It supports open-source LLMs and runs on a wide range of hardware.

Muzic (Microsoft)

👨‍💻 Code

Muzic is a research project on AI music that empowers music understanding and generation with deep learning and artificial intelligence.

Resemble Enhance

👩‍💻 Code

Resemble Enhance is an AI-powered tool that aims to improve the overall quality of speech by performing denoising and enhancement.

LLM Course

👩‍💻 Code

An end-to-end LLM course with example notebooks, divided into three parts: LLM Fundamentals covers essential knowledge of mathematics, Python, and neural networks; The LLM Scientist focuses on building the best possible LLMs with the latest techniques; and The LLM Engineer covers creating and deploying LLM-based solutions.
