MachineLearns — Newsletter #10

Eren Gölge
Published in Machine Learns · Dec 6, 2023

AI: Latest News, Research, and Open-Source

Hey everyone!!

The last two weeks were full of new releases and updates. Every time I refresh my Twitter feed, I run into something new that stokes my persistent FOMO. It’s astounding how fast AI is moving right now. Let’s dive in.


(Please note that these are personal notes turned into a newsletter by my hobby project 🤖ME_AI, so excuse its bits and bytes for any mistakes.)

Bookmarks

An extensive overview of the AI world (115 slides) by Coatue 🔗Link. It appears that the AI bubble is gradually deflating: user interest is waning and the hype is slowly but surely losing momentum. Yet AI development remains as robust as ever. The rise of open-source AI is noticeable, but there is still a gap to close with commercial solutions.

AI system self-organises to develop features of brains of complex organisms 🔗Link. Research from the University of Cambridge shows that an AI system develops features similar to the brains of complex organisms when it is trained under physical constraints like those faced by the human brain, solving its tasks while working within the same developmental and operational limits as biological brains.

Spanish AI model making €10,000 per month on Instagram 🔗Link. Aitana is an exuberant 25-year-old pink-haired woman from Barcelona whose appearance is close to perfection. She uploads photos of herself in lingerie to Fanvue, a platform similar to OnlyFans. In just a few months she has gained more than 121,000 followers on Instagram, and her photos get thousands of views and reactions.

Millions of new materials discovered with deep learning — Google DeepMind 🔗Link The new tool, called GNoME, uses deep learning to predict the stability of new materials. GNoME was able to identify 2.2 million new materials, including 380,000 that are stable enough to be used in new technologies.

How much waste do solar panels and wind turbines produce? 🔗Link. Although the article does not mention it, it is also worth considering the energy spent on producing, transporting, and maintaining the panels and turbines. When these are taken into account, the effective efficiency of these technologies drops significantly.

The T-shirt chewing enzyme ready to tackle plastic waste — BBC News 🔗Link A new enzyme that can break down plastic waste has been discovered.

Amazon Unveils Its AI Chatbot Amazon Q 🔗Link

StabilityAI released SDXL Turbo 🔗Link. This new model can generate images in real time. It is trained with Adversarial Diffusion Distillation to generate high-fidelity images in as few as 1–4 diffusion steps.
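If you want to try it, the usual diffusers pattern looks roughly like this (a minimal sketch assuming the stabilityai/sdxl-turbo checkpoint on Hugging Face and a CUDA GPU; the prompt is just an example):

```python
import torch
from diffusers import AutoPipelineForText2Image

# Load the SDXL Turbo checkpoint (fp16 to fit on a consumer GPU).
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
)
pipe = pipe.to("cuda")

# Turbo is distilled for few-step sampling: no classifier-free guidance,
# and a single denoising step is often enough.
image = pipe(
    prompt="a photo of a red fox in the snow, sharp focus",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image.save("fox.png")
```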

PIKA just launched idea-to-video AI 🔗Link

Papers

MagicAnimate

Project · Paper · Github · Demo

MagicAnimate is a new method for animating human images using a diffusion model. Existing animation methods often suffer from artifacts, such as jittering and flickering. MagicAnimate addresses these challenges by using a video diffusion model to encode temporal information and an appearance encoder to retain the reference image. It also uses a video fusion technique to encourage smooth transitions. The results show that MagicAnimate outperforms other methods on two benchmarks.

Here is the breakdown of the system.

Reference Image and Motion Sequence Input: You start with a reference image and a motion sequence. MagicAnimate then gets to work.

Temporal Consistency Modeling: It uses a video diffusion model, which gives it the power to understand how things should move over time.

Appearance Encoder: This step is all about making sure the animation looks like the original image. It keeps the identity and background details consistent.

Animation Pipeline: Combining the motion sequence with the reference image, MagicAnimate brings the image to life, following the movement patterns.

Long Video Animation with Smooth Transitions: For longer videos, it uses a neat trick of overlapping segments and averaging predictions, making the transitions smooth and natural (see the sketch after this list).

Image-Video Joint Training Strategy: This is the learning phase. It trains on both image and video data to get better at preserving the reference image details and improving animation quality.
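To make that long-video trick concrete, here is a toy sketch of overlapping-window denoising with averaged predictions. The window sizes and the denoise_segment function are hypothetical placeholders, not MagicAnimate’s actual code:

```python
import torch

def denoise_segment(frames: torch.Tensor) -> torch.Tensor:
    """Stand-in for the per-segment video diffusion step."""
    return frames  # a real model would return denoised frames here

def denoise_long_video(frames: torch.Tensor, window: int = 16, overlap: int = 4) -> torch.Tensor:
    """Split a long clip into overlapping temporal windows, denoise each one,
    and average the predictions where windows overlap to smooth the seams."""
    num_frames = frames.shape[0]
    out = torch.zeros_like(frames)
    counts = torch.zeros(num_frames)
    stride = window - overlap
    start = 0
    while start < num_frames:
        end = min(start + window, num_frames)
        out[start:end] += denoise_segment(frames[start:end])
        counts[start:end] += 1
        if end == num_frames:
            break
        start += stride
    # Divide by how many windows covered each frame (broadcast over C, H, W).
    return out / counts.view(-1, *([1] * (frames.dim() - 1)))

video = torch.randn(40, 3, 64, 64)   # (frames, channels, height, width)
smoothed = denoise_long_video(video)
```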

VividTalk

Project · Paper · Github

This paper tackles audio-driven talking head generation, covering lip-sync, facial expressions, head pose, and video quality. The authors propose VividTalk, a two-stage framework that generates high-quality talking head videos: the first stage maps the audio to a 3D mesh, and the second stage transforms the mesh into a video. VividTalk outperforms previous state-of-the-art methods.

Audio-To-Mesh Generation: The system first reconstructs the reference facial image into a 3D mesh. It learns two types of motion from the audio: non-rigid facial expression motion (like mouth movements) and rigid head motion. These are modeled using blend shapes, vertex offsets for facial expressions, and a learnable head pose codebook for head movements. This process results in a 3D-driven mesh animated according to the input audio.

Mesh-To-Video Generation: The driven 3D meshes are transformed into 2D dense motion using a dual-branch motion-VAE (Variational Autoencoder). This motion is then used to animate a reference image, resulting in the final video output. VividTalk synthesizes each video frame in a frame-by-frame manner, ensuring lip-sync and realistic facial expressions and head movements.

VividTalk is unique in its use of a dual-branch motion-VAE and a learnable head pose codebook, allowing for more natural and diverse head movements and facial expressions.
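As a mental model of the two stages, here is a bare skeleton of the data flow described above. Every function here is a hypothetical stand-in, not VividTalk’s actual API:

```python
import numpy as np

def audio_to_mesh(audio: np.ndarray, reference_mesh: np.ndarray) -> np.ndarray:
    """Stage 1 stand-in: predict expression blendshapes/vertex offsets and a
    head pose from the codebook for each frame, then apply them to the
    reference mesh. Here it just tiles the reference mesh per frame."""
    fps, sample_rate = 25, 16000
    num_frames = int(len(audio) / sample_rate * fps)
    return np.repeat(reference_mesh[None], num_frames, axis=0)

def mesh_to_video(driven_meshes: np.ndarray, reference_image: np.ndarray) -> np.ndarray:
    """Stage 2 stand-in: map each driven mesh to 2D dense motion (motion-VAE)
    and warp the reference image frame by frame. Here it just copies the image."""
    return np.stack([reference_image for _ in driven_meshes])

audio = np.zeros(16000)              # one second of (silent) 16 kHz audio
ref_mesh = np.zeros((5023, 3))       # a generic 3D face mesh
ref_image = np.zeros((256, 256, 3))
video = mesh_to_video(audio_to_mesh(audio, ref_mesh), ref_image)
print(video.shape)                   # (25, 256, 256, 3): 25 rendered frames
```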

GRACE: Lifelong model editing with key-value adaptors

Paper · Github

Lately, I have been interested in lifelong learning problems and encountered GRACE. GRACE (General Retrieval Adaptors for Continual Editing) allows for the editing of a model’s behavior in a parameter-efficient way, without the need for retraining or finetuning.

Here’s a rundown of how it works (a toy code sketch follows the list).

Adaptor Addition: GRACE adds an Adaptor to a chosen layer of the pre-trained model. This Adaptor is responsible for modifying the transformations between layers for specific inputs.

Codebook Creation: The Adaptor maintains a codebook that stores edits. These edits are used to adjust the model’s predictions without altering the original model weights.

Deferral Mechanism: GRACE includes a mechanism to decide whether to use the codebook for a given input, allowing the model to defer to its pre-trained weights when necessary.

Parameter Efficiency: By not altering the original model weights and using a small, discrete codebook, GRACE is efficient in terms of parameters, making it suitable for continual editing of the model.
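Here is a toy sketch of the core idea: a key-value codebook attached to one layer, plus a distance-based deferral rule. The shapes, the epsilon threshold, and the class itself are made up for illustration; see the paper and repo for the real implementation:

```python
import torch
import torch.nn as nn

class GraceStyleAdaptor(nn.Module):
    """Toy key-value adaptor in the spirit of GRACE: if an input activation
    falls within a deferral radius of a stored key, replace the layer output
    with the stored value (the edit); otherwise defer to the frozen layer."""

    def __init__(self, layer: nn.Module, epsilon: float = 0.5):
        super().__init__()
        self.layer = layer                    # frozen pre-trained layer
        self.epsilon = epsilon                # deferral radius
        self.keys = []                        # cached input activations
        self.values = nn.ParameterList()      # learned replacement outputs

    def add_edit(self, key: torch.Tensor, value_init: torch.Tensor) -> None:
        self.keys.append(key.detach())
        self.values.append(nn.Parameter(value_init.clone()))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        out = self.layer(h)
        if not self.keys:
            return out
        flat_h = h.reshape(-1, h.shape[-1])
        keys = torch.stack(self.keys)                     # (num_edits, d)
        dists = torch.cdist(flat_h, keys)                 # (tokens, num_edits)
        nearest = dists.argmin(dim=-1)
        within = dists.min(dim=-1).values < self.epsilon  # deferral check
        out_flat = out.reshape(-1, out.shape[-1]).clone()
        for i in within.nonzero().flatten().tolist():
            out_flat[i] = self.values[int(nearest[i])]    # apply the edit
        return out_flat.reshape(out.shape)

# Example: change the behaviour for one specific activation without retraining.
adaptor = GraceStyleAdaptor(nn.Linear(16, 16))
h = torch.randn(2, 4, 16)
adaptor.add_edit(key=h[0, 0], value_init=torch.zeros(16))
edited = adaptor(h)   # the token matching the stored key now maps to the edit
```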

Open Source

Screenshot-to-Code

Github

This simple app converts a screenshot to code (HTML/Tailwind CSS, React, Vue, or Bootstrap). It uses GPT-4 Vision to generate the code and DALL-E 3 to generate similar-looking images. You can now also enter a URL to clone a live website!
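The core trick is simply a vision-capable LLM prompted to emit markup. Here is a rough sketch of that call using the OpenAI Python client; this is not the repo’s actual code, and the model name and prompt are only illustrative:

```python
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Ask a vision-capable model to reproduce the screenshot as HTML + Tailwind.
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Recreate this UI as a single HTML file using Tailwind CSS."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```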

Self-Operating Computer Framework

Github

A framework to enable multimodal models to operate a computer. Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective.
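Conceptually it is a loop: screenshot in, action out. A heavily simplified sketch, where decide_action is a hypothetical stand-in for the multimodal model call:

```python
import pyautogui

def decide_action(screenshot, objective: str) -> dict:
    """Stand-in for the multimodal model: given the current screen and the
    objective, return the next action, e.g. {"type": "click", "x": 100, "y": 200},
    {"type": "type", "text": "hello"}, or {"type": "done"}."""
    return {"type": "done"}

def run(objective: str, max_steps: int = 20) -> None:
    for _ in range(max_steps):
        screenshot = pyautogui.screenshot()   # what the model "sees"
        action = decide_action(screenshot, objective)
        if action["type"] == "click":
            pyautogui.click(action["x"], action["y"])
        elif action["type"] == "type":
            pyautogui.write(action["text"], interval=0.05)
        elif action["type"] == "done":
            break

run("Open a text editor and write a haiku")
```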

Verba

Github

Verba, “The Golden RAGtriever”, is an open-source application designed to offer an end-to-end, streamlined, and user-friendly interface for Retrieval-Augmented Generation (RAG) out of the box. In just a few easy steps, you can explore your datasets and extract insights, either locally or through LLM providers such as OpenAI, Cohere, and HuggingFace.
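Under the hood this is the standard RAG loop: embed, retrieve, then generate with the retrieved chunks in the prompt. A bare-bones sketch of that pattern (not Verba’s actual API; embed and generate are hypothetical stand-ins):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for an embedding model (OpenAI, Cohere, a local model, ...)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=128)

def generate(prompt: str) -> str:
    """Stand-in for the LLM call."""
    return f"(answer based on a prompt of {len(prompt)} chars)"

documents = ["Verba wraps RAG end to end.", "RAG = retrieve, then generate."]
doc_vectors = np.stack([embed(d) for d in documents])

def answer(question: str, k: int = 2) -> str:
    q = embed(question)
    # Cosine similarity between the question and every document chunk.
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    top = [documents[i] for i in np.argsort(-sims)[:k]]
    prompt = "Answer using this context:\n" + "\n".join(top) + f"\n\nQ: {question}"
    return generate(prompt)

print(answer("What does Verba do?"))
```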


Thanks for reading Machine Learns! Subscribe for free to receive new posts and support my work.
