Machine Learns — Newsletter #7

Eren Gölge

Published in Machine Learns · Sent as a Newsletter

3 min read · Oct 24, 2023

News, Papers, and Open-Source

Image by DALL-E 3

Bookmarks

Detecting type 2 diabetes with 10 seconds of voice recording 🔗 article

10 Charts That Capture How the World Is Changing 🔗 article

The Techno-Optimist Manifesto by a16z 🔗 article

Breakthrough for EV batteries could cut charging times to 6 minutes 🔗 article

State of AI Report (must-see summary of the year) 🔗 slides

Apple may ship generative AI features on iPhone as soon as iOS 18 🔗 article

IBM has made a new, highly efficient AI processor 🔗 article

Deflecting laser beams in air with sound waves 🔗 article

LAION is launching Open-Empathic, an AI agent with empathy and emotional intelligence 🔗 article

Papers

Vec-Tok Speech — Speech Vectorization and Tokenization for Neural Speech Generation

🔬Paper
👩‍💻Github

Vec-Tok Speech is a new framework for speech generation tasks, offering high-quality and expressive speech. It uses a novel speech codec based on speech vectors and semantic tokens, with Byte-Pair Encoding (BPE) introduced to improve language model performance. The framework can be used for various applications, including voice conversion, speaking style transfer, and speech-to-speech translation, and has shown superior performance in experiments.

My 2 cents: Generating high-fidelity speech from tokenized representations is challenging, and that limits the quality of the latest LLM-based text-to-speech systems. This paper uses continuous vectors for higher fidelity and discrete tokens for semantic modeling.
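
To make the BPE idea concrete, here is a toy sketch of byte-pair-style merging over a sequence of discrete token IDs. It is not the paper's implementation: the function names, the token IDs, and the choice of 100 as the first new ID are all made up for illustration. The point is only that repeatedly merging the most frequent adjacent pair shortens the sequence a language model has to handle.

```python
from collections import Counter


def most_frequent_pair(seq):
    """Return the most common adjacent pair in seq, or None if seq is too short."""
    pairs = Counter(zip(seq, seq[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(seq, pair, new_id):
    """Replace every non-overlapping occurrence of pair with new_id, left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def bpe_compress(seq, num_merges, first_new_id):
    """Apply up to num_merges merges; return the shortened sequence and the merge table."""
    merges = {}
    for k in range(num_merges):
        pair = most_frequent_pair(seq)
        if pair is None:
            break
        new_id = first_new_id + k  # hypothetical ID scheme for merged tokens
        merges[pair] = new_id
        seq = merge_pair(seq, pair, new_id)
    return seq, merges


# Hypothetical semantic-token sequence; real token IDs would come from a speech tokenizer.
tokens = [7, 3, 7, 3, 7, 3, 9, 7, 3]
compressed, merges = bpe_compress(tokens, num_merges=2, first_new_id=100)
print(compressed)  # [101, 100, 9, 100]
print(merges)      # {(7, 3): 100, (100, 100): 101}
```

Nine tokens shrink to four after two merges, which is the kind of sequence-length reduction that makes language modeling over speech tokens cheaper.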

InstaFlow! One-Step Stable Diffusion with Rectified Flow

🔬Paper
👩‍💻Github

Diffusion models have greatly improved text-to-image generation, but their multi-step sampling process is slow and computationally expensive. Previous attempts to speed up this process have not succeeded in creating a functional one-step model. This paper explores the use of Rectified Flow, a method previously only applied to small datasets, which refines the coupling between noises and images and aids the distillation process. The authors propose a new text-conditioned pipeline that transforms Stable Diffusion into a fast one-step model, resulting in the first one-step diffusion-based text-to-image generator with high-quality images.

My 2 cents: The most significant technical point is that this is the first one-step diffusion-based text-to-image generator with SD-level image quality, as reflected in its Fréchet Inception Distance (FID) score.
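
Why would straighter trajectories help one-step generation? A toy sketch, not InstaFlow's pipeline: the two velocity fields below are hypothetical 1-D examples I made up, and the sampler is plain fixed-step Euler. When the velocity stays constant along a path (a straight trajectory), a single Euler step already lands on the exact endpoint. When the velocity changes along the path, one step is noticeably off.

```python
def euler_sample(velocity, x0, steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = x0
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity(x, i * dt)
    return x


def straight_velocity(x, t):
    # Hypothetical toy flow whose paths are straight lines from x0 to 2*x0 + 1.
    # Along each path the velocity is the constant x1 - x0 = x0 + 1.
    return (x + 1.0) / (1.0 + t)


def curved_velocity(x, t):
    # Hypothetical toy flow whose velocity changes along the path
    # (exact solution: x1 = e * x0).
    return x


x0 = 0.5
straight_one = euler_sample(straight_velocity, x0, steps=1)
straight_many = euler_sample(straight_velocity, x0, steps=50)
curved_one = euler_sample(curved_velocity, x0, steps=1)
curved_many = euler_sample(curved_velocity, x0, steps=50)

print(straight_one, straight_many)  # one step matches many steps
print(curved_one, curved_many)      # one step is visibly off
```

In the straight case the one-step and 50-step results agree up to floating-point error, while in the curved case they differ substantially. That gap is the intuition behind straightening the flow before distilling it into a one-step model.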

Open-Source

XTTS v1.1

👩‍💻 Github

Our (Coqui.ai) best and fastest open-source text-to-speech model got better! XTTS v1.1 adds Japanese to the 13 languages already supported, improves audio quality and expressiveness across all languages, and fixes some known issues. Give it a try! Soon you’ll be able to fine-tune XTTS with your own data.

Unveiling the Siren’s Song: Towards Reliable Fact-Conflicting Hallucination Detection

👨‍💻 Github

A benchmark for detecting fact-conflicting hallucinations in Large Language Models (LLMs) such as ChatGPT and GPT-4. It incorporates fact-based chains of evidence for comprehensive factual reasoning, and the authors also present TRUTH-TRIANGULATOR, a method that combines predictive results and evidence for more reliable detection.

PESTO

👩‍💻 Github

MemGPT

👩‍💻 Github
🔬 Paper

“Memory-GPT (or MemGPT in short) is a system that intelligently manages different memory tiers in LLMs in order to effectively provide an extended context within the LLM’s limited context window. For example, MemGPT knows when to push critical information to a vector database and when to retrieve it later in the chat, enabling perpetual conversations.”
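
A rough sketch of the memory-tier idea, with heavy simplifications. Everything here is hypothetical: the `TieredMemory` class, first-in-first-out eviction, and keyword matching stand in for whatever MemGPT actually does. The quote above mentions a vector database, and this toy swaps that for a plain list.

```python
from collections import deque


class TieredMemory:
    """Toy two-tier memory: a small 'context window' plus an unbounded archive."""

    def __init__(self, context_size):
        self.context_size = context_size
        self.context = deque()  # messages currently visible to the model
        self.archive = []       # stand-in for external storage

    def add(self, message):
        """Append a message; push the oldest ones to the archive when the window is full."""
        self.context.append(message)
        while len(self.context) > self.context_size:
            self.archive.append(self.context.popleft())

    def recall(self, keyword):
        """Return archived messages containing keyword (case-insensitive)."""
        return [m for m in self.archive if keyword.lower() in m.lower()]

    def build_prompt(self, keyword=None):
        """Combine recalled archive entries with the current context window."""
        recalled = self.recall(keyword) if keyword else []
        return recalled + list(self.context)


memory = TieredMemory(context_size=2)
for msg in ["My name is Ada.", "I like hiking.", "What's the weather?", "Recommend a trail."]:
    memory.add(msg)

prompt = memory.build_prompt(keyword="name")
print(prompt)  # ['My name is Ada.', "What's the weather?", 'Recommend a trail.']
```

The early message about the user's name has been pushed out of the two-message window, yet it can still be pulled back in when it becomes relevant.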

Why not subscribe?