How Diffusion Models Work, Part 1 (Artificial Intelligence)

Monodeep Mukherjee
3 min read · Oct 21, 2022

Photo by Steve Johnson on Unsplash
1. Representation Learning with Diffusion Models (arXiv)

Author: Jeremias Traub

Abstract: Diffusion models (DMs) have achieved state-of-the-art results for image synthesis as well as density estimation. Applied in the latent space of a powerful pretrained autoencoder (LDM), their immense computational requirements can be significantly reduced without sacrificing sampling quality. However, DMs and LDMs lack a semantically meaningful representation space, as the diffusion process gradually destroys information in the latent variables. We introduce a framework for learning such representations with diffusion models (LRDM). To that end, an LDM is conditioned on the representation extracted from the clean image by a separate encoder. In particular, the DM and the representation encoder are trained jointly in order to learn rich representations specific to the generative denoising process. By introducing a tractable representation prior, we can efficiently sample from the representation distribution for unconditional image synthesis without training any additional model. We demonstrate that i) competitive image generation results can be achieved with image-parameterized LDMs, and ii) LRDMs are capable of learning semantically meaningful representations, allowing for faithful image reconstructions and semantic interpolations. Our implementation is available at https://github.com/jeremiastraub/diffusion.
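To make the joint-training idea concrete, here is a minimal sketch of a representation encoder trained together with a conditional denoiser, in the spirit of LRDM. The module sizes, the scalar timestep embedding, and the random stand-in data are all illustrative assumptions, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class Encoder(nn.Module):
    """Maps a clean (latent) image x0 to a compact representation z."""
    def __init__(self, d_in=784, d_z=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 256), nn.SiLU(), nn.Linear(256, d_z))
    def forward(self, x0):
        return self.net(x0)

class Denoiser(nn.Module):
    """Predicts the noise in x_t, conditioned on timestep t and representation z."""
    def __init__(self, d_in=784, d_z=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in + d_z + 1, 512), nn.SiLU(),
                                 nn.Linear(512, d_in))
    def forward(self, x_t, t, z):
        t_emb = (t.float() / T).unsqueeze(-1)  # crude scalar timestep embedding
        return self.net(torch.cat([x_t, z, t_emb], dim=-1))

enc, eps_model = Encoder(), Denoiser()
opt = torch.optim.Adam(list(enc.parameters()) + list(eps_model.parameters()), lr=1e-4)

for step in range(100):
    x0 = torch.randn(16, 784)              # stand-in for a batch of clean latents
    t = torch.randint(0, T, (16,))
    noise = torch.randn_like(x0)
    a = alphas_bar[t].unsqueeze(-1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise  # forward diffusion to step t
    z = enc(x0)                            # representation comes from the clean image
    loss = ((eps_model(x_t, t, z) - noise) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()  # gradients flow into the encoder too
```

Because the denoising loss is backpropagated through the encoder as well, z is shaped by exactly what the denoiser needs, which is the joint-training point the abstract makes.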

2. Diffusion Models already have a Semantic Latent Space (arXiv)

Authors: Mingi Kwon, Jaeseok Jeong, Youngjung Uh

Abstract: Diffusion models achieve outstanding generative performance in various domains. Despite their great success, they lack a semantic latent space, which is essential for controlling the generative process. To address the problem, we propose the asymmetric reverse process (Asyrp), which discovers the semantic latent space in frozen pretrained diffusion models. Our semantic latent space, named h-space, has nice properties for accommodating semantic image manipulation: homogeneity, linearity, robustness, and consistency across timesteps. In addition, we introduce a principled design of the generative process for versatile editing and quality boosting via quantifiable measures: the editing strength of an interval and the quality deficiency at a timestep. Our method is applicable to various architectures (DDPM++, iDDPM, and ADM) and datasets (CelebA-HQ, AFHQ-dog, LSUN-church, LSUN-bedroom, and MetFaces). Project page: https://kwonminki.github.io/Asyrp/
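The asymmetry is easiest to see in a single sampling step. Below is a hedged sketch of one Asyrp-style DDIM step: the shifted bottleneck feature enters only the predicted-x0 term, while the direction term keeps the unshifted prediction. The `eps_theta` stand-in, including its `h_shift` keyword, is a hypothetical placeholder for a frozen U-Net whose bottleneck activations can be offset; it is not the project's real API:

```python
import torch

T = 1000
alphas_bar = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, T), dim=0)

def eps_theta(x_t, t, h_shift=None):
    """Toy stand-in for a frozen diffusion U-Net with an optional bottleneck offset."""
    base = 0.1 * x_t                               # placeholder noise prediction
    return base if h_shift is None else base + h_shift

def asyrp_step(x_t, t, t_prev, delta_h):
    a_t, a_prev = alphas_bar[t], alphas_bar[t_prev]
    eps_edit = eps_theta(x_t, t, h_shift=delta_h)  # prediction with h -> h + delta_h
    eps_orig = eps_theta(x_t, t)                   # unmodified prediction
    # P_t: predicted clean image, computed from the edited prediction
    x0_pred = (x_t - (1 - a_t).sqrt() * eps_edit) / a_t.sqrt()
    # D_t: direction pointing back to x_t, computed from the original prediction
    dir_xt = (1 - a_prev).sqrt() * eps_orig
    return a_prev.sqrt() * x0_pred + dir_xt

x_t = torch.randn(1, 8)
x_prev = asyrp_step(x_t, t=500, t_prev=480, delta_h=0.05 * torch.ones(1, 8))
```

Editing only the P_t term changes which image the trajectory is heading toward, while reusing the original D_t leaves the sampling dynamics intact; this split is what makes the process asymmetric.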

3. Efficient Diffusion Models for Vision: A Survey (arXiv)

Authors: Anwaar Ulhaq, Naveed Akhtar, Ganna Pogrebna

Abstract: Diffusion Models (DMs) have demonstrated state-of-the-art performance in content generation without requiring adversarial training. These models are trained using a two-step process. First, a forward (diffusion) process gradually adds noise to a datum (usually an image). Then, a backward (reverse diffusion) process gradually removes the noise to turn it into a sample of the target distribution being modelled. DMs are inspired by non-equilibrium thermodynamics and have inherently high computational complexity. Due to the frequent function evaluations and gradient calculations in high-dimensional spaces, these models incur considerable computational overhead during both training and inference. This can not only preclude the democratization of diffusion-based modelling but also hinder the adoption of diffusion models in real-life applications. Moreover, the efficiency of computational models is fast becoming a significant concern due to excessive energy consumption and environmental concerns. These factors have led to multiple contributions in the literature that focus on devising computationally efficient DMs. In this review, we present the most recent advances in diffusion models for vision, specifically focusing on the important design aspects that affect the computational efficiency of DMs. In particular, we emphasize recently proposed design choices that have led to more efficient DMs. Unlike other recent reviews, which discuss diffusion models from a broad perspective, this survey aims to push this research direction forward by highlighting the design strategies in the literature that result in practicable models for the broader research community. We also provide a future outlook on diffusion models in vision from the viewpoint of their computational efficiency.
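The two-step process described here can be written down directly: a closed-form forward pass that noises a datum to any timestep, and a reverse step that removes part of the noise. The linear schedule and the oracle noise prediction in this sketch are simplifying assumptions for demonstration:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)      # assumed linear noise schedule
alphas = 1.0 - betas
alphas_bar = torch.cumprod(alphas, dim=0)

def forward_diffuse(x0, t):
    """q(x_t | x_0): jump straight to noise level t in closed form."""
    noise = torch.randn_like(x0)
    a = alphas_bar[t]
    return a.sqrt() * x0 + (1 - a).sqrt() * noise, noise

def reverse_step(x_t, t, eps_pred):
    """One DDPM ancestral step given a noise prediction for x_t."""
    mean = (x_t - betas[t] / (1 - alphas_bar[t]).sqrt() * eps_pred) / alphas[t].sqrt()
    if t == 0:
        return mean                        # no noise is added at the final step
    return mean + betas[t].sqrt() * torch.randn_like(x_t)

x0 = torch.randn(4, 8)                     # stand-in for a small image batch
x_t, true_noise = forward_diffuse(x0, t=600)
x_prev = reverse_step(x_t, t=600, eps_pred=true_noise)  # oracle noise, for demo only
```

In a trained DM, `eps_pred` comes from a neural network evaluated once per reverse step; those hundreds of sequential network evaluations are exactly the computational cost the survey targets.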
