3D Generation: DreamFusion Part 3: Score Distillation Sampling

圖學玩家
3 min read · Dec 27, 2023

<圖學玩家, original article No. 032>

SCORE DISTILLATION SAMPLING (SDS)

Differentiable Image Parameterizations (DIP)

DreamFusion's Diffusion Process does not sample images directly in RGB space; it relies on a DIP instead.

As long as an image parameterization is differentiable, we can backpropagate (←) through it.

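Put simply, a DIP represents an image through some other set of differentiable parameters. Let θ denote the parameters of the 3D volume. A sampled image can then be written as x = g(θ), where g is a transformation function (in DreamFusion, the differentiable volumetric renderer) that turns the parameters θ into an image x.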

We are not interested in sampling pixels; we instead want to create 3D models that look like good images when rendered from random angles.
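To make x = g(θ) concrete, here is a minimal sketch of optimizing through a differentiable parameterization. It assumes a toy setup: `g` is a hypothetical linear "renderer" that maps 2 parameters to 4 pixels (it is not DreamFusion's NeRF renderer), and `BASIS`, `grad_theta`, and `fit` are illustrative names. The point is only that the loss is measured in pixel space while the update is applied to θ.

```python
# Toy differentiable image parameterization (DIP): x = g(theta).
# g is a hypothetical linear "renderer", not DreamFusion's NeRF renderer.

# Each of the 4 pixels is a fixed linear blend of the 2 parameters.
BASIS = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 1.0],
    [1.0, 1.0],
]


def g(theta):
    """Render a 4-pixel image from 2 parameters."""
    return [sum(w * p for w, p in zip(row, theta)) for row in BASIS]


def grad_theta(theta, target):
    """Gradient of 0.5 * ||g(theta) - target||^2 w.r.t. theta (chain rule)."""
    x = g(theta)
    dloss_dx = [xi - ti for xi, ti in zip(x, target)]  # gradient in pixel space
    # Backpropagate through g: dL/dtheta_j = sum_i dL/dx_i * dx_i/dtheta_j.
    return [
        sum(dloss_dx[i] * BASIS[i][j] for i in range(len(BASIS)))
        for j in range(len(theta))
    ]


def fit(target, steps=500, lr=0.1):
    """Gradient descent on theta; the pixels are never optimized directly."""
    theta = [0.0, 0.0]
    for _ in range(steps):
        grad = grad_theta(theta, target)
        theta = [p - lr * gp for p, gp in zip(theta, grad)]
    return theta


target_image = g([2.0, -1.0])  # image rendered from known parameters
theta_hat = fit(target_image)  # ends up close to [2.0, -1.0]
```

In DreamFusion the same pattern holds, with θ being the 3D volume parameters and the pixel-space signal coming from the diffusion model rather than from a fixed target image.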

Score-based Generative Modeling

Part 2 introduced the Score Function. Here we expand a little on Score-based Generative Models.

Readers can think of a Diffusion Model as one kind of Score-based Generative Model.

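We start with the EBM (Energy-based Model), which writes a density as

$$p_\theta(x) = \frac{1}{Z_\theta} e^{-f_\theta(x)}$$

Borrowing from physics, fθ(x) is a flexible, parameterizable Energy Function, which is why it can be modeled with a neural network. Zθ is the Normalizing Constant that ensures ∫pθ(x)dx = 1. Taking the gradient of the log of the EBM with respect to x gives:

$$\nabla_x \log p_\theta(x) = \nabla_x \log\left(\frac{1}{Z_\theta} e^{-f_\theta(x)}\right) = \nabla_x \log \frac{1}{Z_\theta} + \nabla_x \log e^{-f_\theta(x)} = -\nabla_x f_\theta(x) \approx s_\theta(x)$$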

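In other words, the normalizing constant drops out, and the neural network sθ(x) that models −∇x fθ(x) is exactly the Score Function from Part 2. This network can be optimized by minimizing the Fisher Divergence:

$$\mathbb{E}_{p(x)}\left[\left\| s_\theta(x) - \nabla_x \log p(x) \right\|_2^2\right]$$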

Taking the gradient of the log likelihood with respect to x in data (image) space amounts to finding the direction in data space that increases the likelihood p(x).

For every x, taking the gradient of its log likelihood with respect to x essentially describes what direction in data space to move in order to further increase its likelihood.
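The two claims above (the score does not depend on the normalizing constant, and the Fisher Divergence vanishes when the model score matches the true one) can be checked numerically. The sketch below assumes a 1-D Gaussian written as an EBM; `MU`, `SIGMA`, `model_score`, and `fisher_divergence` are hypothetical names chosen for illustration. The true score is known in closed form here, which is not the case for real image data.

```python
import math
import random

MU, SIGMA = 1.0, 0.5  # hypothetical 1-D Gaussian written as an EBM


def energy(x):
    """f(x) for a Gaussian: p(x) = exp(-f(x)) / Z."""
    return (x - MU) ** 2 / (2 * SIGMA ** 2)


def log_p(x):
    """Normalized log density; Z = sqrt(2 * pi) * SIGMA."""
    return -energy(x) - math.log(math.sqrt(2 * math.pi) * SIGMA)


def numeric_grad(fn, x, h=1e-5):
    """Central finite difference."""
    return (fn(x + h) - fn(x - h)) / (2 * h)


def true_score(x):
    """Closed form of -df/dx, which equals d/dx log p(x)."""
    return -(x - MU) / SIGMA ** 2


def model_score(x, mu_hat):
    """Score model with a single learnable parameter (the mean)."""
    return -(x - mu_hat) / SIGMA ** 2


def fisher_divergence(mu_hat, n=2000, seed=0):
    """Monte Carlo estimate of E_p[(s_model(x) - s_true(x))^2]."""
    rng = random.Random(seed)
    xs = [rng.gauss(MU, SIGMA) for _ in range(n)]
    return sum((model_score(x, mu_hat) - true_score(x)) ** 2 for x in xs) / n


# Z only shifts log p by a constant, so it vanishes under the gradient.
score_from_log_p = numeric_grad(log_p, 0.3)
score_from_energy = -numeric_grad(energy, 0.3)
```

Both finite-difference values agree with `true_score(0.3)`, and `fisher_divergence(mu_hat)` grows as `mu_hat` moves away from `MU`.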

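The figure below visualizes the Score Function: the horizontal plane is the data space in which x lives, and the peaks along the vertical axis (regions of higher density) can be viewed as the possible modes that the model eventually converges to. Representing the distribution p(x) through a Score Function in this way, and generating samples from it with MCMC, is what is known as Score-based Generative Modeling.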

Collectively, learning to represent a distribution as a score function and using it to generate samples through Markov Chain Monte Carlo techniques, such as Langevin dynamics, is known as Score-based Generative Modeling
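As a rough illustration of sampling with a score function, the sketch below runs unadjusted Langevin dynamics on a 1-D mixture of two Gaussians. Everything here is an assumption made for the example: the mode locations in `MODES`, the shared `SIGMA`, the step size, and the fact that the score is available in closed form instead of being learned by a network.

```python
import math
import random

MODES = [-2.0, 2.0]  # hypothetical mode locations
SIGMA = 0.5          # shared standard deviation of each mode


def score(x):
    """d/dx log p(x) for an equal-weight mixture of two 1-D Gaussians."""
    weights = [math.exp(-((x - m) ** 2) / (2 * SIGMA ** 2)) for m in MODES]
    total = sum(weights)
    # Each component pulls x toward its own mean, weighted by responsibility.
    return sum(w * -(x - m) / SIGMA ** 2 for w, m in zip(weights, MODES)) / total


def langevin_sample(rng, steps=500, step_size=0.01):
    """Unadjusted Langevin: x <- x + eta * score(x) + sqrt(2 * eta) * z."""
    x = rng.uniform(-4.0, 4.0)  # arbitrary starting point
    for _ in range(steps):
        noise = rng.gauss(0.0, 1.0)
        x += step_size * score(x) + math.sqrt(2 * step_size) * noise
    return x


rng = random.Random(0)
samples = [langevin_sample(rng) for _ in range(200)]

# Fraction of samples that ended up within 1.5 of one of the two modes.
near_mode = sum(
    min(abs(s - m) for m in MODES) < 1.5 for s in samples
) / len(samples)
```

The score points uphill toward the nearest peak, so the chains drift from their random starting points into the high-density regions around the two modes, while the injected noise keeps the samples spread out instead of collapsing onto the peaks.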

SDS

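DreamFusion defines the Diffusion Model training loss as follows, where w(t) is a weighting function and the rest matches the gradient-descent training objective from Part 1:

$$\mathcal{L}_{\text{Diff}}(\phi, x) = \mathbb{E}_{t \sim \mathcal{U}(0,1),\, \epsilon \sim \mathcal{N}(0, I)}\left[ w(t) \left\| \epsilon_\phi(\alpha_t x + \sigma_t \epsilon;\, t) - \epsilon \right\|_2^2 \right]$$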

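Replacing the image x in the expression above with its DIP, x = g(θ), the goal becomes minimizing the following loss:

$$\theta^* = \arg\min_\theta \mathcal{L}_{\text{Diff}}(\phi, x = g(\theta))$$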

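However, experiments show that this does not work well. Why? Taking the gradient of the loss with respect to θ gives:

$$\nabla_\theta \mathcal{L}_{\text{Diff}}(\phi, x = g(\theta)) = \mathbb{E}_{t,\epsilon}\left[ w(t) \underbrace{\left(\hat\epsilon_\phi(z_t; y, t) - \epsilon\right)}_{\text{Noise Residual}} \underbrace{\frac{\partial \hat\epsilon_\phi(z_t; y, t)}{\partial z_t}}_{\text{U-Net Jacobian}} \underbrace{\frac{\partial x}{\partial \theta}}_{\text{Generator Jacobian}} \right]$$

Here z_t = α_t x + σ_t ε is the noised image at timestep t, ε is the added noise, and ε̂_φ(z_t; y, t) is the noise prediction conditioned on the text embedding y. Constant factors such as α_t = ∂z_t/∂x are absorbed into w(t).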

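DreamFusion notes that the U-Net Jacobian term is expensive to compute, since it requires backpropagating through the diffusion U-Net, and that it is poorly conditioned at small noise levels. DreamFusion therefore drops this term and defines what remains as the gradient of the SDS loss:

$$\nabla_\theta \mathcal{L}_{\text{SDS}}(\phi, x = g(\theta)) \triangleq \mathbb{E}_{t,\epsilon}\left[ w(t) \left(\hat\epsilon_\phi(z_t; y, t) - \epsilon\right) \frac{\partial x}{\partial \theta} \right]$$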

From the perspective of Score-based Generative Models, this amounts to seeking out one of the likely modes in data space, that is, a region of high density. DreamFusion calls this method Score Distillation Sampling (SDS).

this loss perturbs x with a random amount of noise corresponding to the timestep t, and estimates an update direction that follows the score function of the diffusion model to move to a higher density region
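The mode-seeking behavior can be sketched on a toy problem. The code below is not DreamFusion's implementation: it assumes a 1-D "image" distribution with a single Gaussian mode, a hypothetical generator `g(theta) = 2 * theta + 1`, a variance-preserving noise schedule, a constant weighting w(t) = 1, and an ideal noise predictor `eps_hat` derived analytically from the Gaussian instead of a trained, text-conditioned U-Net. The update uses only the noise residual times the generator Jacobian; `eps_hat` is evaluated but never differentiated.

```python
import math
import random

DATA_MEAN, DATA_STD = 3.0, 0.1  # hypothetical 1-D "image" distribution (one mode)
DX_DTHETA = 2.0                 # generator Jacobian of g below


def g(theta):
    """Hypothetical differentiable generator x = g(theta)."""
    return 2.0 * theta + 1.0


def schedule(t):
    """Assumed variance-preserving schedule: alpha_t^2 + sigma_t^2 = 1."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def eps_hat(z_t, t):
    """Ideal noise prediction for Gaussian data: -sigma_t * score of p_t(z_t)."""
    alpha, sigma = schedule(t)
    marginal_var = (alpha * DATA_STD) ** 2 + sigma ** 2
    return sigma * (z_t - alpha * DATA_MEAN) / marginal_var


def sds_grad(theta, t, eps, w=1.0):
    """w(t) * (eps_hat(z_t; t) - eps) * dx/dtheta, with no U-Net Jacobian."""
    alpha, sigma = schedule(t)
    z_t = alpha * g(theta) + sigma * eps  # perturb the rendered x with noise
    return w * (eps_hat(z_t, t) - eps) * DX_DTHETA


def optimize(theta=-1.0, steps=2000, lr=0.02, seed=0):
    """Stochastic gradient descent on theta using one (t, eps) draw per step."""
    rng = random.Random(seed)
    for _ in range(steps):
        t = rng.uniform(0.02, 0.98)  # random noise level
        eps = rng.gauss(0.0, 1.0)    # the added noise
        theta -= lr * sds_grad(theta, t, eps)
    return theta


theta_star = optimize()  # g(theta_star) should land near DATA_MEAN
```

In expectation over ε, the update is proportional to g(θ) − DATA_MEAN, so θ is pushed until the generated x sits at the mode of the data distribution, which matches the "move to a higher density region" reading in the quote above.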

If you enjoy these articles, please consider following 圖學玩家. Your reads and follows are what keep me sharing.

Series Articles

  1. 3D Generation: DreamFusion Part 1: Diffusion Model
  2. 3D Generation: DreamFusion Part 2: Score Function
  3. 3D Generation: DreamFusion Part 4: Algorithm

Ref

  1. DreamFusion
  2. Understanding Diffusion Models
