3D Generation: EG3D

Mar 18, 2023

<圖學玩家 original article No. 013>

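Generating multi-view images and 3D shapes from single-view 2D photos alone has been a long-standing challenge. Efficient Geometry-aware 3D GAN (EG3D) designs a hybrid explicit-implicit neural network architecture and decouples feature generation from neural rendering (so that it can make full use of the strengths of 2D CNNs), which yields 3D-aware generation.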

By decoupling feature generation and neural rendering, our framework is able to leverage state-of-the-art 2D CNN generators, such as StyleGAN2, and inherit their efficiency and expressiveness

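EG3D proposes a tri-plane-based 3D GAN architecture (shown on the far right of the figure below) together with a training strategy, and achieves strong 3D geometry generation on datasets such as FFHQ and AFHQ.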

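The implicit representation in (a) below tends to incur overhead at inference time, while the explicit representation in (b) incurs memory overhead. The tri-plane-based representation in (c) combines the explicit and implicit representations and mitigates both problems.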
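To make the memory argument concrete, here is a rough back-of-the-envelope sketch. It assumes that a dense voxel grid stores N^3 * C values and that three axis-aligned planes store 3 * N^2 * C values. The resolution and channel count are illustrative choices, not numbers taken from this article.

```python
def voxel_grid_values(resolution: int, channels: int) -> int:
    """Stored feature values for a dense voxel grid: N^3 * C."""
    return resolution ** 3 * channels


def tri_plane_values(resolution: int, channels: int) -> int:
    """Stored feature values for three axis-aligned planes: 3 * N^2 * C."""
    return 3 * resolution ** 2 * channels


# Illustrative sizes only (assumed for this example).
N, C = 256, 32
ratio = voxel_grid_values(N, C) / tri_plane_values(N, C)
print(f"voxel grid stores {ratio:.1f}x more values than the tri-plane")
```

Under these assumptions the ratio is simply N / 3, so the gap widens as the resolution grows.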

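In the tri-plane-based representation in (c) above, a 3D position is projected onto each plane (xy, xz, yz) to obtain Fxy, Fxz, and Fyz (values obtained via bilinear interpolation). Fxy, Fxz, and Fyz are summed, and the result is passed to a lightweight MLP decoder, which outputs two features: density and color.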
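The lookup described above can be sketched in NumPy as follows. This is a minimal illustration, not the EG3D implementation. The function names, the [-1, 1] coordinate convention, the hidden width of 64, the ReLU/softplus/sigmoid activations, and the random weights are all assumptions made for the example.

```python
import numpy as np


def bilinear_sample(plane: np.ndarray, uv: np.ndarray) -> np.ndarray:
    """Bilinearly sample an (H, W, C) plane at (N, 2) coordinates in [-1, 1]."""
    h, w, _ = plane.shape
    # Map [-1, 1] to continuous pixel coordinates and clamp to the plane.
    x = np.clip((uv[:, 0] + 1.0) * 0.5 * (w - 1), 0.0, w - 1)
    y = np.clip((uv[:, 1] + 1.0) * 0.5 * (h - 1), 0.0, h - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    tx = (x - x0)[:, None]
    ty = (y - y0)[:, None]
    top = plane[y0, x0] * (1 - tx) + plane[y0, x0 + 1] * tx
    bottom = plane[y0 + 1, x0] * (1 - tx) + plane[y0 + 1, x0 + 1] * tx
    return top * (1 - ty) + bottom * ty


def sample_tri_plane(planes: dict, points: np.ndarray) -> np.ndarray:
    """Project (N, 3) points onto the xy, xz, yz planes and sum the features."""
    f_xy = bilinear_sample(planes["xy"], points[:, [0, 1]])
    f_xz = bilinear_sample(planes["xz"], points[:, [0, 2]])
    f_yz = bilinear_sample(planes["yz"], points[:, [1, 2]])
    return f_xy + f_xz + f_yz


def decode(features, w1, b1, w2, b2):
    """Tiny MLP: one hidden layer, then split into density and color."""
    hidden = np.maximum(features @ w1 + b1, 0.0)  # ReLU (assumed)
    out = hidden @ w2 + b2
    density = np.logaddexp(0.0, out[:, :1])       # softplus keeps density >= 0
    color = 1.0 / (1.0 + np.exp(-out[:, 1:]))     # sigmoid squashes to (0, 1)
    return density, color


rng = np.random.default_rng(0)
planes = {k: rng.normal(size=(64, 64, 32)) for k in ("xy", "xz", "yz")}
points = rng.uniform(-1.0, 1.0, size=(5, 3))
features = sample_tri_plane(planes, points)

# Random weights stand in for a trained decoder (hypothetical sizes).
w1, b1 = rng.normal(size=(32, 64)) * 0.1, np.zeros(64)
w2, b2 = rng.normal(size=(64, 1 + 32)) * 0.1, np.zeros(1 + 32)
density, color = decode(features, w1, b1, w2, b2)
print(features.shape, density.shape, color.shape)
```

Most of the capacity sits in the planes themselves, so the decoder can stay small. This is what keeps per-point queries cheap.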

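As the overall 3D GAN architecture in the figure above shows, the generator and discriminator largely follow the StyleGAN2 architecture. Note that the generator does not produce an RGB image; instead, it produces three 32-channel feature images (the three images that make up the tri-planes).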

When training the discriminator, EG3D uses dual discrimination. As the figure above shows, the discriminator is trained to classify the concatenation of the following two images:

I_RGB^+, obtained by refining and upsampling the feature map (I_F)
I_RGB, obtained by extracting the first three of the 32 channels mentioned above (treated as the RGB channels), which is then upsampled

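Because the concatenated result is a 6-channel image, the real images used to train the discriminator are likewise concatenated with a blurred copy of themselves.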
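The way the 6-channel discriminator inputs are assembled can be sketched as below. This is only an illustration under stated assumptions. Nearest-neighbor upsampling and an average-pool-then-upsample blur are simple stand-ins chosen for brevity, the image sizes are made up, and all function names are hypothetical.

```python
import numpy as np


def upsample_nearest(img: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbor upsampling of an (H, W, C) image (stand-in choice)."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)


def box_downsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool an (H, W, C) image by an integer factor."""
    h, w, c = img.shape
    return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))


def fake_pair(feature_image: np.ndarray, super_resolved: np.ndarray) -> np.ndarray:
    """Concatenate the refined image with the upsampled first three channels."""
    factor = super_resolved.shape[0] // feature_image.shape[0]
    i_rgb = feature_image[..., :3]              # first 3 of the 32 channels
    i_rgb_up = upsample_nearest(i_rgb, factor)
    return np.concatenate([super_resolved, i_rgb_up], axis=-1)


def real_pair(real_image: np.ndarray, factor: int) -> np.ndarray:
    """Concatenate a real image with a blurred copy of itself."""
    blurred = upsample_nearest(box_downsample(real_image, factor), factor)
    return np.concatenate([real_image, blurred], axis=-1)


rng = np.random.default_rng(0)
feature_image = rng.normal(size=(16, 16, 32))   # rendered I_F (assumed size)
super_resolved = rng.normal(size=(64, 64, 3))   # stands in for the refined image
real_image = rng.uniform(size=(64, 64, 3))

fake_input = fake_pair(feature_image, super_resolved)
real_input = real_pair(real_image, factor=4)
print(fake_input.shape, real_input.shape)
```

Both inputs end up with the same 6-channel layout, so a single discriminator can consume real and generated samples alike.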

The real images fed into the discriminator are also processed by concatenating each of them with an appropriately blurred copy of itself.

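Finally, let's look at how pose-related features are handled. Images in a typical dataset are biased to some degree in camera pose (that is, the shooting angle). In FFHQ, for example, almost all face images are taken from the front, slightly off to one side.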

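To generate view-consistent results, camera pose must be decoupled from feature generation. EG3D does this by supplying the camera parameters P as conditioning to the generator, to the rendering step, and to discriminator training. This makes the final generator aware of the camera pose.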

Camera params P for Generator, rendering and Discriminator

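The experimental results and applications are shown below. As with StyleGAN, EG3D also supports linear interpolation in the latent space, which produces a blend of two faces.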

Target (left); reconstructed image (center); reconstructed shape (right).

Linear interpolations between latent codes
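Linear interpolation between two latent codes is simple enough to sketch directly. The 512-dimensional latent size and the function name are assumptions for illustration. In practice, each interpolated code would be passed to the trained generator to render the blended face.

```python
import numpy as np


def lerp_latents(z_a: np.ndarray, z_b: np.ndarray, steps: int) -> np.ndarray:
    """Return `steps` codes evenly spaced on the line from z_a to z_b."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * z_a + t * z_b


rng = np.random.default_rng(0)
z_a = rng.normal(size=512)  # latent code of the first face (assumed size)
z_b = rng.normal(size=512)  # latent code of the second face
path = lerp_latents(z_a, z_b, steps=5)
print(path.shape)
```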

Readers interested in the details of EG3D and its other experimental results can refer to the following paper:

Eric R Chan, Connor Z Lin, Matthew A Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas J Guibas, Jonathan Tremblay, Sameh Khamis, et al. Efficient geometry-aware 3d generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16123–16133, 2022.


Written by 圖學玩家