Customize Your In-Game Faces

Hongze
AI2 Labs
Nov 4, 2019

In role-playing games (RPGs), the character customization system is an important feature: players are allowed to edit the facial parameters of their in-game characters. NetEase Fuxi AI Lab has released a paper, Face-to-Parameter Translation for Game Character Auto-Creation, which proposes an end-to-end approach to face-to-parameter translation and automatic game character creation. Today, let's focus on how to reimplement this method.

Overview

Fig. 1

The whole processing pipeline is shown above (Fig. 1). The Imitator imitates the behavior of a game engine: it takes in user-customized facial parameters and produces a facial image. The Feature Extractor extracts two kinds of features, 256-d facial embeddings and facial semantic features, from both real-world images and rendered game characters. The final part is optimization, where the gradient descent method is used to solve the optimization problem. The following sections give the details of each component's implementation.

Prerequisites

Unity3D (≥ 2018.3.14)

TensorFlow (≥ 1.12)

Keras (≥ 2.2.4)

Imitator

Before training the Imitator, we used Unity3D to design a male character with 216 facial parameters. Some rendered faces are shown in Fig. 2. We randomly generated 20,000 individual faces with their corresponding facial customization parameters for training; a sketch of this step follows Fig. 2.

Fig. 2
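As a minimal sketch of the data generation step (the uniform [0, 1] parameter range and the .npy hand-off file are my assumptions; the actual rendering happens inside Unity3D):

import numpy as np

# Sample 20,000 random parameter vectors; each one drives a Unity3D rendering.
num_samples, num_params = 20000, 216
params = np.random.uniform(0.0, 1.0, size=(num_samples, num_params)).astype('float32')

# Assumed hand-off format: Unity3D reads this file and renders one face per row.
np.save('facial_params.npy', params)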

The Imitator is based on the DCGAN generator. The whole structure of the network is shown below. Unlike the training process of GANs, the Imitator is fully supervised: the input is the facial parameters and the output is the front view of the game character, so the Imitator is similar to the decoder of an auto-encoder network (Fig. 4).

from keras.models import Sequential
from keras.layers import Dense, Reshape, UpSampling2D, Conv2D, BatchNormalization, Activation

latent_dim = 216  # number of facial parameters
channels = 3      # RGB output

model = Sequential()
# Project the 216-d parameter vector to a 4x4x256 feature map.
model.add(Dense(256 * 4 * 4, activation="relu", input_dim=latent_dim))
model.add(Reshape((4, 4, 256)))
# Six upsampling stages: 4x4 -> 256x256.
model.add(UpSampling2D())
model.add(Conv2D(256, kernel_size=4, padding="same"))
model.add(BatchNormalization(momentum=0.8))
model.add(Activation("relu"))
model.add(UpSampling2D())
model.add(Conv2D(256, kernel_size=4, padding="same"))
model.add(BatchNormalization(momentum=0.8))
model.add(Activation("relu"))
model.add(UpSampling2D())
model.add(Conv2D(128, kernel_size=4, padding="same"))
model.add(BatchNormalization(momentum=0.8))
model.add(Activation("relu"))
model.add(UpSampling2D())
model.add(Conv2D(64, kernel_size=4, padding="same"))
model.add(BatchNormalization(momentum=0.8))
model.add(Activation("relu"))
model.add(UpSampling2D())
model.add(Conv2D(32, kernel_size=4, padding="same"))
model.add(BatchNormalization(momentum=0.8))
model.add(Activation("relu"))
model.add(UpSampling2D())
model.add(Conv2D(channels, kernel_size=4, padding="same"))
model.add(Activation("tanh"))
Fig. 4

We used the SGD optimizer with a batch size of 16 and momentum of 0.9. The learning rate is set to 0.01 and the loss function is mean absolute error.

from keras.optimizers import SGD

optimizer = SGD(lr=0.01, momentum=0.9)
model.compile(loss='mae', optimizer=optimizer)

As the paper says, training stops after 500 epochs and the learning rate decays by 10% every 50 epochs; a sketch of this schedule follows Fig. 5. After training, the performance of the Imitator is very good: some details of the generated images (Fig. 5) are very similar to images produced by the game engine.

Fig. 5
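A minimal sketch of this training schedule, assuming training is driven by model.fit with Keras's LearningRateScheduler callback (params and images stand for the 20,000 parameter/rendering pairs generated with Unity3D):

from keras.callbacks import LearningRateScheduler

# Decay the learning rate by 10% every 50 epochs, starting from 0.01.
def lr_schedule(epoch):
    return 0.01 * (0.9 ** (epoch // 50))

# images should be scaled to [-1, 1] to match the tanh output of the Imitator.
model.fit(params, images,
          batch_size=16,
          epochs=500,
          callbacks=[LearningRateScheduler(lr_schedule)])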

Feature Extractor

The feature extractor is mainly used to measure the facial similarity between real-world images and game-engine-produced images, through two terms: the discriminative loss and the facial content loss. The final loss function can be written as a linear combination of the two objectives, L1 and L2.

Final loss function: L(x) = α · L1(x) + L2(x)
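As a rough sketch of this combination (a minimal sketch, assuming cosine distance between the 256-d embeddings for L1 and a pixel-wise squared distance between the semantic feature maps for L2; the post itself only states that the two terms are combined linearly with weight alpha):

import keras.backend as K

alpha = 0.01  # weight quoted later, in the optimization section

def discriminative_loss(e_real, e_gen):
    # L1: cosine distance between the two 256-d facial embeddings (assumed).
    e_real = K.l2_normalize(e_real, axis=-1)
    e_gen = K.l2_normalize(e_gen, axis=-1)
    return 1.0 - K.sum(e_real * e_gen, axis=-1)

def facial_content_loss(f_real, f_gen):
    # L2: pixel-wise squared distance between the semantic feature maps (assumed).
    return K.mean(K.square(f_real - f_gen))

def total_loss(e_real, e_gen, f_real, f_gen):
    return alpha * discriminative_loss(e_real, e_gen) + facial_content_loss(f_real, f_gen)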

Next, I will talk about how to implement these parts.

Discriminative Loss

This part is essentially a face recognition problem. As the paper says, I used the Light CNN-29 model to extract the 256-d embedding code. Following the implementation of Light CNN-29, I added the training code with the Adam optimizer and categorical cross-entropy loss.

from keras.optimizers import Adam

# build() constructs the Light CNN-29 network, following its reference implementation.
lcnn = build()
optimizer = Adam(lr=0.00001)
lcnn.compile(loss='categorical_crossentropy',
             optimizer=optimizer,
             metrics=['accuracy'])

The training data I used is the MS-Celeb-1M aligned face dataset. Training took about two weeks, and the model finally reached 97.5% accuracy.

Accuracy and Loss of Light CNN training
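After training, the 256-d embedding used by the discriminative loss can be read from the layer just before the classifier. A minimal sketch, assuming the 256-d fully connected layer is named 'embedding' (a hypothetical name; adapt it to whatever build() assigns):

from keras.models import Model

# Expose the 256-d embedding layer as the model output.
# 'embedding' is a hypothetical layer name; adapt it to your build().
embedder = Model(inputs=lcnn.input,
                 outputs=lcnn.get_layer('embedding').output)

# face_batch: a batch of aligned face images.
embeddings = embedder.predict(face_batch)  # shape: (batch_size, 256)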

Facial Content Loss

As for the facial content loss, the paper describes a condensed version of a segmentation network: ResNet-50 as the backbone, with the fully connected layers removed and a 1×1 convolution layer added on top. To increase the output resolution, they change the stride from 2 to 1 in Conv_3 and Conv_4. To avoid the extra effort of re-training the model on ImageNet, I used a pre-trained segmentation model based on the U-Net architecture with a ResNet-50 backbone.

U-net Architecture

The training dataset I used is the Helen face dataset. Instead of using all 11 labels, I selected just 5: eyebrows, eyes, nose, lips, and background.
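One convenient way to obtain such a model is the open-source segmentation_models package (my choice for this sketch; the post does not name a specific library):

from segmentation_models import Unet

# U-Net with a ResNet-50 encoder pre-trained on ImageNet.
seg_model = Unet(backbone_name='resnet50',
                 encoder_weights='imagenet',
                 classes=5,              # eyebrows, eyes, nose, lips, background
                 activation='softmax')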

I used Categorical Cross-Entropy as the loss function and Adam as the optimizer. Some results are shown in Fig. 6.

optimizer = Adam(lr=0.00001)
seg_model.compile(loss='categorical_crossentropy',
                  optimizer=optimizer,
                  metrics=['accuracy'])
Fig. 6

Optimization

As the paper says, I solved the optimization problem, minimizing L(x) = α · L1(x) + L2(x) over the facial parameters x, with the gradient descent method.

We fix the Imitator, the face recognition model, and the segmentation model, then initialize and update the facial parameters x until the maximum number of iterations is reached. We set alpha to 0.01, the maximum number of iterations to 50, the learning rate to 10, and its decay rate to 20% per 5 iterations.

In Keras, we can easily obtain the gradient of the network's output with respect to its input using the K.gradients function, as sketched below. Some results generated by our optimization model follow.
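A minimal sketch of the update loop, assuming model is the trained Imitator from above and loss_tensor is the symbolic combination of the two losses wired to its output (the clipping to [0, 1] is my assumption about the valid parameter range):

import numpy as np
from keras import backend as K

# Gradient of the combined loss with respect to the facial parameters.
grads = K.gradients(loss_tensor, [model.input])[0]
step_fn = K.function([model.input], [loss_tensor, grads])

x = np.random.rand(1, 216).astype('float32')  # initial facial parameters
lr = 10.0                                     # learning rate from the post
for i in range(50):                           # at most 50 iterations
    loss_val, g = step_fn([x])
    x = np.clip(x - lr * g, 0.0, 1.0)         # assumed valid parameter range
    if (i + 1) % 5 == 0:
        lr *= 0.8                             # 20% decay every 5 iterations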

Generated results

Some thoughts

This paper presents a new approach to customizing in-game characters, and it is very detailed about each of the components. However, it should be noted that face alignment is very important for replicating the results. Another part we need to be careful about is the optimization. I tried several methods to compute the gradients between the output and the input, such as Newton-CG, sequential quadratic programming, and the Newton conjugate gradient trust-region algorithm. In terms of performance and time, computing the gradients from Keras is the most efficient way. I will release the code as soon as possible.

See you next time!

Tian Hongze | AI Frontier | AI Practitioner | Yoozoo.{AI}