How Intel enhanced photorealism using machine learning techniques

Published in Anyverse™ · 3 min read · Feb 11, 2022

Photorealism has obsessed the computer graphics world for years, and that obsession has produced numerous papers, experiments, and even tricks to improve it. Some of them have achieved remarkable results, as Intel Labs did with its machine learning project Enhancing Photorealism Enhancement, which this article looks at in more detail. But guess what, tricks don’t always work… especially when it comes to training deep neural networks for AI-based perception systems.

Intel Labs used machine learning to enhance photorealism and make GTA V look more realistic

The origin

As researchers Stephan R. Richter, Hassan Abu Alhaija, and Vladlen Koltun state at the beginning of their paper Enhancing Photorealism Enhancement, “photorealism has been the defining goal of computer graphics for half a century”.

Since Newell and Blinn published their work The progression of realism in computer-generated images in 1977, substantial further progress has been made in the physically-based simulation of light transport, the principled representation of material appearance, and photogrammetric modeling.

These techniques were a great step forward for the realism of computer graphics, but a look at even the most sophisticated real-time games quickly reveals that photorealism has not been achieved. There is still a remarkable difference between simulation and reality, mostly because a physically accurate simulation requires computing power and time that are incompatible with the real-time response games need.

Computer vision and machine learning enter the scene

More recently, we have seen the growth of new complementary techniques in machine learning and computer vision. As the researchers put it, these techniques, “based on deep learning, convolutional networks, and adversarial training, bypass physical modeling of geometric layout, material appearance, and light transport. Instead, images are synthesized by convolutional networks trained on large datasets.”

“These techniques have been used to synthesize representative images from a given domain, to convert semantic label maps to photographic images, and to attempt to bridge the appearance gap between synthetic and real images”. “Images synthesized by these approaches capture aspects of photographic appearance that often elude even state-of-the-art computer games”.

The “experiment”

Computer graphics haven’t stopped evolving, but even the most graphically stunning video games don’t yet look like real-world footage (the ultimate goal most of the time). This was the seed for Intel Labs researchers to develop a method that makes synthetic images more realistic using neural networks, and the results when applied to GTA V were impressive!

They trained a neural network using images of German city streets captured by a car’s built-in camera (from the Cityscapes Dataset). Then, in addition to processing the footage rendered by GTA V’s game engine, the neural network also uses other rendered data the game’s engine has access to, like the depth of objects in a scene and information about how the lighting is being processed and rendered. That’s a gross simplification, of course; the researchers’ paper gives an in-depth explanation of the method.
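To make the idea of "other rendered data" a bit more concrete, here is a minimal sketch of how a rendered frame could be combined with a depth buffer and a lighting buffer into a single multi-channel input for a network. It is an illustration only, not the researchers' implementation: the function name `stack_render_buffers`, the choice of exactly these two extra buffers, and the normalization steps are all assumptions made for this example.

```python
import numpy as np


def stack_render_buffers(rgb, depth, lighting):
    """Stack a rendered frame with auxiliary buffers along the channel axis.

    Hypothetical helper for illustration; not taken from the paper.
    rgb:      (H, W, 3) uint8 color frame from the game engine
    depth:    (H, W) per-pixel distance to the camera
    lighting: (H, W) per-pixel lighting intensity, assumed already in [0, 1]
    Returns an (H, W, 5) float32 array.
    """
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    height, width, _ = rgb.shape
    if depth.shape != (height, width) or lighting.shape != (height, width):
        raise ValueError("depth and lighting must have shape (H, W)")

    # Scale colors to [0, 1] so all channels share a similar range.
    rgb_f = rgb.astype(np.float32) / 255.0

    # Min-max normalize depth to [0, 1]; a flat depth map becomes all zeros.
    d = depth.astype(np.float32)
    span = d.max() - d.min()
    d = (d - d.min()) / span if span > 0 else np.zeros_like(d)

    light = lighting.astype(np.float32)

    # Add a trailing channel axis to the 2D buffers, then concatenate.
    return np.concatenate([rgb_f, d[..., None], light[..., None]], axis=2)


# Tiny synthetic example standing in for real engine output.
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)
depth = rng.uniform(1.0, 100.0, size=(4, 6))
lighting = rng.uniform(0.0, 1.0, size=(4, 6))

features = stack_render_buffers(rgb, depth, lighting)
print(features.shape)  # (4, 6, 5)
```

The point of the sketch is simply that the network sees more than pixels: each location carries color plus extra scene information that the engine already computed while rendering.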

The results

As the researchers’ video shows, their approach significantly enhanced the realism of rendered images. The surface of the road is smoothed out, highlights on vehicles look more pronounced, and the surrounding hills in several clips look lusher and alive with vegetation.

Unfortunately, you can’t actually play the game in this photorealistic mode since it’s only re-rendering the recorded footage…

Can you enhance photorealism… without tricks?

Approaches like this one may be a cool way to upgrade the graphics of older games, but when it comes to advanced perception, would you train your neural network with similar methods?

Game engines are widely used to train neural networks for autonomous applications with decent results, alone and sometimes in combination with synthetic data by those who aim for more realism. And again, you can obtain a sufficient level of realism this way, but those who seek accuracy and want to faithfully simulate real-world scenes already know the limitations of this approach… Why not use the “real thing” then? Why not train your system with close-to-real data from the beginning?