How Voxels Became ‘The Next Big Thing’
We’ve talked with the amazing guys behind Atomontage to figure out whether voxels can actually make a comeback and take down polygons.
Branislav: I used to compete in the demoscene in Europe between 2000 and 2002. I wrote a few 256-byte demos (also called intros) under the nickname Silique/Bizzare Devs (see “Njufnjuf”, “Oxlpka”, “I like ya, Tweety” and “Comatose”), where each of the intros generated real-time voxel or point cloud graphics. Both voxels and point clouds are examples of sampled geometry.
The intros did their job with about 100 CPU instructions like ADD, MUL, STOSB, PUSH and similar. However, due to the nature of this type of program, tens of instructions were actually used just for setting things up, not for generating the actual graphics. Still, those 50+ instructions, which are basically elementary mathematical and memory operations, were enough to generate pretty neat moving 3D graphics at real-time speeds. Each of these 256B intros took a prize between 1st and 3rd place, and that made me realize that if such 3D graphics are so easy to create without polygons, it could also be possible to achieve much more in games and beyond by doing the same there: using sampled geometry instead of polygon meshes. Simplicity is the key. I saw that the then-dominant paradigm, based on a complicated and fundamentally limited (non-volumetric) representation, was going to hit a complexity ceiling, so the timing was right to try this “new” and simpler paradigm: volumetric sampled geometry.
Dan: While still in high school in Sweden, I started programming a 2D sidescrolling engine, which I eventually built an indie game called “Cortex Command” with. It was like “Worms” or “Liero”, but with real-time gameplay and RTS elements, and more detailed in its simulation of various materials for each pixel of the terrain. Through an “ant farm”-like cross-section view, your characters could dig for gold in the soft dirt and build defensive bunkers with hard concrete and metal material walls. Cortex Command won Technical Excellence and Audience awards at the Independent Games Festival in 2009. Ever since then, I have been wanting to make a fully 3D version of the game — something that could only be done with volumetric simulation and graphics.
About six years ago, I was looking at all the voxel solutions out there, and found Branislav’s work through his website and videos, where he talked about this inevitable shift away from polygonal 3D graphics to something that resembled what I was doing in 2D in my game: simulating the whole virtual world as small atomic blocks with material properties. Not only did his thesis ring true for me, but his tech also seemed to be the best and most convincing around, based solely on his simple but impressive videos. I started donating to his project through his website and struck up a conversation with him, which eventually led to a friendship over the years, and now us co-founding this company. It’s exciting to be part of this pivotal period in such an epic project, where the result of so many years of R&D is finally about to be put into everyone’s hands to revolutionize the creation and consumption of 3D content!
We believe that many big players have realized that polygon technologies have been hitting a complexity ceiling for over a decade. This problem manifests itself in all kinds of friction: complex toolchains, complex hacks to make destructive interaction and simulation possible, complex geometry representation (polygon surface model + collision models + other models for representing the internal structure, if any), complex and over-engineered approaches to volumetric video, hacks and large code-bases, etc. All this friction makes progress rely almost solely on the increasing horsepower of GPUs, and some things are simply not feasible at all. It’s a battle that cannot be won. It’s likely part of the nature of large companies: often, they don’t even try to spend so much time and resources on developing these risky and game-changing solutions; instead, their strategy seems to be to try to acquire the small companies who successfully do.
There are a bunch of techniques people typically consider to be voxel-based. The oldest ones used in games (e.g. Delta Force 1, Comanche, Outcast, and others) were height-map based: the renderer interpreted a 2D map of height values to calculate the boundary between the air and the ground in the scene. That’s not truly a voxel-based approach, as there’s no volumetric data set in use.
Some engines and games use large blocks that themselves have internal structures and make up the virtual world (e.g. Minecraft). These blocks are typically rendered using polygons, so their smallest elements are triangles and texels, not really voxels. That geometry is simply organized into a grid of larger blocks, but that doesn’t make them voxels, strictly speaking.
Some games use relatively large voxels or SDF (signed distance field) elements that still don’t allow realism, but already allow interesting gameplay (e.g. Voxelstein, Voxelnauts, Staxel). There are also SDF-based projects that allow great interaction and simulation and might have the potential to provide high realism (e.g. Claybook). However, so far we haven’t seen attempts to develop a solution for simulation and rendering of realistic large scenes in ways similar to what our tech is capable of.
Atomontage uses voxels as the most basic building blocks of the scenes. The individual voxels in our tech are more or less without structure. This approach results in simplicity that helps greatly in simulation, interaction, content generation as well as in data compression, rendering, volume video encoding, and elsewhere.
Our voxel-based solution removes multiple types of complexity people run into when working with polygon-based technologies — including the whole concept of polygon count limits and the many hacks used to overcome that limit. It is volumetric by nature and therefore doesn’t require any kind of additional representation for modeling the inside of objects.
This approach features a powerful and inherent LOD (level of detail) system which allows the technology to balance performance and quality in traditionally difficult situations. Granular control over LODs and foveated rendering and processing are a few of the many benefits added with virtually no overhead cost.
Voxel geometry removes the burden of complicated structure: it is sample-based and therefore easy to work with (a simple, universal data model for any geometry, compared with the complex data model of polygon-based assets). This allows us to iterate faster when developing powerful compression methods, interaction tools, content generators, the physics simulator, etc. This is not the case with polygon technologies, as these have been hitting the complexity ceiling for at least a decade and their improvements depend strongly on exponential increases in GPU horsepower.
The voxel approach is efficient because it doesn’t waste footprint or bandwidth on the hard-to-compress vector components of vector-based representations (polygons, point clouds, particles), the values of which have an almost random distribution. With voxels, the encoded information is instead mostly the information we want (color, material information, etc.), not the overhead data that just places that information in the right place in space. This can be compared with a JPEG vs. some 2D vector format when used to encode a large complex image. The JPEG encoding is predictable and can be tuned well for optimal quality and small footprint while a vector image would waste most space on the vector information instead of on the actual color samples.
Our approach will allow regular people to unleash their creative talent, without them having to first study or understand the underlying technology and its limits. The skills everybody has acquired while growing up in the real world should be all they need in order to interact with the virtual environments in useful and realistic ways.
Large-scale, high-resolution volumetric video is easily doable using voxels thanks to the inherent LOD system and simple spatial structure of voxel assets. Our voxel-based rendering doesn’t degrade the geometry via mesh decimation and the performance we’re already achieving on common contemporary hardware is unmatched, as far as we have seen.
We’re currently at a stage where we can voxelize not just a single large high-poly asset, but a whole sequence of such assets and compile a volumetric movie out of it. We can also do this with whole environments, so imagine having a movie-quality scene with characters, which could be a sequence from an existing sci-fi or animation film. We can turn this into a VR movie where the user can share the environment with the characters, move their viewpoint freely within it (not just look around like in a 360 video) in a room-scale or larger experience, and in a way enjoy that experience as is common in some VR games, except for the interactivity. We’re now looking for partners who would like to help us do a first trial short-form film with their CG-quality data.
Our tech already allows us to use multiple transforms affecting any voxel of a model, easily creating convincing soft-body deformation effects. Although we don’t have a character animation demo with full skinning and similar features yet, it is something the tech clearly would be able to do if tied in with any conventional rigging system. This can already be seen in our soft-body videos: the tires on the vehicle’s wheels get squashed as a response to simulated forces, and that deformation is similar to what would happen in a character animation under the influence of rigged bones. Early on, the actual rigging might be done in some of the existing tools prior to voxelization; eventually, we’ll provide the feature in our own tools.
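To make the idea of several transforms influencing one voxel concrete, here is a minimal sketch using a plain weighted blend of affine transforms, the same principle as conventional linear blend skinning. It is an illustration only: the interview does not describe Atomontage's actual deformation method, and the names `apply_affine` and `blend_transforms` and the (matrix, translation) layout are assumptions made for this example.

```python
def apply_affine(transform, point):
    """Apply an affine transform given as (3x3 row-major matrix, translation)."""
    rows, translation = transform
    return tuple(
        sum(row[i] * point[i] for i in range(3)) + translation[axis]
        for axis, row in enumerate(rows)
    )


def blend_transforms(point, influences):
    """Move a voxel center by a weighted blend of several transforms.

    influences: list of (weight, transform) pairs. Weights are normalized
    here, so they do not have to sum to 1. (Hypothetical layout.)
    """
    total = sum(weight for weight, _ in influences)
    blended = [0.0, 0.0, 0.0]
    for weight, transform in influences:
        moved = apply_affine(transform, point)
        for axis in range(3):
            blended[axis] += (weight / total) * moved[axis]
    return tuple(blended)


IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
rest = (IDENTITY, (0.0, 0.0, 0.0))      # leaves the voxel where it is
squash = (IDENTITY, (0.0, -1.0, 0.0))   # pushes the voxel down by one unit

# A voxel influenced equally by both transforms moves down by half a unit.
deformed = blend_transforms((1.0, 2.0, 3.0), [(0.5, rest), (0.5, squash)])
```

In a rigged character the transforms would come from bones and the weights from skinning data; in the tire example they would come from the simulated forces.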
Existing textured assets can be voxelized in a number of ways. We are getting the best results with two of them: a ray-based voxelizer and a projection-based voxelizer. The former casts rays through the polygon asset, detects intersections between each of the rays and the mesh, calculates the related positions on the mapped textures, reads the texels and bakes them into the respective voxels that are being plotted.
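The ray-based steps described above can be sketched in a few lines. This is a simplified illustration, not Atomontage's voxelizer: it assumes rays are cast only along the z axis (one per voxel column), that the texture is a small list of rows, and that texels are read with a nearest-texel lookup. All function names are made up for the example.

```python
def barycentric_2d(px, py, a, b, c):
    """Barycentric weights of (px, py) in the xy-projection of triangle abc."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if abs(det) < 1e-12:
        return None  # triangle is edge-on to the ray direction
    w0 = ((b[1] - c[1]) * (px - c[0]) + (c[0] - b[0]) * (py - c[1])) / det
    w1 = ((c[1] - a[1]) * (px - c[0]) + (a[0] - c[0]) * (py - c[1])) / det
    return w0, w1, 1.0 - w0 - w1


def sample_texture(texture, u, v):
    """Nearest-texel lookup; texture is a list of rows, (u, v) in [0, 1]."""
    height, width = len(texture), len(texture[0])
    col = min(int(u * width), width - 1)
    row = min(int(v * height), height - 1)
    return texture[row][col]


def voxelize(triangles, texture, resolution, voxel_size):
    """Bake texels into surface voxels.

    triangles: list of (positions, uvs), three vertices each.
    Returns a sparse dict {(i, j, k): texel}.
    """
    voxels = {}
    for i in range(resolution):
        for j in range(resolution):
            # One ray per voxel column, through the column center, along +z.
            px, py = (i + 0.5) * voxel_size, (j + 0.5) * voxel_size
            for positions, uvs in triangles:
                weights = barycentric_2d(px, py, *positions)
                if weights is None or min(weights) < 0.0:
                    continue  # the ray misses this triangle
                # Interpolate hit depth and texture coordinates at the hit.
                z = sum(w * p[2] for w, p in zip(weights, positions))
                u = sum(w * t[0] for w, t in zip(weights, uvs))
                v = sum(w * t[1] for w, t in zip(weights, uvs))
                k = int(z // voxel_size)
                if 0 <= k < resolution:
                    voxels[(i, j, k)] = sample_texture(texture, u, v)
    return voxels


# A textured quad (two triangles) floating at z = 2.5 in a 4x4x4 grid.
quad = [
    (((0, 0, 2.5), (4, 0, 2.5), (4, 4, 2.5)), ((0, 0), (1, 0), (1, 1))),
    (((0, 0, 2.5), (4, 4, 2.5), (0, 4, 2.5)), ((0, 0), (1, 1), (0, 1))),
]
texture = [["red", "green"], ["blue", "white"]]
baked = voxelize(quad, texture, resolution=4, voxel_size=1.0)
```

Casting along a single axis misses surfaces that are parallel to the rays; a fuller version would repeat the pass along the other two axes and merge the results.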
The projection-based voxelizer renders the asset from multiple vantage points into maps that include depth maps. The intersection of the volumes defined by the depth maps then provides information about the actual voxels that have to be plotted. The other generated maps provide the rest of the surface information (color, normal, etc.) which is also baked into surface voxels.
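The intersection of depth-map volumes can be illustrated with a small carving sketch. It assumes six orthographic views along the grid axes and represents each depth map as a dict from the two remaining coordinates to the index of the first occupied cell. The `render_depth` function is only a stand-in for a real renderer, and none of these names come from Atomontage.

```python
VIEWS = [(axis, sign) for axis in range(3) for sign in (+1, -1)]


def pixel_of(cell, axis):
    """The depth-map pixel a cell projects to when viewed along `axis`."""
    return tuple(c for a, c in enumerate(cell) if a != axis)


def render_depth(solid, axis, sign):
    """Stand-in renderer: orthographic depth map of a set of solid cells.

    sign=+1 looks along the positive axis (first hit = smallest index),
    sign=-1 looks from the opposite side (first hit = largest index).
    """
    depth = {}
    for cell in solid:
        pixel = pixel_of(cell, axis)
        d = cell[axis]
        if pixel not in depth or (d - depth[pixel]) * sign < 0:
            depth[pixel] = d
    return depth


def is_behind(cell, axis, sign, depth):
    """True if the view cannot see past this cell (it is at or behind the hit)."""
    pixel = pixel_of(cell, axis)
    if pixel not in depth:
        return False  # this view sees only empty space along the ray
    return (cell[axis] - depth[pixel]) * sign >= 0


def carve(depth_maps, n):
    """Keep the cells of an n^3 grid that lie inside every view's volume."""
    return {
        (x, y, z)
        for x in range(n) for y in range(n) for z in range(n)
        if all(is_behind((x, y, z), axis, sign, depth)
               for (axis, sign), depth in depth_maps.items())
    }


# Round trip on a solid box: render six depth maps, then carve it back.
box = {(x, y, z) for x in range(2, 6) for y in range(1, 4) for z in range(3, 7)}
maps = {(axis, sign): render_depth(box, axis, sign) for axis, sign in VIEWS}
carved = carve(maps, n=8)
```

Cells whose index equals the stored depth in some view are the surface voxels, and that is where the matching color and normal maps would be baked. Axis-aligned views cannot see into every concavity, so more vantage points recover more of the shape.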
There are also other ways to create properly colorized surfaces when voxelizing point cloud data or by procedurally generating content or surface properties of existing assets.
These are two separate problems. Content generation is easy with voxels because you only have to plot a large set of samples with some useful properties (color, material information, etc.) into a regular grid with a specified spatial resolution, or multiple resolutions in the case of a variable-LOD dataset. This is easy to do with voxels because you don’t run into polygon count limits. There are also no textures, so texture resolution limits aren’t an issue, either.
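As a rough illustration of plotting samples into a regular grid at several resolutions, the sketch below quantizes colored samples into a sparse grid and derives coarser levels by averaging 2x2x2 blocks. The sparse-dict storage, the averaging rule, and the function names are assumptions for the example; the interview does not describe Atomontage's data structures.

```python
from collections import defaultdict


def plot_samples(samples, voxel_size):
    """Quantize (x, y, z, color) samples into a sparse grid of integer cells."""
    grid = {}
    for x, y, z, color in samples:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        grid[key] = color  # the last sample written to a cell wins
    return grid


def coarser_level(grid):
    """Next LOD: average the colors of the occupied cells in each 2x2x2 block."""
    buckets = defaultdict(list)
    for (i, j, k), color in grid.items():
        buckets[(i // 2, j // 2, k // 2)].append(color)
    return {
        key: tuple(sum(channel) / len(colors) for channel in zip(*colors))
        for key, colors in buckets.items()
    }


def build_lods(samples, voxel_size, levels):
    """Return [finest, ..., coarsest]; each level doubles the voxel size."""
    lods = [plot_samples(samples, voxel_size)]
    for _ in range(levels - 1):
        lods.append(coarser_level(lods[-1]))
    return lods


samples = [
    (0.2, 0.2, 0.2, (255, 0, 0)),
    (1.2, 0.2, 0.2, (0, 0, 255)),
    (5.5, 5.5, 5.5, (0, 255, 0)),
]
lods = build_lods(samples, voxel_size=1.0, levels=3)
```

Only occupied cells are stored, so memory follows the amount of content rather than the size of the bounding volume.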
Rendering of large scenes is quite easy, too, thanks to the inherently powerful LOD system. The renderer can use the optimal combination of LODs for tiny segments of the geometry to render the whole scene at the highest possible detail while maintaining high FPS. LODs are inherently cheap with voxels and they are great for keeping the voxel size (and so the shape error) smaller than the size of a pixel on the screen.
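The rule of keeping the voxel size below the size of a pixel can be written as a short calculation. This sketch assumes a pinhole camera, that LOD 0 is the finest level, and that each coarser level doubles the voxel size; the function names and default parameters are illustrative, not taken from Atomontage.

```python
import math


def pixel_footprint(distance, vertical_fov_deg, screen_height_px):
    """World-space size covered by one pixel at the given viewing distance."""
    view_height = 2.0 * distance * math.tan(math.radians(vertical_fov_deg) / 2.0)
    return view_height / screen_height_px


def select_lod(distance, base_voxel_size, vertical_fov_deg=60.0,
               screen_height_px=1080, max_lod=16):
    """Coarsest LOD whose voxels are still no larger than one pixel.

    Assumption: voxel size at level n is base_voxel_size * 2**n.
    Returns 0 when even the finest voxels are larger than a pixel.
    """
    footprint = pixel_footprint(distance, vertical_fov_deg, screen_height_px)
    lod, size = 0, base_voxel_size
    while lod < max_lod and size * 2.0 <= footprint:
        size *= 2.0
        lod += 1
    return lod


# With 1 mm voxels on a 1080p screen and a 60 degree field of view:
near_lod = select_lod(distance=1.0, base_voxel_size=0.001)    # finest level
far_lod = select_lod(distance=100.0, base_voxel_size=0.001)   # much coarser
```

Running the selection per segment of geometry rather than per object is what lets nearby parts of a large model stay detailed while distant parts drop to coarser levels.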
Working with scans
Our voxelizers are already powerful enough to voxelize very high-poly meshes and point clouds using one of the techniques mentioned earlier. We demonstrated first results with photogrammetry data back in early 2013, with a voxelized 150-million-polygon mesh we rendered and edited in real time on a mediocre gaming laptop from 2008. Once voxelized, the polygon count of the source geometry becomes irrelevant, and editing of the asset becomes easy. This can be seen in our videos; performance depends more on the pixel count than on the polygon count of the source data. All this is essential for providing users with a comfortable workflow when cleaning up huge scanned assets. Voxels are also in a way similar to pixels, so we foresee great applications of narrow (“deep learning”) AI for automatic cleanup of photogrammetry data.
The future of VR and streaming
Voxel-based representation is essential for creating massive shared interactive virtual environments that will, at some point, resemble the real world. It’s key that such environments aren’t static and allow users to interact in ways they’re used to in the real world and that means that such virtual environments have to be fully simulated with convincing physics. Simulation and rendering of a complex, fully dynamic world require such worlds to be volumetric — and all processing, synchronization, and rendering of the simulated geometry also has to be efficient. The volumetric nature of this process rules polygon meshes out. Also, as explained before, other vector-based representations aren’t efficient enough for providing the greatest value for the smallest footprint.
We expect that these large simulated environments will consist of mostly voxels, with a small part being simulated using particles for certain effects. It is great that both representations are simple and sampled, and that the conversion from one to the other is trivial and computationally cheap.
These are great benefits of voxels over other representations and that’s what makes them the right solution for representing these massive virtual environments. The LOD system makes them great for on-the-fly optimization of processing and rendering based on any combination of parameters (distance to the user, importance of the simulated process, accuracy vs. cost, etc.). It also makes them perfect for doing foveated rendering and very efficient streaming. They can also be defined at higher dimensionality, which is essential for running massively distributed physics simulations. Such simulations cannot be done in only three dimensions, because of the latencies on the network and the impossibility of processing a large asset on a single PC and transferring any large piece of geometry at once. The segmented, variable-LOD nature of voxel geometry is of great help here. When modifying any small part or multiple parts of a large voxel model, there’s no need to recalculate its large mesh and textures, nor to synchronize the whole model across the network; only the affected parts matter, and they can be synchronized at the most suitable LOD the network manages to transfer.
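The point about synchronizing only the affected parts can be sketched with a chunked world that tracks which chunks were touched by edits. Everything here is an assumption made for illustration: the chunk size, the `ChunkedWorld` class, and the rule that each coarser LOD is roughly eight times smaller are not taken from Atomontage.

```python
CHUNK = 16  # cells per chunk edge (assumed for this sketch)


class ChunkedWorld:
    """Sparse voxel store that remembers which chunks were modified."""

    def __init__(self):
        self.cells = {}
        self.dirty = set()

    def set_voxel(self, x, y, z, value):
        self.cells[(x, y, z)] = value
        # Only the chunk containing the edited cell needs to be re-sent.
        self.dirty.add((x // CHUNK, y // CHUNK, z // CHUNK))

    def take_updates(self):
        """Return the chunks touched since the last call and reset the set."""
        updates, self.dirty = sorted(self.dirty), set()
        return updates


def pick_lod(chunk_bytes_at_lod0, budget_bytes, max_lod=4):
    """Finest LOD of a chunk that fits the byte budget.

    Assumption: each coarser level is about 8x smaller (half the
    resolution on each of three axes). Falls back to max_lod.
    """
    for lod in range(max_lod + 1):
        if chunk_bytes_at_lod0 / 8 ** lod <= budget_bytes:
            return lod
    return max_lod


world = ChunkedWorld()
world.set_voxel(1, 2, 3, "dirt")
world.set_voxel(5, 5, 5, "gold")     # same chunk as the first edit
world.set_voxel(40, 0, 0, "metal")   # a different chunk
to_send = world.take_updates()
```

However large the model is, the amount of data to synchronize after an edit depends only on the number of touched chunks and the LOD chosen for each.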
All these requirements of future massive virtual environments make the paradigm shift towards volumetric sampled geometry necessary and inevitable. The qualities of our voxel-based approach make it the best and possibly the only candidate for actually making the paradigm shift happen anytime soon.