Deforming a mesh in real time on the Oculus Quest, using compute shaders and hand tracking, and rendering the deformed mesh without reading the results back to the CPU. I use two different methods to render the deformed mesh: the standard MeshRenderer API and DrawProcedural.
First I will go over the general idea, then talk about implementation details, and discuss performance and alternatives.
As usual you can find the code on my Github: https://github.com/IRCSS/Mesh-Deformation-With-Compute-Shader-Oculus-Quest-Unity
You can also download the experience on SideQuest here: https://sdq.st/a-792
Top Down Look
Mesh deformation is very simple in theory. At its core, you move the vertices around, without changing anything about the topology or the triangles.
Theoretically you can also do this on the CPU by iterating over your vertices. However, doing this naively in C# would lead to poor performance.
The GPU is very well suited for a task like this, especially if each vertex doesn't contribute to the calculation of other vertices. If we want the deformation to be driven by our hand/controller movement, we need to cover the following general steps:
- Create a buffer on the GPU which our compute shader can work on. This buffer contains information about each vertex, such as position, UVs, velocity or any other information we might need.
- Run a compute shader which does the deformation. In this compute shader we need a way to detect collision with our controller/hand. We calculate a series of forces and apply the net force to the velocity of each vertex. The velocity moves the positions of the vertices each frame.
- Find a way to render these vertices we have modified, using the traditional graphics pipeline.
Step One — GPU Buffer Creation
When you want to render something, you dedicate a space in memory for your vertices. Your GPU then uses the information in this memory to render your triangles. This action is usually hidden away behind an abstraction level in game engines. Ideally, our compute shader would change the numbers in the buffer which Unity's render API has allocated and populated. However, to my knowledge this is not possible at the moment using the regular Mesh API in Unity. This means we have to create another buffer to hold the vertex information we want to change, and somehow make it accessible to Unity again.
As far as I can tell, the intended way to go about this within Unity is to use Compute Buffers and methods like DrawProcedural. A Compute Buffer is a memory container which a compute shader can read from and write to, in the form of a StructuredBuffer: a block of memory that contains elements of the same size. A StructuredBuffer can also be accessed in normal vertex/fragment shaders, as we will be doing in this project. Using DrawProcedural bypasses some of the work Unity does for its standard Renderer API, such as correctly populating shader uniforms like the MVP matrices. Although this has improved over time, and it is now better integrated with the standard rendering API.
I am not the biggest fan of the API design with compute buffers and draw procedural. It feels like an abstraction level is missing. A layer which unifies the Mesh buffer and Compute buffer and creates something like a ComputeBufferRenderer of type Renderer, with similar functionality as MeshRenderer.
If I understand this post correctly, GraphicsBuffer is a step in that direction, which is supposed to bridge the Mesh and the Compute Buffer. That would be nice, since you could then use one engine API to render everything. At the moment though, GraphicsBuffer can only be used as an index buffer, as far as I can tell.
Since a vertex buffer as GraphicsBuffer is yet to be fully implemented (at least I can't find information on it), we will use a Compute Buffer to hold our vertex information.
To generate a Compute Buffer we need three things. First, we need to know how many elements our buffer has; this is our number of vertices. Second, we need to know how big each element is in memory. Third, we need to provide it with initial data.
I like to use structs for each element in the buffer. In C#, I will create a struct called _Vertex.
All the vertex information I need is positions and UVs. I also added a velocity, which I will use to move the vertices around. This struct has 8 floats, so each member of the buffer has a size of:
8 * size_of_float = 8 * 4 bytes = 32 bytes = 256 bits
So if we have 100k vertices, our buffer would be 25 600 000 bits of memory, or 3.2 MB.
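In C#, the element struct might look like this. The exact field names are my assumption, but the layout matches the 8 floats counted above:

```csharp
using UnityEngine;

// One element of the Compute Buffer. 3 + 2 + 3 floats = 8 floats = 32 bytes.
public struct _Vertex
{
    public Vector3 position; // 3 floats
    public Vector2 uv;       // 2 floats
    public Vector3 velocity; // 3 floats
}
```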
First on the CPU side, we create an array which is populated with this information for each of our vertices.
Then we simply create the Compute Buffer and populate it with the array we just created.
GPU_VertexBuffer = new ComputeBuffer(m_vertexBufferCPU.Length, sizeof(float) * 8);
GPU_VertexBuffer.SetData(m_vertexBufferCPU);
Now that we have our Compute Buffer, we just need to sample it in our compute shader. The shader side is pretty simple. We need to provide it with a struct that determines what is in each element, and declare a StructuredBuffer which is a reference to our Compute Buffer. We bind our Compute Buffer as a StructuredBuffer on the CPU side using the ComputeShader.SetBuffer method.
m_computeShader.SetBuffer(kernel, "_VertexBuffer", GPU_VertexBuffer);
In our Compute Shader we have:
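The original post shows this as an image; a sketch of the shader-side declarations might look like the following. The kernel name and thread group size are my assumptions:

```hlsl
struct Vertex
{
    float3 position;
    float2 uv;
    float3 velocity;
};

// Read/write view of the Compute Buffer bound from C#
RWStructuredBuffer<Vertex> _VertexBuffer;

[numthreads(64, 1, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    // One thread per vertex: read, modify, write back
    Vertex v = _VertexBuffer[id.x];
    // ... deformation logic goes here ...
    _VertexBuffer[id.x] = v;
}
```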
We are all set for buffer generation. We have our information on the GPU, we can read from it in our Compute Shader and write to it easily through _VertexBuffer[index].position/velocity/uv.
Step Two — Deformation in Compute Shader
Now that we are running a program per vertex in the compute shader, we can move each vertex around as we please. We could directly move vertices around with our hand, however that would feel unnatural, since the vertices would stop the moment we stop pushing them. If you want to go fully physically accurate, you can use Newton's laws: have a buffer for acceleration, one for velocity and one for position.
Each frame your hand’s mass and acceleration can be used to calculate and apply a force to the acceleration of each vertex.
I am not going to go this far. I am content with modifying the velocities using some approximation of a force. Every frame, I move my vertices using the velocity they have, and add the new velocity I calculated that frame to their old velocity.
Velocity is defined at the beginning of the scope as:
float3 velocity = _VertexBuffer[id.x].velocity;
If we endlessly add new velocity to the old one, the vertices keep moving around faster and faster. Due to friction and loss of energy, the vertices should slow down over time. I imitate this by multiplying the velocity every frame with a drag factor. I am omitting a saturate/clamp function, since I make sure on the input side that the number remains between 0 and 1.
// Slowing down velocity over time. Keep this value under 1: closer to zero for more rigid materials, closer to one for more squishy materials
velocity *= _drag;
A drag value of 0.5, for example, would halve the velocity each frame.
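Put together, the per-frame update inside the kernel might look like this sketch. Treating _deltaTime as a uniform passed from the CPU is my assumption:

```hlsl
float3 velocity = _VertexBuffer[id.x].velocity;

// Decay the old velocity, then add the force approximated this frame
velocity *= _drag;          // 0..1, set from the inspector
velocity += forceThisFrame; // net force from the hand calculations

// Integrate: the velocity moves the position each frame
_VertexBuffer[id.x].position += velocity * _deltaTime;
_VertexBuffer[id.x].velocity = velocity;
```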
Now comes the calculation for the hands. I am using a simple signed distance calculation to a sphere which acts as a collider for the hands. I am not going to go in depth on why and how this works, since it is a topic of its own.
There are several components to this. The _pushForce is a balancing parameter I set in the inspector. The smoothstep does the collision detection. The _RHandVelocity is calculated and passed on from the CPU; faster hand movements result in a greater force being applied. The clamped dot product ensures that movement only happens in the direction of the applied force.
I should actually normalize _RHandVelocity and clamp its magnitude. At the moment I am doing a component-wise clamp. I am not normalizing it, because I want to avoid dealing with division by zero during normalization, since the hand velocity can be a zero vector.
The same calculation is done for the other hand. I would like to replace this later with a finer, per-finger collision detection. For now this is enough for testing.
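The force calculation itself is shown as an image in the original post. A hedged reconstruction of the right-hand contribution, using the names mentioned above (_RHandPosition and _colliderRadius are my assumptions), might look like this:

```hlsl
// Signed distance from the vertex to a sphere around the right hand
float d = distance(v.position, _RHandPosition.xyz) - _colliderRadius;

// 1 inside the sphere, fading to 0 at its surface: the collision mask
float influence = 1.0 - smoothstep(0.0, _colliderRadius, d);

// Component-wise clamp of the hand velocity (not normalized, as noted above)
float3 handVel = clamp(_RHandVelocity.xyz, -1.0, 1.0);

// Clamped dot product: only push vertices lying in the direction of movement
float alignment = max(0.0, dot(normalize(v.position - _RHandPosition.xyz), handVel));

float3 forceThisFrame = handVel * influence * alignment * _pushForce;
```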
Step Three — Rendering the Modified Buffer
In the current Unity version, there are three ways that I know of to render things you create/modify in a compute shader:
- Read back to the CPU and feed the data in the expected form to the MeshRenderer. This is obviously expensive, since it requires a lot of copying around, messes with pipelining and forces a synchronization point. Not a good idea; it also doesn't make much sense to send the data to the CPU only to immediately send it back again without doing anything with it. This wouldn't necessarily be as expensive on platforms that share CPU/GPU memory, like Android, since you could just move the reference around, but only if we implemented it as such. Which we can't, since that level of control is not exposed.
- Draw using DrawProcedural, or one of the other draw calls there.
- A method I have never tried before, which I am trying here: use SV_VertexID to sample the modified vertex buffer as a StructuredBuffer in the vertex shader, and ignore the default mesh data which Unity passes on to the vertex shader.
So I am doing number 3 here, mainly because the whole mesh deformation was an excuse to try out this hack and see if it works. I also implemented the DrawProcedural path; you can switch between the two using an enum. The advantage of hacking the modified mesh data into the standard render API is letting Unity take care of generating all the draw calls, populating the global uniforms, and sorting the mesh and rendering it at the right time.
So what we will be doing in our vertex shader is ignoring the vertex buffer which Unity has provided as the input struct, and instead sampling the StructuredBuffer which we have modified in the compute shader. This has performance implications. For example, it might mess with a bunch of performance optimizations which the graphics driver or the engine has done, such as putting some data in the cache ahead of time. Also, for 100k vertices, instead of having just one 3.2 MB mesh in GPU memory, we have twice that amount. The worst performance impact, especially on mobile, might be that we are still reading from both buffers in our vertex shader while only using one. This means wasting bandwidth for no reason. All in all, it is not a great method. The advantages are stated above, so you can choose for yourself based on your situation. I haven't really benchmarked this, so I am not sure how severe the consequences are.
In the vertex shader, you need to declare your StructuredBuffer, which you again need to bind on the CPU side by using the Material.SetBuffer method.
This declaration is basically a copy of what we have in the compute shader, with the difference that we don't need to write to this buffer, just read.
We are going to sample this in our vertex shader. But with which index? To the rescue comes SV_VertexID, which is also supported in Unity and in OpenGL ES 3, so all good for the Quest. This is the index of the vertex in the vertex buffer. Since our StructuredBuffer has the same order as the vertex buffer of the mesh, we can use it to sample our vertex information from the compute buffer.
So our vertex shader will become:
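The shader itself appears as an image in the original post; a sketch of the idea, assuming the same Vertex struct as in the compute shader (the v2f layout is my assumption, _MATRIX_M is explained further down):

```hlsl
// Read-only view of the buffer the compute shader wrote to,
// bound from C# with Material.SetBuffer
StructuredBuffer<Vertex> _VertexBuffer;

v2f vert(uint vertexID : SV_VertexID)
{
    v2f o;
    // Ignore Unity's own vertex input; fetch by vertex index instead
    Vertex v = _VertexBuffer[vertexID];
    o.vertex = mul(UNITY_MATRIX_VP, mul(_MATRIX_M, float4(v.position, 1.0)));
    o.uv = v.uv;
    return o;
}
```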
That is all we have to do on the shader side. With this, as long as our StructuredBuffer is populated correctly, our mesh is rendered automatically through Unity's MeshRenderer API, without reading anything back to the CPU or issuing a draw call through DrawProcedural. Things like frustum culling and sorting will also work without you doing anything, but obviously they will work on the original, unmodified bounding box of your mesh. Someone mentioned the vertex cache. I don't think this will affect the vertex cache in any way, since vertices will still be cached and used to skip the vertex shader if the input is the same.
Now that we have done the hacky way, time to also implement the proper way, especially since on Android I have performance issues with the hacky one.
We have already done most of the work for DrawProcedural, since the GPU side of things is almost the same; only the CPU side differs.
The first thing is to turn off the MeshRenderer of the mesh if the rendering method is set to WithDrawProcedural, since DrawProcedural will be issuing the draw command.
if (m_renderingMethod == RenderMethod.WithDrawProcedural) m_meshRenderer.enabled = false;
The shader remains the same one. All the GPU buffers for the compute shader are still generated the same way, and we still need to bind the StructuredBuffer to the material of the mesh.
An extra GPU buffer we need for DrawProcedural is the index buffer. This one we can create by simply copying the triangles of the mesh into a GraphicsBuffer container.
The actual draw command is issued like this:
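A sketch of both steps, assuming hypothetical field names (m_mesh, m_material) and a placeholder bounds value:

```csharp
// Copy the mesh triangles into a GraphicsBuffer used as an index buffer
int[] triangles = m_mesh.triangles;
GraphicsBuffer indexBuffer = new GraphicsBuffer(
    GraphicsBuffer.Target.Index, triangles.Length, sizeof(int));
indexBuffer.SetData(triangles);

// Issue the draw, e.g. from Update(). The material samples _VertexBuffer
// in its vertex shader, so no vertex buffer is passed here.
Graphics.DrawProcedural(
    m_material,
    new Bounds(Vector3.zero, Vector3.one * 100f), // generous placeholder bounds
    MeshTopology.Triangles,
    indexBuffer,
    triangles.Length);
```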
So the question is, when do we issue this command? CommandBuffers also have a DrawProcedural method. We could use that to provide a more exact point in the frame for the rendering to happen, and even provide things like the matrix of the mesh. I am simply calling it in Update. I haven't checked this in RenderDoc, but my guess would be that the RenderQueue settings are used to determine when to do the actual draw call.
If you draw the mesh as is, you might notice it has lost its local transform. The view and projection matrices are still passed correctly into the MVP matrix, but the model matrix is not. You could construct your own MVP matrix, but you would immediately run up against issues with multi-camera setups, inconsistency between editor and game view, and generally getting things to work without a lot of case differentiation and code that runs in edit mode. One of the main reasons I wanted to find a way to hack it into the standard API was to avoid having to deal with all this for prototyping small ideas. As you can see in the code, my hack here was to pass only the model matrix to the shader and multiply the vertex position with the VP and M matrices.
Since I still want to use the same shader for both cases, drawing with DrawProcedural and with the standard API, I populate _MATRIX_M with the identity matrix when I am not using the DrawProcedural method. Wasted performance right there, but then I don't have to deal with the issues stated above.
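In C#, that might look like this sketch (the field names are my assumptions, apart from _MATRIX_M):

```csharp
if (m_renderingMethod == RenderMethod.WithDrawProcedural)
    m_material.SetMatrix("_MATRIX_M", transform.localToWorldMatrix);
else
    m_material.SetMatrix("_MATRIX_M", Matrix4x4.identity);
```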
Some Performance Consideration
The number of compute shader threads dispatched correlates with the number of vertices you have. So the first thing to keep in mind is that fewer vertices lead to better performance.
One thing I kept thinking about was how I am running this once per vertex, although I should actually be running it once per vertex position. At the moment, I have around 16k vertices with distinct positions, but more than 22k vertices in total. For vertices that are shared between different faces, if one of their vertex attributes differs, Unity is forced to pack them as different entries in the vertex buffer. In our case, vertices that belong to different islands on the UV map cannot be merged into one. What I kept thinking about was how to run the compute shader only once for these vertices that share a position but differ in UVs.
Another thing at the back of my head was async compute shaders, especially if I wanted to do more complicated force calculations (more on that below).
Final tips: keep the data per vertex as lean as possible, make sure your index buffer stays 16 bit, don't pass on normals/tangents if you don't need them, and get rid of the deformation by velocity if you have performance issues.
More Advanced Force Distribution
A thought I was playing around with was spreading the force along the topology. At the moment force is applied per vertex, and vertices don't influence each other. However, the impact of an applied force should propagate along the mesh. An example of this is a stone dropping into water, and the wave that travels along the surface.
While you can do both gather and scatter operations in compute shaders (read information from neighboring vertices and write to them), the writing part will cause race conditions or bugs. One idea was to run a compute shader on the index buffer per face, after the net force is calculated per vertex. This way you can gather and distribute the force between the neighboring vertices of a face as you go down the index buffer. I still haven't prototyped this, but I can already foresee some problems, such as the force potentially traveling around a face and being applied to a vertex several times more than it should be.
Hope you enjoyed reading. You can follow me on my Twitter: IRCSS
Further Reading and Resources
- Information on the new type GraphicsBuffer: https://forum.unity.com/threads/graphicsbuffer-and-mesh.636631/
- Compute Buffer: https://docs.unity3d.com/ScriptReference/ComputeBuffer.html
- Method used to draw index buffers: https://docs.unity3d.com/ScriptReference/Graphics.DrawProcedural.html
- Structured Buffer: https://docs.microsoft.com/en-us/windows/win32/direct3d11/direct3d-11-advanced-stages-cs-resources