This was my final project for my computer animation course, where we were given
the freedom to implement anything within or beyond the scope of the class. After
our position based dynamics (PBD) cloth simulation assignment, our professor,
Dr. Sueda, had linked a website with other applications of PBD.
I had already decided I wanted to try position based fluids (PBF) at some point,
so I chose that for my final project. We had to write a project proposal, where I
initially suggested a slightly older paper, but Dr. Sueda directed me to
Position Based Fluids by Miles Macklin and Matthias Müller. I liked his suggestion. He made sure I was
aware of some of the challenges I'd face that weren't within the scope of
the class (e.g. instanced rendering), and with his approval I had my project decided.
Process
I used my WIP Vulkan renderer because I already had some things done, like image/buffer
abstractions and basic model loading. I knew I wanted the simulation to run fully on the
GPU with instanced particles, so my first step was creating
a GPU buffer and uploading positions for all particles in a cube to make sure that
worked properly. Before moving the simulation to the GPU, I wanted to get the
algorithm right on the CPU, both so I could more easily identify where I'd need
memory barriers and because debugging shaders is more difficult in general.
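
As a rough illustration of that first step, here is a minimal sketch of laying
out particles in a cube and packing them into raw bytes ready for upload. The
names cube_positions and pack_vec4 are hypothetical, and padding each position
to four floats is my assumption here, not necessarily the layout the project
uses. The padding matches the 16-byte stride that an array of vec3 gets in a
std430 storage buffer.

```python
from array import array


def cube_positions(n_per_axis, spacing, origin=(0.0, 0.0, 0.0)):
    """Yield (x, y, z) for an n x n x n lattice of particles."""
    ox, oy, oz = origin
    for ix in range(n_per_axis):
        for iy in range(n_per_axis):
            for iz in range(n_per_axis):
                yield (ox + ix * spacing, oy + iy * spacing, oz + iz * spacing)


def pack_vec4(points):
    """Pack points as 32-bit floats, padded to 4 floats each (assumed layout)."""
    data = array("f")
    for x, y, z in points:
        data.extend((x, y, z, 1.0))  # w = 1.0 is only padding in this sketch
    return data.tobytes()


# 4 x 4 x 4 particles, 5 cm apart; blob is what would be copied to the GPU.
blob = pack_vec4(cube_positions(4, 0.05))
```

The resulting byte string is what would be copied into a staging buffer. The
Vulkan side of the upload is left out of this sketch.
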
I added gravity and constrained the particles between two points to keep them
in a box. I implemented the PBF algorithm in the order it is presented in the paper:
first applying gravity and predicting positions, then building a list of neighbors for
each particle, and then running the solver iterations. The solver first requires
estimating the density at each particle and then estimating
the gradient of the density, so that we can apply a position correction in the opposite
direction, back toward the rest density, and then update the velocity. Using the new
velocity, we then apply the vorticity and viscosity calculations and do a final position update.
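
The core of the steps above can be sketched on the CPU roughly as follows. This
is a minimal sketch, not the project's code: it assumes unit particle mass, the
poly6 kernel for density and the spiky kernel for gradients, brute-force
neighbor search, and made-up values for H, RHO0, EPS, and the box bounds. It
leaves out vorticity confinement, viscosity, and the paper's artificial
pressure term.

```python
import math

H = 0.1        # kernel radius (assumed value)
RHO0 = 1000.0  # rest density (assumed value, unit particle mass)
EPS = 100.0    # relaxation term in the lambda denominator (assumed value)
GRAVITY = (0.0, -9.8, 0.0)
BOX_MIN, BOX_MAX = 0.0, 1.0  # hypothetical box bounds on every axis

POLY6 = 315.0 / (64.0 * math.pi * H ** 9)
SPIKY = -45.0 / (math.pi * H ** 6)


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def poly6(r2):
    """Poly6 kernel for the density estimate; r2 is the squared distance."""
    return POLY6 * (H * H - r2) ** 3 if r2 < H * H else 0.0


def spiky_grad(d):
    """Gradient of the spiky kernel with respect to p_i, where d = p_i - p_j."""
    r = math.sqrt(dot(d, d))
    if r == 0.0 or r >= H:
        return (0.0, 0.0, 0.0)
    scale = SPIKY * (H - r) ** 2 / r
    return tuple(scale * c for c in d)


def solve_density(pos):
    """One Jacobi iteration of the density constraint (brute-force neighbors)."""
    n = len(pos)
    grads = [[spiky_grad(sub(pi, pj)) for pj in pos] for pi in pos]
    lambdas = []
    for i in range(n):
        rho = sum(poly6(dot(d, d)) for d in (sub(pos[i], pj) for pj in pos))
        c = rho / RHO0 - 1.0  # density constraint C_i
        own = tuple(sum(g[k] for g in grads[i]) / RHO0 for k in range(3))
        others = sum(dot(g, g) for g in grads[i]) / RHO0 ** 2
        lambdas.append(-c / (dot(own, own) + others + EPS))
    out = []
    for i in range(n):
        dp = tuple(sum((lambdas[i] + lambdas[j]) * grads[i][j][k]
                       for j in range(n)) / RHO0 for k in range(3))
        out.append(tuple(p + d for p, d in zip(pos[i], dp)))
    return out


def clamp_to_box(p):
    return tuple(min(max(c, BOX_MIN), BOX_MAX) for c in p)


def pbf_step(pos, vel, dt=1.0 / 60.0, iters=4):
    """Apply gravity, predict, solve the constraint, then derive velocity."""
    vel = [tuple(v + dt * g for v, g in zip(vi, GRAVITY)) for vi in vel]
    pred = [tuple(p + dt * v for p, v in zip(pi, vi)) for pi, vi in zip(pos, vel)]
    for _ in range(iters):
        pred = [clamp_to_box(p) for p in solve_density(pred)]
    vel = [sub(pn, po) for pn, po in zip(pred, pos)]
    vel = [tuple(c / dt for c in v) for v in vel]
    return pred, vel
```

Calling solve_density on two particles that sit closer together than the rest
density allows pushes them apart along the line between them, which is the
behavior the density constraint is meant to produce.
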
Finding neighboring particles and accessing their properties is the most expensive
step of the calculation, as a brute-force search would be far too slow for the number
of particles we want. To slightly improve memory access patterns, we keep a separate
GPU buffer for each particle attribute (position, velocity, etc.). To
avoid a brute-force check of every particle at every stage of the algorithm, we implement
a spatial hash grid using atomics, as described in the
NVIDIA CUDA Particles whitepaper.
We hash each particle's position and use the hash to place it into a bucket, so that
for each particle we then have a list of indices into each buffer. This optimization
allows simulating more than 200,000 particles in real time on my home PC
(RTX 4070). I used basic full memory read/write barriers to synchronize access
between compute pipelines. I imagine most of the remaining optimization
would be in further improving memory access, but I was very happy with the
result I managed to achieve with this minimal setup.
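
To make the spatial hash grid idea concrete, here is a small CPU sketch. It is
an illustration under assumptions rather than the project's shader code: the
hash function, the table size, and the count / prefix-sum / scatter layout are
my choices for the example, and the plain increments stand in for the atomic
adds a GPU version would use. It also assumes cell_size is at least as large as
the search radius, so that checking the 27 surrounding cells is enough.

```python
import math


def cell_of(p, cell_size):
    """Integer grid cell containing point p."""
    return tuple(math.floor(c / cell_size) for c in p)


def hash_cell(cell, table_size):
    """Map a cell to a bucket (assumed hash; large primes combined with XOR)."""
    x, y, z = cell
    return ((x * 73856093) ^ (y * 19349663) ^ (z * 83492791)) % table_size


def build_grid(positions, cell_size, table_size):
    """Return (starts, sorted_ids): bucket b owns sorted_ids[starts[b]:starts[b+1]]."""
    buckets = [hash_cell(cell_of(p, cell_size), table_size) for p in positions]
    counts = [0] * table_size
    for b in buckets:
        counts[b] += 1  # an atomic add on the GPU
    starts = [0] * (table_size + 1)
    for b in range(table_size):
        starts[b + 1] = starts[b] + counts[b]  # exclusive prefix sum
    cursor = starts[:-1]  # slicing copies, so starts is not modified below
    sorted_ids = [0] * len(positions)
    for i, b in enumerate(buckets):
        sorted_ids[cursor[b]] = i  # slot claimed by an atomic add on the GPU
        cursor[b] += 1
    return starts, sorted_ids


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neighbors(i, positions, cell_size, table_size, starts, sorted_ids, radius):
    """Indices of particles within radius of particle i (excluding i)."""
    cx, cy, cz = cell_of(positions[i], cell_size)
    seen = set()  # different cells can hash to the same bucket
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                b = hash_cell((cx + dx, cy + dy, cz + dz), table_size)
                if b in seen:
                    continue
                seen.add(b)
                for j in sorted_ids[starts[b]:starts[b + 1]]:
                    # Hash collisions bring in far-away particles, so filter.
                    if j != i and dist2(positions[i], positions[j]) < radius * radius:
                        result.append(j)
    return result
```

Because unrelated cells can collide in the hash table, the distance check in
neighbors is what keeps the result correct. The grid only narrows down which
particles are worth testing.
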