Overview
This was my final project for my computer animation course, where we
were given the freedom to implement anything within or beyond the
scope of the class. After our position-based dynamics (PBD) cloth
simulation assignment, our professor, Dr. Sueda, had linked a website
with other applications of PBD.
I had already wanted to implement position-based fluids (PBF) at some point, so I
decided to use that for my final project. We had to write a project proposal,
in which I initially suggested a slightly older paper, but Dr. Sueda
directed me to
Position Based Fluids
by Miles Macklin and Matthias Müller. I liked his suggestion. He
made sure I was aware of some of the challenges I'd face that weren't
within the scope of the class (e.g. instanced rendering), and with his
approval I had my project decided.
Process
I used my WIP Vulkan renderer because I already had some things done,
like image/buffer abstractions and basic model loading. I knew I
wanted to implement the simulation fully on the GPU and to instance the
particles, so the first step I took was creating a GPU buffer and
uploading positions for all particles in a cube to make sure it
would work properly. I wanted to get the algorithm right on the GPU
first, both so I could more easily identify where I'd need memory
barriers and because debugging shaders is just more difficult in
general.
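As a rough CPU-side illustration of that first step, the sketch below fills a flat float32 array with positions on a regular lattice inside a cube. The function name, the `n_per_axis` and `spacing` parameters, and the vec4-style padding (`w = 1.0`) are assumptions made for the example; the actual Vulkan buffer creation and upload are not shown.

```python
from array import array


def make_particle_cube(n_per_axis, spacing, origin=(0.0, 0.0, 0.0)):
    """Return a flat float32 array of xyzw positions on a cubic lattice.

    The layout (four floats per particle, w = 1.0) is an assumed
    GPU-friendly packing, not necessarily what the project uses.
    """
    positions = array("f")
    for ix in range(n_per_axis):
        for iy in range(n_per_axis):
            for iz in range(n_per_axis):
                positions.extend((
                    origin[0] + ix * spacing,
                    origin[1] + iy * spacing,
                    origin[2] + iz * spacing,
                    1.0,
                ))
    return positions


cube = make_particle_cube(4, 0.5)
raw_bytes = cube.tobytes()  # bytes that would be copied into a staging buffer
```

Keeping the data as one contiguous float array means the same bytes can be copied directly into a storage buffer and read by both the compute and the instanced draw passes.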
I added gravity and clamped the particles between two corner points to
keep them in a box. I implemented the PBF algorithm in the order it is
presented in the paper: first applying gravity and predicting positions,
then creating lists of neighbors for each particle, and then executing the
solver loop. The solver first requires estimating the density at each particle
and then estimating the gradient of the
density so that we can push particles in the opposite direction, towards
the rest density, and update the velocity. Then, using the new velocity,
we apply the vorticity and viscosity calculations and do a final position
update.
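The steps above can be sketched on the CPU roughly as follows. This is a minimal illustration, not the project's shader code: it assumes unit particle mass, the poly6 kernel for density and the spiky kernel for gradients, brute-force neighbor search, and hypothetical parameter values (`h`, `rest_density`, `eps`, `box`). It also clamps the density constraint at zero instead of using the paper's tensile-instability correction, and it omits the vorticity and viscosity passes.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def poly6(r2, h):
    """Poly6 kernel evaluated from a squared distance r2."""
    if r2 >= h * h:
        return 0.0
    return 315.0 / (64.0 * math.pi * h ** 9) * (h * h - r2) ** 3


def spiky_grad(r, h):
    """Gradient of the spiky kernel for the offset r = p_i - p_j."""
    dist = math.sqrt(sum(c * c for c in r))
    if dist < 1e-12 or dist >= h:
        return (0.0, 0.0, 0.0)
    scale = -45.0 / (math.pi * h ** 6) * (h - dist) ** 2 / dist
    return tuple(scale * c for c in r)


def pbf_step(pos, vel, dt, h=1.0, rest_density=2.0, eps=1e-3, iters=1,
             gravity=(0.0, -9.8, 0.0), box=((-5.0,) * 3, (5.0,) * 3)):
    n = len(pos)
    # 1. Apply gravity and predict positions.
    vel = [tuple(v[k] + dt * gravity[k] for k in range(3)) for v in vel]
    pred = [tuple(p[k] + dt * v[k] for k in range(3))
            for p, v in zip(pos, vel)]
    # 2. Neighbor lists (brute force here; a grid would replace this).
    nbrs = [[j for j in range(n) if j != i
             and sum(c * c for c in sub(pred[i], pred[j])) < h * h]
            for i in range(n)]
    for _ in range(iters):
        # 3. Density, constraint value and lambda per particle.
        lambdas = []
        for i in range(n):
            density = poly6(0.0, h)  # self contribution
            grad_i = [0.0, 0.0, 0.0]
            denom = 0.0
            for j in nbrs[i]:
                r = sub(pred[i], pred[j])
                density += poly6(sum(c * c for c in r), h)
                g = [c / rest_density for c in spiky_grad(r, h)]
                denom += sum(c * c for c in g)  # |grad wrt p_j|^2
                grad_i = [a + b for a, b in zip(grad_i, g)]
            denom += sum(c * c for c in grad_i)  # |grad wrt p_i|^2
            c_i = max(density / rest_density - 1.0, 0.0)  # assumed clamp
            lambdas.append(-c_i / (denom + eps))
        # 4. Position corrections, then clamp to the box.
        new_pred = []
        for i in range(n):
            dp = [0.0, 0.0, 0.0]
            for j in nbrs[i]:
                g = spiky_grad(sub(pred[i], pred[j]), h)
                w = (lambdas[i] + lambdas[j]) / rest_density
                dp = [d + w * c for d, c in zip(dp, g)]
            new_pred.append(tuple(
                min(max(pred[i][k] + dp[k], box[0][k]), box[1][k])
                for k in range(3)))
        pred = new_pred
    # 5. Velocity from the position change, then commit positions.
    vel = [tuple((q[k] - p[k]) / dt for k in range(3))
           for p, q in zip(pos, pred)]
    return pred, vel


# Two particles that start too close together are pushed apart.
start = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
rest = [(0.0, 0.0, 0.0)] * 2
new_pos, new_vel = pbf_step(start, rest, dt=0.01, gravity=(0.0, 0.0, 0.0))
```

On the GPU each numbered stage would map naturally to its own compute dispatch, since every stage reads results that the previous one wrote for neighboring particles.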
Finding neighboring particles and accessing their properties is the most
expensive step of this calculation, as brute forcing would be far too
slow with the number of particles that we want. To slightly improve memory
access patterns we have separate GPU buffers for each particle
attribute (position, velocity, etc.). To avoid brute-force checking
every particle at each stage of the algorithm, we implement the spatial
hash grid using atomics as described in the
CUDA Particles NVIDIA whitepaper.
We hash each particle's position and use that to place it into a
bucket, so that for each particle we have a list of indices into each buffer.
This optimization allows for simulation of >200,000 particles in real time
on my home PC setup (RTX 4070). I used basic full memory read/write barriers
to synchronize access between compute pipelines. I imagine the majority
of the remaining optimization would be in further improving memory access,
but I was very happy with the effect that I managed to achieve with this
minimal setup.
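A sequential sketch of an atomics-style hash grid is shown below, to make the bucket idea concrete. It assumes fixed-capacity buckets with a per-cell counter, a wrap-around linear cell hash, and made-up constants (`GRID_SIZE`, `CELL_SIZE`, `MAX_PER_CELL`); the project's actual hash and bucket layout may differ. The counter increment is where a GPU version would use an atomic add.

```python
import math

GRID_SIZE = 16      # cells per axis (assumed)
CELL_SIZE = 1.0     # should be at least the neighbor search radius
MAX_PER_CELL = 8    # fixed bucket capacity (assumed)


def cell_coords(p):
    return tuple(math.floor(c / CELL_SIZE) for c in p)


def cell_hash(cell):
    # Wrap coordinates into the grid, then linearize to a bucket index.
    x, y, z = (c % GRID_SIZE for c in cell)
    return (z * GRID_SIZE + y) * GRID_SIZE + x


def build_grid(positions):
    counters = [0] * GRID_SIZE ** 3
    cells = [-1] * (GRID_SIZE ** 3 * MAX_PER_CELL)
    for index, p in enumerate(positions):
        h = cell_hash(cell_coords(p))
        slot = counters[h]  # on the GPU: slot = atomicAdd(counters[h], 1)
        counters[h] += 1
        if slot < MAX_PER_CELL:  # particles beyond capacity are dropped
            cells[h * MAX_PER_CELL + slot] = index
    return counters, cells


def neighbors(i, positions, counters, cells, radius):
    """Indices of particles within radius of particle i (radius <= CELL_SIZE)."""
    cx, cy, cz = cell_coords(positions[i])
    found = []
    for dz in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                h = cell_hash((cx + dx, cy + dy, cz + dz))
                for slot in range(min(counters[h], MAX_PER_CELL)):
                    j = cells[h * MAX_PER_CELL + slot]
                    d2 = sum((a - b) ** 2
                             for a, b in zip(positions[i], positions[j]))
                    # The distance test also rejects wrapped-around cells.
                    if j != i and d2 < radius * radius:
                        found.append(j)
    return found


points = [(0.1, 0.1, 0.1), (0.9, 0.1, 0.1), (1.2, 0.1, 0.1), (5.0, 5.0, 5.0)]
counters, cells = build_grid(points)
near_first = neighbors(0, points, counters, cells, radius=1.0)
```

Because the returned values are plain particle indices, the same index can be used to look up position, velocity, or any other attribute in its own buffer, which fits the separate-buffer-per-attribute layout described above.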