Overview
I initialize 100,000,000 random positions and iterate them through
the Lorenz attractor. They're rendered as points and colored by
their world-space velocity, producing the pretty pattern shown. At
this particle count the particles are so dense that the attractor
appears solid until the camera gets almost inside the path the
attractor traces.
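The update and coloring can be sketched as below. This is a hedged sketch, assuming explicit Euler integration, the classic Lorenz parameters (sigma = 10, rho = 28, beta = 8/3), and a blue-to-red color ramp; none of these specifics are stated in the post, and `lorenz_step`/`speed_to_color` are names of my own invention.

```python
import numpy as np

# Assumed values: the classic Lorenz parameters and a small Euler timestep.
SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0
DT = 0.005

def lorenz_step(p):
    """Advance an (N, 3) array of positions one Euler step.

    Returns the new positions and the world-space velocities dx/dt,
    which can drive the per-particle color."""
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    v = np.stack([SIGMA * (y - x),
                  x * (RHO - z) - y,
                  x * y - BETA * z], axis=1)
    return p + DT * v, v

def speed_to_color(v, max_speed=300.0):
    """Map velocity magnitude onto a blue-to-red ramp (assumed palette)."""
    t = np.clip(np.linalg.norm(v, axis=1) / max_speed, 0.0, 1.0)[:, None]
    return (1.0 - t) * np.array([0.1, 0.2, 1.0]) + t * np.array([1.0, 0.3, 0.1])

# A small batch of random starting positions (the real system uses 1e8).
rng = np.random.default_rng(0)
positions = rng.uniform(-1.0, 1.0, size=(1000, 3))
positions, velocities = lorenz_step(positions)
colors = speed_to_color(velocities)
```

On the GPU this same math lives in a compute shader, with the color lookup done at render time.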
First I initialize the particle data in CPU/host buffers, then copy that
data into GPU/device buffers. I create two position buffers so that the
compute for the next frame can run while the rendering of the current
frame is in flight, in classic ping-pong buffer fashion. On frame N I
write to buffer A, then start rendering while reading from buffer A. Then,
without blocking, I start the compute for frame N+1, writing to buffer B
while also reading from buffer A. At particle counts as high as in the
example video this doesn't matter much, since the compute time is the
bottleneck, but at lower particle counts, or with multiple particle systems
running concurrently, overlapping the compute with the previous frame's
rendering can be a valuable optimization.
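The alternation above can be sketched as a simple index schedule. The actual render and compute submissions are omitted; this only shows which buffer each pass touches on each frame (the function name is mine, not from the post):

```python
def frame_schedule(num_frames):
    """Yield (render_reads, compute_writes) buffer names per frame.

    The render pass for frame N and the compute pass for frame N+1
    both read the same buffer; the compute pass writes the other one."""
    buffers = ("A", "B")
    for frame in range(num_frames):
        yield buffers[frame % 2], buffers[(frame + 1) % 2]

schedule = list(frame_schedule(4))
# Frame 0: render reads A while compute writes B; frame 1 swaps, and so on.
```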
This setup can easily be extended into a more practical particle system
by instancing a billboarded quad with sampled textures rather than
colored points, but the points are sufficient to show the concept and
looked good for the visualization I already wanted to make. The rendering
work for billboarded quads and texture fetches would be much slower than
the simple shader currently used, resulting in fewer particles but more
quality per particle. The next position is currently calculated by a
single compute shader whose input is purely the previous position, which
is always available to read since it is needed for the render.
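Because the update is a pure function of the previous positions, the compute pass can safely read the same buffer the render pass is reading. A minimal host-side sketch of that frame loop, with the GPU dispatches replaced by a NumPy stand-in (parameter values and names are assumptions):

```python
import numpy as np

SIGMA, RHO, BETA, DT = 10.0, 28.0, 8.0 / 3.0, 0.005  # assumed constants

def next_positions(prev):
    """Pure update: the output depends only on the previous positions."""
    x, y, z = prev[:, 0], prev[:, 1], prev[:, 2]
    v = np.stack([SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z], axis=1)
    return prev + DT * v

buffers = [np.random.default_rng(1).uniform(-1.0, 1.0, (64, 3)),
           np.zeros((64, 3))]
for frame in range(8):
    read, write = frame % 2, (frame + 1) % 2
    snapshot = buffers[read].copy()          # what the render pass sees
    buffers[write] = next_positions(buffers[read])
    # The read buffer is never mutated mid-frame, so overlapping the
    # render with the next frame's compute cannot produce torn reads.
    assert np.array_equal(buffers[read], snapshot)
```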