This kind of sim is better suited for the CPU ;-). GPUs are good for working on meshes, not really on pure particles. But GPUs are super good for grid-based hydro.
"gpu are good to work on meshes, not really on pure particles"
Why?
Having thousands of particles that all need the same operations applied to them in parallel screams GPU to me.
It is just way harder to program a GPU than a CPU.
Collision detection is usually a tree search, and this is a heavily branching workload. Meaning that by the time you reach the lowest nodes of the tree, your lanes will have diverged significantly and your parallelism will be reduced quite a bit. It would still be faster than CPU, but not enough to justify the added complexity. And the fact remains that you usually want the GPU free for your nice graphics. This is why in most AAA games, physics is CPU-only.
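To illustrate the divergence point, here is a rough sketch of the kind of data-dependent traversal involved (a hypothetical BVH node layout, not any particular engine's code). On a GPU you would run one lane per particle, and because the branches and the loop length depend on that particle's position, lanes in the same warp quickly stop doing the same work:

    #include <vector>

    // Hypothetical BVH node over particles (illustrative only).
    struct Node {
        float minX, minY, maxX, maxY; // bounding box
        int left, right;              // child indices, -1 if leaf
        int firstParticle, count;     // leaf contents
    };

    // Per-particle query: every particle walks a different path through the
    // tree, so lanes executing this in lockstep keep waiting on each other.
    void queryNeighbors(const Node* nodes, int root,
                        float px, float py, float radius,
                        std::vector<int>& out) {
        int stack[64];                           // fixed depth, for illustration
        int top = 0;
        stack[top++] = root;
        while (top > 0) {                        // data-dependent loop length
            const Node& n = nodes[stack[--top]];
            if (px + radius < n.minX || px - radius > n.maxX ||
                py + radius < n.minY || py - radius > n.maxY)
                continue;                        // divergent early-out
            if (n.left < 0) {                    // leaf: gather candidates
                for (int i = 0; i < n.count; ++i)
                    out.push_back(n.firstParticle + i);
            } else {                             // inner node: push children
                stack[top++] = n.left;
                stack[top++] = n.right;
            }
        }
    }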
It uses the very simple approach of testing every particle against EVERY other particle. Still very performant (the simulation, that is; the chosen canvas rendering is very slow).
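In other words, the hot loop is essentially an all-pairs test, roughly like this (a generic sketch, not the demo's actual code):

    #include <cmath>
    #include <vector>

    struct Particle { float x, y, vx, vy; };

    // O(n^2) interaction pass: every particle is tested against every other.
    // Simple and easy to vectorize, but the pair count explodes as n grows
    // (14,000 particles -> roughly 98 million pairs per step).
    void interactAllPairs(std::vector<Particle>& ps, float radius) {
        const float r2 = radius * radius;
        for (size_t i = 0; i < ps.size(); ++i) {
            for (size_t j = i + 1; j < ps.size(); ++j) {
                float dx = ps[j].x - ps[i].x;
                float dy = ps[j].y - ps[i].y;
                float d2 = dx * dx + dy * dy;
                if (d2 < r2 && d2 > 0.0f) {
                    // simple symmetric separation (placeholder physics)
                    float d = std::sqrt(d2);
                    float push = (radius - d) * 0.5f / d;
                    ps[i].x -= dx * push; ps[i].y -= dy * push;
                    ps[j].x += dx * push; ps[j].y += dy * push;
                }
            }
        }
    }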
I'm currently trying to do something like this, but optimized. With the naive approach here and Pixi instead of canvas, I get to 20,000 particles at 120 fps on an old laptop. I am curious how far I get with an optimized version. But yes, the danger is that calculation and rendering block each other. So I have to use the CPU in a smart way to limit the data being pushed to the GPU. And while I prepare the data on the CPU, the GPU can do the graphics rendering. Like I said, it is way harder to do right this way. When the simulation behaves weirdly, debugging is a pain.
If you use WebGPU, try the algorithm presented in the Diligent Engine repo for your acceleration structure. It will let you avoid transferring data back and forth between CPU and GPU: https://github.com/DiligentGraphics/DiligentSamples/tree/mas...
Another reason I did it on the CPU is that with WebGL you lack certain things like atomics and groupshared memory, which you now have with WebGPU. For the Diligent Engine spatial hashing, atomics are required. I'm mainly using WebGL because of compatibility: iOS Safari still doesn't enable WebGPU without special feature flags that the user has to enable.
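For context on why atomics matter there: building a spatial hash in parallel means many threads incrementing the same per-cell counters concurrently, which needs an atomic add. A rough CPU-side illustration of the pattern (not the Diligent implementation itself):

    #include <atomic>
    #include <vector>

    // 'cellOf[i]' is the hashed cell index of particle i. On a GPU this loop
    // is split across thousands of threads, so the counter update must be an
    // atomic increment (atomicAdd in a compute shader); WebGL has no such op.
    void countPerCell(const std::vector<int>& cellOf,
                      std::vector<std::atomic<int>>& cellCount) {
        for (int c : cellOf)
            cellCount[c].fetch_add(1, std::memory_order_relaxed);
    }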
Thanks a lot, that is very interesting! I will check it out in detail.
But currently I will likely proceed with my approach of transferring data back and forth between CPU and GPU, so I can make use of the CPU for all kinds of things. My initial idea was also to keep it all on the GPU, though; I will see what works best.
And yes, I also would not recommend WebGPU currently for anything that needs to deploy soon to a wide audience. My project is intended as a long term experiment, so I can live with the limitations for now.
This is a 2D simulation with only self-collisions, and not collisions against external geometry. The author suggests a simulation time of 16ms for 14000 particles. State of the art physics engines can do several times more, on the CPU, in 3D, while colliding with complex geometry with hundreds of thousands of triangles. I understand this code is not optimized, but I'd say the workload is not really comparable enough to talk about the benefits of CPU vs GPU for this task.
The O(n^2) approach, I fear, cannot really scale to much beyond this number, and as soon as you introduce optimizations that make it less than O(n^2), you've introduced tree search or spatial caching that makes your single "core" (WG) per particle diverge.
"that make it less than O(n^2), you've introduced tree search or spatial caching that makes your single "core" (WG) per particle diverge"
Well, like I said, I try to use the CPU side to help with all that. So every particle on the GPU checks maybe the 20 particles around it for collision (and other reactions), and not 14,000 like it does currently.
That should give a different result.
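Roughly the shape of what I have in mind (a simplified sketch, not my actual code): the CPU bins particles into grid cells, and the GPU pass then only tests candidates from the few surrounding cells.

    #include <cmath>
    #include <vector>

    // CPU-side binning: group particle indices by grid cell so the GPU pass
    // only tests the ~3x3 cells around each particle instead of all 14,000.
    std::vector<std::vector<int>>
    binParticles(const std::vector<float>& xs, const std::vector<float>& ys,
                 float cellSize, int gridW, int gridH) {
        std::vector<std::vector<int>> cells(gridW * gridH);
        for (int i = 0; i < (int)xs.size(); ++i) {
            int cx = (int)std::floor(xs[i] / cellSize);
            int cy = (int)std::floor(ys[i] / cellSize);
            if (cx < 0 || cy < 0 || cx >= gridW || cy >= gridH) continue;
            cells[cy * gridW + cx].push_back(i);
        }
        // Flatten 'cells' into offset + index buffers and upload them, so each
        // particle's GPU thread only loops over a handful of nearby candidates.
        return cells;
    }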
Once done with this side project, I will post my results here. Maybe you are right and it will not work out, but I think I found a working compromise.
Yeah, pretty much this. I've experimented with putting it on the GPU a bit, but I would say the particle-based GPU version is about 3x faster than a multithreaded & SIMD CPU implementation. Not 100x like you will see in Nvidia marketing materials, and on mobile, which this demo does run on, the GPU often ends up weaker than the CPU. Wasm SIMD is only 4-wide, while the standard is 8 or 16 wide on most CPUs today.
But yeah, once you need to do graphics on top, that 3x pretty much goes away and is just additional frametime. I think they should work together. On my desktop stuff, I also have things like adaptive resolution and sparse grids to more fully take advantage of things that the CPU can do that are harder on GPU.
The Wasm demo is still in its early stages. The particles are just simple points. I could definitely use the GPU a bit more to do lighting and shading a smooth liquid surface.
Agree with most of the comment, just to point out (I could be misremembering) 4-wide SIMD ops that are close together often get pipelined "perfectly" onto the same vector unit that would be doing 8- or 16-wide SIMD, so the difference is often not as much as one would expect. (Still a speedup, though!)
The issue is not really parallelism of computation. The issue is locality.
Usually a hydro solver needs to solve two very different problems: short-range and long-range interactions. Therefore you "split" the problem into a particle-mesh part (long range) and a particle-to-particle part (short range).
In this case there is no long-range interaction (e.g. gravity, electrodynamics), so you would go for a pure p2p implementation.
Then, in a p2p scheme, very strong coupling between particles can ensure that neighbors stay neighbors (that will be the case with solids, or with very high viscosity). But in most cases you will need a rebalancing of the tree (and therefore of the memory layout) every time step. This rebalancing can in fact dominate the execution time, since the computation on a given particle usually amounts to just a few (order 100) flops. And this rebalancing is usually faster to do on the CPU than on the GPU. Evidently you can do it "well" on a GPU, but the effort to get a proper implementation will be huge ;-).
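For a uniform grid (simpler than a tree, but the same idea), that per-step rebalancing is essentially a counting sort of the particles by cell index so that neighbors end up contiguous in memory. A minimal sketch:

    #include <vector>

    // Re-sort particles by grid cell each step so spatially close particles
    // are also close in memory (counting sort, O(n) per time step).
    // 'cellOf[i]' is the flattened cell index of particle i; returns the
    // permuted particle order to apply to positions/velocities.
    std::vector<int> reorderByCell(const std::vector<int>& cellOf, int numCells) {
        std::vector<int> count(numCells + 1, 0);
        for (int c : cellOf) ++count[c + 1];
        for (int c = 0; c < numCells; ++c) count[c + 1] += count[c]; // prefix sum
        std::vector<int> order(cellOf.size());
        std::vector<int> cursor(count.begin(), count.end() - 1);     // start offsets
        for (int i = 0; i < (int)cellOf.size(); ++i)
            order[cursor[cellOf[i]]++] = i;
        return order;
    }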
UPDATE: I've added a "hacker mode" for you all! You can now specify a userUpdate function and it will run each frame. See my Twitter post for a demo of it.
https://x.com/kotsoft/status/1806362956294189299
You actually can adjust the settings for this. In Settings > Simulation, change sameRestDensity from 8 to 0 and set diffRestDensity higher. I recommend doing it with low gravity as well (you can get zero-g by clicking "enable accelerometer" on computers without an accelerometer).
Wasn't this shared a week/few weeks ago? Not that I mind it being posted again, just that if it was already posted it might be worth linking the old thread, in case there was some interesting discussion there.
Yes, the previous one is: https://news.ycombinator.com/item?id=40429878
There are some new features since then and some major speed improvements from using SIMD.
I do still see complaints about the compressibility so I still need to work on some improvements for that.
(dev)
This is interesting. I wish the other interactions were as dramatic as the drag one. You have to really crank up the brush size to see anything interesting.
This is still the same kind of simulation, based on the Particle-based Viscoelastic Fluid Simulation paper. I updated it to use Wasm SIMD more fully with the help of Clang Vector Extensions, Compiler Explorer and Wasm Analyzer: Compiler Explorer to play around with patterns, and Wasm Analyzer to double-check the final compilation.
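For anyone curious, the basic Clang Vector Extensions pattern looks roughly like this when targeting Wasm SIMD (a generic example built with -msimd128, not the demo's actual kernels):

    // Compile with: clang --target=wasm32 -O2 -msimd128 ...
    typedef float f32x4 __attribute__((vector_size(16)));

    // Integrate 4 particles at once; Clang lowers the element-wise ops onto
    // Wasm SIMD v128 instructions (f32x4.add / f32x4.mul).
    void integrate(f32x4* px, f32x4* py, f32x4* vx, f32x4* vy,
                   int n4, float dt, float gravity) {
        const f32x4 vdt = {dt, dt, dt, dt};
        const f32x4 g   = {gravity, gravity, gravity, gravity};
        for (int i = 0; i < n4; ++i) {
            vy[i] += g * vdt;        // element-wise multiply/add
            px[i] += vx[i] * vdt;
            py[i] += vy[i] * vdt;
        }
    }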
Really awesome work. WASM lets you show this off to a massively larger audience than a regular binary. I'd encourage you to build your Voxel project with WASM as well; then you can easily turn it into a mobile/tablet game.
Thanks! I am definitely working on bringing more to the WASM world. I'd begun experimenting with 3D and multithreading and then last week decided to circle back to the 2d demo and polish it up a bit more.
Yeah, agree. The objective was more to make a physics toy that runs on a single core on a phone than something for actual scientific or industrial use. I could add additional iterations or do pressure projection, but then there would be complaints about it being slow & choppy.
There are also some large density ratios between the materials, which further increased the difficulty and would also increase the number of pressure projection iterations on a grid. I tried to simulate buoyancy without cheating (e.g. giving different materials different accelerations due to gravity).
Thanks! With 3D the main challenge might be visualizing the layers separately. Ideally each of the phases would have some kind of metaball effect and also be transparent and even refract. It will be pretty tough to do, and I will have to fight the uncanny valley effect.
Sometimes for sandbox I feel like 2D can be more fun because people can target particles they interact with better.
tl;dr: this is a liquid simulation toy. It starts with four colors of different densities (so they self-organize into layers). Clicking lets you scoop up a ball of the liquid. I find it pretty to watch, like a lava lamp that I can also throw globs of.
You also have a settings UI, where you can change (among many other things) what clicking does. I find it most interesting to switch to "repel": the repulsion force field turns out to be a universal tool for mixing and separating the different liquids, depending on how fast and precisely you operate it. Fascinating.
Hi, I've updated the home page on my site (https://grantkot.com) with links to my other socials, like the YouTube and itchio pages. Twitter for more casual frequent updates, and YouTube for longer summary updates. The itchio demos need to be optimized for a wider variety of machines.