Yes, this appears to use Stam's Stable Fluids algorithm. Look for the phrases "semi-Lagrangian advection" and "pressure correction" to find the important functions. The 3D version seems to use trilinear interpolation, which is pretty diffusive.
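For anyone poking at the code: the semi-Lagrangian step is just a backwards trace along the velocity plus an interpolation. Here's a rough 2D NumPy sketch of the idea (my own array layout and names, not the linked project's):

    import numpy as np

    def advect(q, u, v, dt):
        # Semi-Lagrangian advection of scalar field q by velocity (u, v).
        # q, u, v are (ny, nx) arrays on a unit-spaced grid. For each grid
        # point, trace backwards along the velocity, then bilinearly
        # interpolate q at the departure point. (The 3D version does the
        # same thing with trilinear interpolation, hence the diffusion.)
        ny, nx = q.shape
        jj, ii = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")

        # Backtrace: where did the fluid now at (i, j) come from?
        x = np.clip(ii - dt * u, 0, nx - 1)
        y = np.clip(jj - dt * v, 0, ny - 1)

        # Bilinear interpolation at the departure points.
        x0 = np.floor(x).astype(int); x1 = np.minimum(x0 + 1, nx - 1)
        y0 = np.floor(y).astype(int); y1 = np.minimum(y0 + 1, ny - 1)
        fx, fy = x - x0, y - y0

        top = (1 - fx) * q[y0, x0] + fx * q[y0, x1]
        bot = (1 - fx) * q[y1, x0] + fx * q[y1, x1]
        return (1 - fy) * top + fy * bot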
This is a fine collection of links - much to learn! - but the connection between flow and gravitation is (in my understanding) limited to both being Green's function solutions of a Poisson problem. https://en.wikipedia.org/wiki/Green%27s_function
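To spell the connection out (sign and constant conventions here are my own choice of normalization):

    % Gravity and incompressible (streamfunction-vorticity) flow are both Poisson problems:
    \nabla^2 \Phi = 4\pi G \rho, \qquad \nabla^2 \psi = -\omega

    % and both are inverted by the same free-space Green's function of the Laplacian:
    \mathcal{G}(\mathbf{x},\mathbf{x}') = -\frac{1}{4\pi\,|\mathbf{x}-\mathbf{x}'|},
    \qquad
    \Phi(\mathbf{x}) = -G \int \frac{\rho(\mathbf{x}')}{|\mathbf{x}-\mathbf{x}'|}\,\mathrm{d}^3x'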
There are N-body methods for both (gravitational N-body and Lagrangian vortex particle methods), and I find the similarities and differences between those algorithms quite interesting.
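The structural similarity is easiest to see in the direct O(N^2) sums. This is just my own toy sketch (pos is (N,3), mass is (N,), gamma is (N,3); no softening or fast summation):

    import numpy as np

    def gravity_accel(pos, mass, G=1.0):
        # Direct-sum gravity: a_i = G * sum_j m_j (x_j - x_i) / |x_j - x_i|^3
        acc = np.zeros_like(pos)
        for i in range(len(pos)):
            r = pos - pos[i]                          # vectors from body i to every body
            d3 = np.einsum("ij,ij->i", r, r) ** 1.5
            d3[i] = np.inf                            # skip the self-interaction
            acc[i] = G * np.sum(mass[:, None] * r / d3[:, None], axis=0)
        return acc

    def biot_savart_vel(pos, gamma):
        # Vortex-particle velocity: u_i = sum_j gamma_j x (x_i - x_j) / (4 pi |x_i - x_j|^3)
        # Same 1/r^2 kernel as gravity, but a cross product with the vortex
        # strength vector instead of a scalar mass.
        vel = np.zeros_like(pos)
        for i in range(len(pos)):
            r = pos[i] - pos
            d3 = np.einsum("ij,ij->i", r, r) ** 1.5
            d3[i] = np.inf
            vel[i] = np.sum(np.cross(gamma, r) / (4 * np.pi * d3[:, None]), axis=0)
        return vel

Same pairwise loop, same candidates for treecodes/FMM; the difference is what the kernel produces (an acceleration you integrate twice versus a velocity you integrate once).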
But the Fedi paper misses that key connection: they're simply describing a source/sink in potential flow, not some newly discovered link.
Supercomputers will simulate trillions of masses. The HACC code, commonly used to verify the performance of these machines, uses a uniform grid (particle-to-grid interpolation plus a 3D FFT) with local short-range corrections to compute the motion of ~8 trillion bodies.
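For anyone curious what "uniform grid + interpolation + 3D FFT" means in practice, here's the particle-mesh idea in miniature. This is much cruder than what HACC actually does (nearest-grid-point deposit, no short-range correction, arbitrary units):

    import numpy as np

    def pm_potential(pos, mass, n=64, box=1.0, G=1.0):
        # Toy particle-mesh step: deposit masses onto a periodic n^3 grid,
        # solve the Poisson equation with a 3D FFT, return the grid potential.
        h = box / n

        # 1. Deposit: nearest-grid-point assignment of mass -> density.
        #    (Real codes use higher-order deposits such as CIC.)
        idx = np.floor(pos / h).astype(int) % n
        rho = np.zeros((n, n, n))
        np.add.at(rho, (idx[:, 0], idx[:, 1], idx[:, 2]), mass)
        rho /= h**3

        # 2. Solve del^2 phi = 4 pi G rho in Fourier space: phi_k = -4 pi G rho_k / k^2.
        k = 2 * np.pi * np.fft.fftfreq(n, d=h)
        kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
        k2 = kx**2 + ky**2 + kz**2
        k2[0, 0, 0] = 1.0                      # avoid divide-by-zero on the mean mode
        phi_k = -4 * np.pi * G * np.fft.fftn(rho) / k2
        phi_k[0, 0, 0] = 0.0                   # zero-mean potential
        return np.real(np.fft.ifftn(phi_k))

Forces then come from interpolating the gradient of phi back to the particles, and the "local corrections" are direct particle-particle sums within a few grid cells of each body.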
Yes, the author uses a globally-adaptive time stepper, which is only efficient for very small N. There are adaptive time step methods that are local, and those are used for large systems.
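A common local scheme is block time steps, where each body gets its own power-of-two fraction of the largest step. A rough sketch of the bookkeeping only (the step criterion and the first-order kick-drift here are placeholders, not any production integrator):

    import numpy as np

    def step_levels(acc, dt_max, eta=0.01, max_level=8):
        # Pick a power-of-two "level" k per body from its own acceleration, so
        # its individual step is dt_max / 2**k. The eta/sqrt(|a|) criterion is a
        # placeholder; real codes use acceleration/jerk-based formulas.
        a = np.linalg.norm(acc, axis=1) + 1e-30
        dt_want = eta / np.sqrt(a)
        return np.clip(np.ceil(np.log2(dt_max / dt_want)), 0, max_level).astype(int)

    def advance_block(pos, vel, accel_fn, dt_max, eta=0.01, max_level=8):
        # One block of length dt_max, run on a fine clock of dt_max / 2**max_level.
        # Each substep, only bodies sitting on their own step boundary get a new
        # kick; everyone drifts. Bodies in close encounters take small steps
        # without dragging the whole system down to their time step.
        n_sub = 2**max_level
        dt_min = dt_max / n_sub
        k = step_levels(accel_fn(pos), dt_max, eta, max_level)
        dt_i = dt_max / 2**k
        period = n_sub // 2**k                 # substeps between kicks, per body
        for s in range(n_sub):
            due = (s % period) == 0
            if np.any(due):
                acc = accel_fn(pos)            # real codes evaluate forces for 'due' bodies only
                vel[due] += acc[due] * dt_i[due][:, None]
            pos += vel * dt_min                # everyone drifts on the fine clock
        return pos, vel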
If you see bodies flung out after close passes, three fixes are available: reduce the time step, use a higher-order time integrator, or (the most common method) add regularization. Regularization (often called "softening") removes the singularity by adding a constant to the squared distance, so 1/0 becomes 1 over a small but finite number.
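In code it's a one-line change to the pairwise kernel. A minimal sketch with Plummer-style softening (the value of eps is problem-dependent; this is just to show where it goes):

    import numpy as np

    def pairwise_accel(r_vec, m_j, G=1.0, eps=1e-2):
        # Acceleration on body i from body j, where r_vec = x_j - x_i.
        # With eps = 0 this is the exact Newtonian kernel and blows up as
        # |r| -> 0; with eps > 0 the denominator never reaches zero.
        r2 = np.dot(r_vec, r_vec) + eps**2    # the regularization: r^2 -> r^2 + eps^2
        return G * m_j * r_vec / r2**1.5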
>Regularization (often called "softening") removes the singularity by adding a constant to the squared distance, so 1/0 becomes 1 over a small but finite number.
IIRC that is what I did in the end. It is a fudge, but it works.
I can't recommend cards, but you are absolutely correct about porting CUDA to HIP: there was (is?) a hipify program in ROCm that does most of the work.
Here's one that starts with the concept of a straight line and builds all the way to string theory. It's a monumental book, and it still challenges me.
Roger Penrose's The Road To Reality.
Maybe never by the big players, but RDNA and even fp32 are perfectly fine for a number of CFD algorithms and uses: Stable Fluids-like algorithms and Lagrangian vortex particle methods, to name two.
CDNA executes 64 threads per compute unit per clock tick; RDNA only executes 32. CDNA is smaller, more efficient, more parallel, and has much higher compute throughput than RDNA.
Furthermore, all ROCm code from GCN (and older) was written for Wave64, because historically AMD's architectures from 2010 through 2020 were Wave64. RDNA changed to Wave32 so that it could match NVIDIA and get slightly better latency characteristics (at the cost of bandwidth).
CDNA has more compute bandwidth and parallelism; RDNA is narrower, with lower latency and less parallelism. Building a GPU out of 2048-bit compute (i.e., 64 lanes x 32-bit wide, as in CDNA) is always going to give more bandwidth than 1024-bit compute (i.e., 32 lanes x 32-bit wide), as in RDNA.
I wasn't familiar with the "Wave32" term, but I took "RDNA" to mean the smaller wavefront size. I've used both, and Wave32 is still quite effective for CFD.
ROCm support for RDNA took like 2 years, maybe longer.
If you were actually using both, you'd know that CDNA was the only supported platform on ROCm for what felt like an eternity. That's because CDNA was designed to be as similar to GCN as possible, so that ROCm could support it more easily.
--------
What I'm saying is that today, now that ROCm works on RDNA and CDNA, the two architectures can finally be unified into UDNA. And everyone should be happy with the state of code moving forward.
This has not been my experience on the academic/research side. Poisson-solver-based incompressible CFD regularly runs ~10x faster on equivalently priced GPU systems, and has been doing so for as long as I've been following it (since 2008). Some FFT-based solvers don't weak-scale ideally, but that would be even worse for the CPU-based versions, since they use similar algorithms and would be spread over many more nodes.
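To give a sense of why it maps so well to GPUs: on a periodic box, the pressure projection at the heart of these solvers is basically a handful of FFTs plus pointwise arithmetic. A NumPy sketch of the structure only (real solvers use cuFFT/rocFFT or multigrid and handle actual boundary conditions):

    import numpy as np

    def pressure_project(u, v, w, dx):
        # Make a periodic velocity field divergence-free:
        # solve del^2 p = div(u) with FFTs, then subtract grad(p).
        n = u.shape[0]
        k = 2 * np.pi * np.fft.fftfreq(n, d=dx)
        kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
        k2 = kx**2 + ky**2 + kz**2
        k2[0, 0, 0] = 1.0                                # leave the mean mode alone

        uh, vh, wh = np.fft.fftn(u), np.fft.fftn(v), np.fft.fftn(w)
        div_h = 1j * (kx * uh + ky * vh + kz * wh)       # spectral divergence
        p_h = div_h / (-k2)                              # solve -k^2 p_h = div_h
        p_h[0, 0, 0] = 0.0

        # Subtract the pressure gradient to land on the divergence-free field.
        u -= np.real(np.fft.ifftn(1j * kx * p_h))
        v -= np.real(np.fft.ifftn(1j * ky * p_h))
        w -= np.real(np.fft.ifftn(1j * kz * p_h))
        return u, v, w

Every step there is either an FFT or an elementwise operation over the grid, which is exactly the kind of bandwidth-bound work GPUs are good at.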