Yes, this appears to use Stam's Stable Fluids algorithm. Look for the phrases "semi-Lagrangian advection" and "pressure correction" to find the important functions. The 3D version seems to use trilinear interpolation, which is pretty diffusive.
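For anyone poking at the code: the semi-Lagrangian step is just a backwards trace along the velocity plus an interpolation. Here's a rough 2D NumPy sketch of the idea (my own array layout and names, not the linked project's):

    import numpy as np

    def advect(q, u, v, dt):
        # Semi-Lagrangian advection of scalar field q by velocity (u, v).
        # q, u, v are (ny, nx) arrays on a unit-spaced grid. For each grid
        # point, trace backwards along the velocity, then bilinearly
        # interpolate q at the departure point. (The 3D version does the
        # same thing with trilinear interpolation, hence the diffusion.)
        ny, nx = q.shape
        jj, ii = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")

        # Backtrace: where did the fluid now at (i, j) come from?
        x = np.clip(ii - dt * u, 0, nx - 1)
        y = np.clip(jj - dt * v, 0, ny - 1)

        # Bilinear interpolation at the departure points.
        x0 = np.floor(x).astype(int); x1 = np.minimum(x0 + 1, nx - 1)
        y0 = np.floor(y).astype(int); y1 = np.minimum(y0 + 1, ny - 1)
        fx, fy = x - x0, y - y0

        top = (1 - fx) * q[y0, x0] + fx * q[y0, x1]
        bot = (1 - fx) * q[y1, x0] + fx * q[y1, x1]
        return (1 - fy) * top + fy * bot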
This is a fine collection of links - much to learn! - but the connection between flow and gravitation is (in my understanding) limited to both being Green's function solutions of a Poisson problem. https://en.wikipedia.org/wiki/Green%27s_function
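To spell the connection out (sign and constant conventions here are my own choice of normalization):

    % Gravity and incompressible (streamfunction-vorticity) flow are both Poisson problems:
    \nabla^2 \Phi = 4\pi G \rho, \qquad \nabla^2 \psi = -\omega

    % and both are inverted by the same free-space Green's function of the Laplacian:
    \mathcal{G}(\mathbf{x},\mathbf{x}') = -\frac{1}{4\pi\,|\mathbf{x}-\mathbf{x}'|},
    \qquad
    \Phi(\mathbf{x}) = -G \int \frac{\rho(\mathbf{x}')}{|\mathbf{x}-\mathbf{x}'|}\,\mathrm{d}^3x'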
There are N-body methods for both (gravitational N-body and Lagrangian vortex particle methods), and I find the similarities and differences between those algorithms quite interesting.
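The structural similarity is easiest to see in the direct O(N^2) sums. This is just my own toy sketch (pos is (N,3), mass is (N,), gamma is (N,3); no softening or fast summation):

    import numpy as np

    def gravity_accel(pos, mass, G=1.0):
        # Direct-sum gravity: a_i = G * sum_j m_j (x_j - x_i) / |x_j - x_i|^3
        acc = np.zeros_like(pos)
        for i in range(len(pos)):
            r = pos - pos[i]                          # vectors from body i to every body
            d3 = np.einsum("ij,ij->i", r, r) ** 1.5
            d3[i] = np.inf                            # skip the self-interaction
            acc[i] = G * np.sum(mass[:, None] * r / d3[:, None], axis=0)
        return acc

    def biot_savart_vel(pos, gamma):
        # Vortex-particle velocity: u_i = sum_j gamma_j x (x_i - x_j) / (4 pi |x_i - x_j|^3)
        # Same 1/r^2 kernel as gravity, but a cross product with the vortex
        # strength vector instead of a scalar mass.
        vel = np.zeros_like(pos)
        for i in range(len(pos)):
            r = pos[i] - pos
            d3 = np.einsum("ij,ij->i", r, r) ** 1.5
            d3[i] = np.inf
            vel[i] = np.sum(np.cross(gamma, r) / (4 * np.pi * d3[:, None]), axis=0)
        return vel

Same pairwise loop, same candidates for treecodes/FMM; the difference is what the kernel produces (an acceleration you integrate twice versus a velocity you integrate once).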
But the Fedi paper misses that key connection: they're simply describing a source/sink in potential flow, not some newly discovered link.
Supercomputers will simulate trillions of masses. The HACC code, commonly used to verify the performance of these machines, uses a uniform grid (particle-to-grid interpolation plus a 3D FFT) with local short-range corrections to compute the motion of ~8 trillion bodies.
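For anyone curious what "uniform grid + interpolation + 3D FFT" means in practice, here's the particle-mesh idea in miniature. This is much cruder than what HACC actually does (nearest-grid-point deposit, no short-range correction, arbitrary units):

    import numpy as np

    def pm_potential(pos, mass, n=64, box=1.0, G=1.0):
        # Toy particle-mesh step: deposit masses onto a periodic n^3 grid,
        # solve the Poisson equation with a 3D FFT, return the grid potential.
        h = box / n

        # 1. Deposit: nearest-grid-point assignment of mass -> density.
        #    (Real codes use higher-order deposits such as CIC.)
        idx = np.floor(pos / h).astype(int) % n
        rho = np.zeros((n, n, n))
        np.add.at(rho, (idx[:, 0], idx[:, 1], idx[:, 2]), mass)
        rho /= h**3

        # 2. Solve del^2 phi = 4 pi G rho in Fourier space: phi_k = -4 pi G rho_k / k^2.
        k = 2 * np.pi * np.fft.fftfreq(n, d=h)
        kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
        k2 = kx**2 + ky**2 + kz**2
        k2[0, 0, 0] = 1.0                      # avoid divide-by-zero on the mean mode
        phi_k = -4 * np.pi * G * np.fft.fftn(rho) / k2
        phi_k[0, 0, 0] = 0.0                   # zero-mean potential
        return np.real(np.fft.ifftn(phi_k))

Forces then come from interpolating the gradient of phi back to the particles, and the "local corrections" are direct particle-particle sums within a few grid cells of each body.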
Yes, the author uses a globally-adaptive time stepper, which is only efficient for very small N. There are adaptive time step methods that are local, and those are used for large systems.
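A common local scheme is block time steps, where each body gets its own power-of-two fraction of the largest step. A rough sketch of the bookkeeping only (the step criterion and the first-order kick-drift here are placeholders, not any production integrator):

    import numpy as np

    def step_levels(acc, dt_max, eta=0.01, max_level=8):
        # Pick a power-of-two "level" k per body from its own acceleration, so
        # its individual step is dt_max / 2**k. The eta/sqrt(|a|) criterion is a
        # placeholder; real codes use acceleration/jerk-based formulas.
        a = np.linalg.norm(acc, axis=1) + 1e-30
        dt_want = eta / np.sqrt(a)
        return np.clip(np.ceil(np.log2(dt_max / dt_want)), 0, max_level).astype(int)

    def advance_block(pos, vel, accel_fn, dt_max, eta=0.01, max_level=8):
        # One block of length dt_max, run on a fine clock of dt_max / 2**max_level.
        # Each substep, only bodies sitting on their own step boundary get a new
        # kick; everyone drifts. Bodies in close encounters take small steps
        # without dragging the whole system down to their time step.
        n_sub = 2**max_level
        dt_min = dt_max / n_sub
        k = step_levels(accel_fn(pos), dt_max, eta, max_level)
        dt_i = dt_max / 2**k
        period = n_sub // 2**k                 # substeps between kicks, per body
        for s in range(n_sub):
            due = (s % period) == 0
            if np.any(due):
                acc = accel_fn(pos)            # real codes evaluate forces for 'due' bodies only
                vel[due] += acc[due] * dt_i[due][:, None]
            pos += vel * dt_min                # everyone drifts on the fine clock
        return pos, vel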
If you see bodies flung out after close passes, three fixes are available: reduce the time step, use a higher-order time integrator, or (the most common method) add regularization. Regularization (often called "softening") removes the singularity by adding a constant to the squared distance, so 1/0 becomes 1 over a small but finite number.
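In code it's a one-line change to the pairwise kernel. A minimal sketch with Plummer-style softening (the value of eps is problem-dependent; this is just to show where it goes):

    import numpy as np

    def pairwise_accel(r_vec, m_j, G=1.0, eps=1e-2):
        # Acceleration on body i from body j, where r_vec = x_j - x_i.
        # With eps = 0 this is the exact Newtonian kernel and blows up as
        # |r| -> 0; with eps > 0 the denominator never reaches zero.
        r2 = np.dot(r_vec, r_vec) + eps**2    # the regularization: r^2 -> r^2 + eps^2
        return G * m_j * r_vec / r2**1.5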
>Regularization (often called "softening") removes the singularity by adding a constant to the squared distance, so 1/0 becomes 1 over a small but finite number.
IIRC that is what I did in the end. It is a fudge, but it works.
I can't recommend cards, but you are absolutely correct about porting CUDA to HIP: there was (is?) a hipify program in ROCm that does most of the work.
Here's one that starts with the concept of a straight line and builds all the way to string theory. It's a monumental book, and it still challenges me.
Roger Penrose's The Road To Reality.
Maybe never by the big players, but RDNA and even fp32 are perfectly fine for a number of CFD algorithms and uses: Stable Fluids-like algorithms and Lagrangian vortex particle methods, to name two.
CDNA executes 64 threads per compute unit per clock tick; RDNA only executes 32. CDNA is smaller, more efficient, more parallel, and has much higher compute throughput than RDNA.
Furthermore, all ROCm code from GCN (and older) was written for Wave64, because historically AMD's architectures from 2010 through 2020 were Wave64. RDNA changed to Wave32 so that it could match NVIDIA and get slightly better latency characteristics (at the cost of bandwidth).
CDNA has more compute bandwidth and parallelism; RDNA is narrower, with lower latency and less parallelism. Building a GPU out of 2048-bit compute (i.e., 64 lanes x 32-bit wide, as in CDNA) is always going to give more bandwidth than 1024-bit compute (i.e., 32 lanes x 32-bit wide), as in RDNA.
I wasn't familiar with the "Wave32" term, but I took "RDNA" to mean the smaller wavefront size. I've used both, and Wave32 is still quite effective for CFD.
ROCm support for RDNA took like 2 years, maybe longer.
If you were actually using both, you'd know that CDNA was the only supported platform on ROCm for what felt like an eternity. That's because CDNA was designed to be as similar to GCN as possible, so that ROCm could support it more easily.
--------
What I'm saying is that today, now that ROCm works on RDNA and CDNA, the two architectures can finally be unified into UDNA. And everyone should be happy with the state of code moving forward.
This has not been my experience on the academic/research side. Poisson-solver-based incompressible CFD regularly runs ~10x faster on equivalently priced GPU systems, and has been doing so for as long as I've been following it (since 2008). Some FFT-based solvers don't weak-scale ideally, but that would be even worse for the CPU-based versions, since they use similar algorithms and would be spread over many more nodes.
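To give a sense of why it maps so well to GPUs: on a periodic box, the pressure projection at the heart of these solvers is basically a handful of FFTs plus pointwise arithmetic. A NumPy sketch of the structure only (real solvers use cuFFT/rocFFT or multigrid and handle actual boundary conditions):

    import numpy as np

    def pressure_project(u, v, w, dx):
        # Make a periodic velocity field divergence-free:
        # solve del^2 p = div(u) with FFTs, then subtract grad(p).
        n = u.shape[0]
        k = 2 * np.pi * np.fft.fftfreq(n, d=dx)
        kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
        k2 = kx**2 + ky**2 + kz**2
        k2[0, 0, 0] = 1.0                                # leave the mean mode alone

        uh, vh, wh = np.fft.fftn(u), np.fft.fftn(v), np.fft.fftn(w)
        div_h = 1j * (kx * uh + ky * vh + kz * wh)       # spectral divergence
        p_h = div_h / (-k2)                              # solve -k^2 p_h = div_h
        p_h[0, 0, 0] = 0.0

        # Subtract the pressure gradient to land on the divergence-free field.
        u -= np.real(np.fft.ifftn(1j * kx * p_h))
        v -= np.real(np.fft.ifftn(1j * ky * p_h))
        w -= np.real(np.fft.ifftn(1j * kz * p_h))
        return u, v, w

Every step there is either an FFT or an elementwise operation over the grid, which is exactly the kind of bandwidth-bound work GPUs are good at.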