
hrydgard · on May 8, 2023

You can have a Vulkan driver that only exposes a video decoder queue and no raster/compute/copy queues, not a problem.

jchw · on May 8, 2023

ahh, that makes sense. Very nice. Thanks for clarifying.

hrydgard · on April 30, 2023

valgrind runs on your unmodified binary, it's asan that needs a separate build.

mgaunard · on April 30, 2023

The build you'd use for development, to run unit tests and validate merge requests, is the as-instrumented-as-possible build.

hrydgard · on Feb 2, 2023

What dropoff? The newest number is on top.

(Though yeah, there will be a dropoff for sure, it's just not visible here).

ad404b8a372f2b9 · on Feb 2, 2023

Ha, I should really learn to read.

usrusr · on Feb 2, 2023

There has been a discount.

hrydgard · on Aug 22, 2022

There is no good reason to flip the flag dynamically at runtime and apps just don't do that, so flushing the pipeline should be perfectly fine, even in an implementation of the clip control extension.

hrydgard · on May 24, 2022

The Witcher 3 scratches that itch fairly well to tide you over, if you haven't played it, although it is a bit different.

hrydgard · on April 12, 2022

This is old, cute, but simply slow and a really bad idea on modern hardware.

mi_lk · on April 12, 2022

> slow and a really bad idea on modern hardware.

say more?

bruce343434 · on April 12, 2022

care to elaborate? It's just 3 instructions.

jleahy · on April 12, 2022

On a modern x86 cpu the ‘xchg’ instruction performs a swap and can do so entirely in the front-end via register renaming. It doesn’t even require a micro-op. This ‘trick’ actually requires executing micro-ops and creates an unnecessary data dependency (which is actually worse than the micro-ops themselves).

Better to just use more variables and let the register allocator in the compiler decide what to do. If it’s a loop then unrolling it once could remove the need for any swapping at all, for example.
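To make the dependency point concrete, here is a minimal sketch of the two swaps side by side. The function names are made up for illustration, and Python only shows the logic, not what a compiler or CPU would actually emit. In the XOR version each step needs the result of the previous one; in the temporary version the moves are plain copies that a register allocator can often eliminate.

```python
def xor_swap(a, b):
    """Swap two integers with the three-step XOR trick."""
    a ^= b  # a now holds a0 ^ b0
    b ^= a  # b = b0 ^ (a0 ^ b0) = a0; depends on the line above
    a ^= b  # a = (a0 ^ b0) ^ a0 = b0; depends on the line above
    return a, b


def temp_swap(a, b):
    """Swap via a temporary: plain copies, no arithmetic on the values."""
    tmp = a
    a = b
    b = tmp
    return a, b
```

Both return the same result, e.g. `xor_swap(5, 12)` and `temp_swap(5, 12)` both give `(12, 5)`; the difference is only in how much real work the hardware has to do.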

zwegner · on April 12, 2022

> On a modern x86 cpu the ‘xchg’ instruction performs a swap and can do so entirely in the front-end via register renaming. It doesn’t even require a micro-op.

This is only true for AMD cpus, on Intel xchg is 3 uops. Still better than the xor trick, though.

Source: https://www.uops.info/html-instr/XCHG_R64_R64.html

marginalia_nu · on April 12, 2022

What if you want to swap two large memory blocks? Should be doable in a maximally cache-friendly way with SIMD XORs I think.

tomn · on April 12, 2022

this isn't going to be any better than just loading from the two buffers into registers then storing the other way around, like:

  a = load(ap)
  b = load(bp)
  store(ap, b)
  store(bp, a)
  ap += step; bp += step;

any instructions to "do the swap" are a waste because generally load-store are separate instructions in SIMD instruction sets (and even if that wasn't the case, that's how they would get executed anyway)

if you want to avoid polluting the cache there are SSE instructions for loading without caching, which might be worthwhile

edit: this might be useful in a SIMD context where you need to swap two registers, where the cost of using another register is higher than the cost of the 3 arithmetic instructions. i could totally imagine that happening, but it's nothing to do with caches or memory
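A runnable sketch of the load/store loop above, assuming two equal-length `bytearray` buffers and a chunk size standing in for the SIMD register width. The name `swap_blocks` and the default `step` are illustrative only; real code would use vector loads and stores rather than Python slices.

```python
def swap_blocks(a, b, step=16):
    """Swap the contents of two equal-length bytearrays, chunk by chunk."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    for off in range(0, len(a), step):
        # "load" both chunks (slicing a bytearray copies the bytes)
        chunk_a = a[off:off + step]
        chunk_b = b[off:off + step]
        # "store" them the other way around; no XOR needed
        a[off:off + step] = chunk_b
        b[off:off + step] = chunk_a
```

The last chunk may be shorter than `step`; since both slices have the same length, the buffers keep their sizes.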

chmod775 · on April 12, 2022

> What if you want to swap two large memory blocks?

In theory, maybe.

But if that happens in your application and is performance critical, you probably should change it such that you're swapping pointers to them instead...

marginalia_nu · on April 12, 2022

There are very real cases where you may want to swap around actual memory though. Garbage collectors do quite a lot of this type of large memory exchanges.

animal531 · on April 12, 2022

I'd also hazard that throwing in an if statement in place of a temporary variable is a bad idea.
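For context, the `if` here is presumably the usual guard against swapping a location with itself; that reading is an assumption, and the helper names below are made up. A small sketch of why the in-place XOR version needs the guard while the temporary version does not:

```python
def xor_swap_at(values, i, j):
    """In-place XOR swap of values[i] and values[j], without a guard."""
    values[i] ^= values[j]
    values[j] ^= values[i]
    values[i] ^= values[j]


def guarded_xor_swap_at(values, i, j):
    """Same trick, with a branch to skip the case where i and j alias."""
    if i != j:
        xor_swap_at(values, i, j)


def temp_swap_at(values, i, j):
    """Temporary-variable swap; correct even when i == j, no branch."""
    tmp = values[i]
    values[i] = values[j]
    values[j] = tmp
```

When `i == j`, the first XOR in `xor_swap_at` computes `x ^ x` and zeroes the element, so the value is lost. The guarded version trades the temporary for a branch, which is the trade being questioned here.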

hrydgard · on Feb 3, 2022

If you generate the rays the linear way instead, you don't even need any correction.

Generate two points which represent the left and right edges of the screen - you'd put them in at say 45 degrees left and right of the forward vector of the player. Then to generate the direction vector for each column of the screen, just interpolate linearly between those two points, and find the vector from the player to that intermediate point.
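A minimal sketch of that, assuming a 2D player position, a unit-length forward vector, and a hypothetical `ray_directions` helper (none of these names come from any particular engine):

```python
import math


def ray_directions(px, py, fx, fy, columns, half_fov_deg=45.0):
    """Return one (dx, dy) ray direction per screen column.

    (px, py) is the player position, (fx, fy) the forward vector.
    """
    a = math.radians(half_fov_deg)
    c, s = math.cos(a), math.sin(a)
    # Screen edge points: forward rotated by +a (left) and -a (right),
    # placed relative to the player.
    lx, ly = px + fx * c - fy * s, py + fx * s + fy * c
    rx, ry = px + fx * c + fy * s, py - fx * s + fy * c
    dirs = []
    for col in range(columns):
        t = col / (columns - 1) if columns > 1 else 0.5
        # Linear interpolation between the two edge points...
        ix = lx + (rx - lx) * t
        iy = ly + (ry - ly) * t
        # ...and the vector from the player to that point.
        dirs.append((ix - px, iy - py))
    return dirs
```

Every direction produced this way has the same component along the forward vector, so a hit distance measured in units of the (unnormalized) direction is already the perpendicular distance, which is why no correction is needed. Don't normalize the directions, or that property goes away.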

dormando · on Feb 3, 2022

The venerable lodev tutorial uses this method, which I also used for most of my engines. I learned an interesting tidbit while comparing the two methods though:

The old-school original methods used pretty small cos/sin/atan lookup tables to do the ray and then the correction calc. Using the linear method you end up with a couple divisions per ray that aren't there in the lookup method. Divisions were (and are, depending on the platform) pretty slow. Linear method still works with lookup tables but they're relatively huge.

Also, IIRC, with the linear method door-indents need a workaround.

hrydgard · on Jan 22, 2022

The animation speed isn't the same for some reason but otherwise it's exactly like back in the day.

hrydgard · on Dec 20, 2021

The PSX CPU did not even have floating point support. It's nowhere close to a Pentium 90.

flatiron · on Dec 20, 2021

It’s a video game system. You can make do without floating point. The whole “it doesn’t have fp so wobbly textures” is a myth.

monocasa · on Dec 20, 2021

It did have a fixed point vector coprocessor instead though.

hrydgard · on Nov 4, 2021

Consider ICL instead of LASIK if it's an alternative for the level of vision you have. Super happy, and the procedure was really no big deal.