
Pretty sure it has to do with benchmarks and the cut-throat competitive environment GPU-manufacturers exist in where you cut all corners to proclaim "We're the fastest!"

Not zeroing a buffer cuts a big constant out of the overhead. If you know which benchmarks will fail when you don't zero the buffer, you code in an "exception" so the benchmark passes, while other applications act wonky. This isn't the first time nVidia has been caught doing this, see:

http://www.geek.com/games/is-nvidia-cheating-on-benchmarks-5...



Apparently, AMD also partakes in benchmark-specific "optimizations"[1]. Transparency is why many of us push for open source drivers.

http://www.cdrinfo.com/Sections/News/Details.aspx?NewsId=288...


Both AMD and NVidia drivers have special code paths for different applications. I don't think it's anything sinister, since these are mostly fixes for game bugs and the rest are resolutions of API ambiguities.

To give an example, consider the difference between memcpy() and memmove(). On most systems memcpy() is the same as memmove(), in the sense that it works even when the source and destination overlap. Then you decide to optimize memcpy(), and to prevent bugs like this https://bugzilla.redhat.com/show_bug.cgi?id=638477 you will need to set a flag like USE_MEMMOVE_INSTEAD_MEMCPY for every app that you know memcpy()s between overlapping regions. You could call this "cheating", or you could be a reasonable person and say something like this https://bugzilla.redhat.com/show_bug.cgi?id=638477#c129 instead.
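
As a minimal illustration of the overlap issue (assuming nothing beyond a standard C toolchain): the overlapping memcpy() call is left commented out because it is undefined behaviour, even though on many implementations it happens to "work", which is exactly why applications end up depending on it.

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char buf[16] = "abcdefgh";

        /* Overlapping source and destination: undefined behaviour for
           memcpy(), even though many implementations happen to get it
           "right", which is why apps end up relying on it. */
        /* memcpy(buf + 2, buf, 6); */

        /* memmove() is specified to handle the overlap correctly. */
        memmove(buf + 2, buf, 6);
        printf("%s\n", buf);   /* prints "ababcdef" */
        return 0;
    }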

As for the original question: I am not an expert on the Windows driver model, but I have written some GPU drivers and can tell you that a) memory release is asynchronous, i.e. you cannot reuse the memory until the GPU finishes using it, and b) clearing graphics memory from the CPU over PCIe is slow, and drivers, in general, do not program the GPU on their own. Taking these into account, it seems the driver is not well positioned to do this, and it is a task for the OS instead.
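
To illustrate point (a) at the application level (a sketch only; it assumes an OpenGL 3.2+ context with the sync entry points loaded, and a real driver would poll from its own work queues rather than block like this):

    #include <GL/gl.h>   /* assumes a GL 3.2+ context and loaded sync entry points */

    /* Freeing GPU memory is asynchronous: the storage cannot safely be
       reused (or cleared) until the GPU has finished every command that
       still references it. */
    void release_when_gpu_done(GLuint buffer)
    {
        /* Fence inserted after the last command that uses the buffer. */
        GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

        GLenum status;
        do {   /* wait until the GPU has drained those commands */
            status = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, 1000000);
        } while (status == GL_TIMEOUT_EXPIRED);
        glDeleteSync(fence);

        /* Only now is it safe to hand the storage back or clear it. */
        glDeleteBuffers(1, &buffer);
    }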


"I don't think it's anything sinister since these are mostly fixes for the game bugs and the rest are resolutions for API ambiguities." I think it is big problem. It is the same as forcing intel to change their CPU to workaround bugs in your application.


This analogy is pretty spot on, actually. It's the result of a long process of software evolution that went awry, and it's a big reason why we need new APIs like Mantle, Vulkan and DirectX 12.

See this fascinating post:

http://www.gamedev.net/topic/666419-what-are-your-opinions-o...


You mean something like this https://en.wikipedia.org/wiki/A20_line ?


One could argue that address overflow above 1MB was not a bug, but a feature of the early real-mode CPUs and hence (ab)using it wasn't really a bug either.

Probably even Intel didn't anticipate protected mode with its 24-bit address bus when designing the 8086. 1MB was enough for everyone at the time.


Exactly my point. Intel was "forced" to fix a bug in software by changing its hardware. The A20 gate was not there to prevent programs from accessing the HMA; it was to fix programs that generated addresses above 0xFFFFF and expected them to wrap around.
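
A small sketch of the wraparound in question (plain C, just modelling the address arithmetic): real-mode addresses are formed as segment * 16 + offset, and on the 8086 the result is truncated to 20 bits.

    #include <stdio.h>
    #include <stdint.h>

    /* 8086 behaviour: the 21st address bit does not exist, so FFFF:0010
       wraps around to 00000. With the A20 line enabled (286 and later),
       the same segment:offset pair reaches linear address 0x100000, the
       start of the HMA, which is what broke wrap-dependent programs. */
    static uint32_t linear_8086(uint16_t seg, uint16_t off)
    {
        return ((uint32_t)seg * 16 + off) & 0xFFFFF;  /* 20-bit wrap */
    }

    static uint32_t linear_a20_on(uint16_t seg, uint16_t off)
    {
        return (uint32_t)seg * 16 + off;              /* bit 20 passes through */
    }

    int main(void)
    {
        printf("%05X\n", (unsigned)linear_8086(0xFFFF, 0x0010));   /* 00000  */
        printf("%05X\n", (unsigned)linear_a20_on(0xFFFF, 0x0010)); /* 100000 */
        return 0;
    }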


This probably won't happen, but it seems that game programmers are a large cause of problems for driver writers. Having to work around bugs in games is bad for everyone.

Game studios, IMO, should be made to fix their bugs themselves. They all have patching mechanisms these days, so it's not like it's impossible, or even infeasible.


Having fewer of these problems is currently being worked on at the API level with DX12 and Vulkan. The point is to remove a huge chunk of the abstraction provided by DX/OpenGL and thus force the developer to write more sensible code.

Currently, an engine developer doing graphics programming writes something and in reality has no way of knowing what actually happens on the hardware (the API is just too high level to really know much). From there it is the hardware provider's job to take out their own debugging tools and make sure the correct things happen by adding a custom code path in the driver.


It's a bit of the opposite, actually. There was a great article posted here (titled "Why I'm excited for Vulkan") where they explain how the proprietary "tricks" GPU vendors use account for much of the need for game-specific driver updates and optimizations. Game patches are to game bugs what driver updates (or "game profiles") are to what, exactly?

Lower-level APIs like DX12 and Vulkan remove the competitive advantage that vendor-dependent performance creates, so well-coded games can perform consistently, with lower overhead, across a range of hardware without having to rely on vendors to patch in shortcuts through their drivers.

Currently, it's like filming a movie with IMAX specifications, then finding out that at different cinema chains it played with quality aberrations because their projectors didn't truly follow IMAX spec. The chains can fix it, but you're already getting blamed for the movie's issues. However, for a little money, on your next film they offer to work closely with you to ensure it shows the way you intended in their theaters. And no, they can't just tell you how to fix it-- their projection technology is a trade secret.


Eh, given the constraints as you spell them out, it seems a clear at release time, driven by a shader, could work.


This is probably because my explanation is very brief. I don't see how a shader (a program running on the GPU) can detect that the OS has killed a process and initiate a clear.


A shader is a program executed by the GPU and can manipulate memory. The driver can create a fake surface out of the freed memory and run a shader on it (which would avoid the need to zero the memory from the CPU through PCIe).
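
At the API level there is at least an existing analogue of a GPU-side clear (a sketch, assuming an OpenGL 4.5 context; a driver doing this internally on freed allocations would be the same kind of operation, just without a public entry point):

    #include <GL/gl.h>   /* assumes GL 4.5 and a loader for the DSA entry points */

    /* Zero a buffer entirely on the GPU: nothing crosses PCIe from the
       CPU except the tiny clear command itself. */
    void zero_buffer_on_gpu(GLuint buffer)
    {
        const GLuint zero = 0;
        glClearNamedBufferData(buffer, GL_R32UI, GL_RED_INTEGER,
                               GL_UNSIGNED_INT, &zero);
    }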


Well, this is the whole point: how does the driver know which memory is freed, and how does the driver run a shader by itself?


When you do a release on a texture object, when the context is destroyed, when glDeleteTextures is called... you just have to enumerate them all, but eventually all the calls are passed to the graphics driver to be translated into GPU operations.
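
A hypothetical, heavily simplified sketch of what such a hook could look like (every name below is invented for illustration and does not correspond to any real driver interface):

    #include <stdio.h>

    /* Hypothetical driver-internal sketch; all identifiers are invented. */
    struct gpu_alloc { unsigned id; };

    static void hw_queue_fill_zero(struct gpu_alloc *a)            /* enqueue a GPU-side clear job */
    {
        printf("queue zero-fill for allocation %u\n", a->id);
    }

    static void allocator_reclaim_after_clear(struct gpu_alloc *a) /* reuse once the clear retires */
    {
        printf("allocation %u back on the free list\n", a->id);
    }

    /* Called from every API path that frees GPU memory: texture release,
       context teardown, the glDeleteTextures translation, and so on. */
    static void drv_on_resource_released(struct gpu_alloc *a)
    {
        hw_queue_fill_zero(a);              /* scheduled after the GPU's last use */
        allocator_reclaim_after_clear(a);   /* never hand out dirty memory        */
    }

    int main(void)
    {
        struct gpu_alloc tex = { 42 };
        drv_on_resource_released(&tex);
        return 0;
    }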


It's the same as saying that an HDD driver can zero deleted files and delete temporary files when a process is killed, because it translates API calls into HDD controller commands.


The comparison is apt, because it would be exactly like the TRIM operation, retrofitted into the protocol for drives that support it.


So, in your opinion the driver issues TRIM, not the OS? Then it would be possible to get it on, say, Vista with a driver update, wouldn't it?


Yes, the idea is that the manufacturer is in the position of knowing the most efficient way to talk to its GPU, and the driver knows everything that's happening memory-wise. It'd be interesting to have a prototype done in some open source Linux driver. Tbh, I'm probably not good enough for that.


So why is there no TRIM support in Vista, or in Win7 for NVMe? Vista and Win7 use the same drivers, JFYI.


Sorry, I missed 'So, in your opinion the driver issues TRIM, not the OS' from the previous reply. I never said that, and that was not my point.

My point was that an optional post-delete cleanup feature was added to the protocol, ready to be used, which is a perfect example of how to evolve long-term features. Then I said the post-delete cleanup feature for the GPU should sit in the driver, since the GPU driver is the one that knows how to talk to the hardware (there is no shared protocol between boards, except VGA modes etc., but those contexts are memory mapped and OS managed) and knows when a release happens, since all operations go through it.


As I said, the driver does not know when to clear, just as an HDD driver does not know when a file is deleted.


I don't get it, though. How can the reason be due to benchmarks / performance seeking?

The driver simply has to zero the buffer when the new OpenGL / graphics context is established. It's once per application establishing a context, not per-frame (the application is responsible for per-frame buffer clearing and the associated costs). At worst this would lengthen the amount of time a GPU-using application takes to start up and open new viewports, but that hardly seems like it would matter or even register on any benchmarks.
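
For a sense of what a one-time clear at allocation time looks like from the application side (a sketch assuming an OpenGL 4.4+ context; glClearTexImage with a NULL pointer fills the image with zeros):

    #include <GL/gl.h>   /* assumes GL 4.4+ and loaded entry points */

    /* The clearing cost is paid once per allocation, not once per frame. */
    GLuint make_zeroed_texture(GLsizei w, GLsizei h)
    {
        GLuint tex;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, w, h);
        glClearTexImage(tex, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);  /* NULL = zeros */
        return tex;
    }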


The thing is, it's probably not once per application. I'd imagine using multiple framebuffers in an application is actually quite common, and the set could change quite often while an application is running, especially in complex applications like games. It's probably not enough of a hit to really justify not clearing the buffers, but it's enough to be noticeable in the benchmark race.



