Making Your Game Go Fast by Asking Windows Nicely (anthropicstudios.com)
154 points by zdw on Jan 17, 2022 | 71 comments



> missing vblank due to power management

Ugh. A few years ago I built a gaming rig with an i5-8400 and a GTX 1080 (both chosen for known workloads). Some games ran fine, but some were jerky af, and the frametime monitor was zigzag-y all over the place. I thought that maybe the 8400 wasn't the best choice despite my research and bought an i7-8700, only to see the situation get much worse. After days of googling and discussions I found the issue: the mobo BIOS had the C1E state enabled. In short, it lets the CPU drop its frequency and voltage significantly when idling, but this technique isn't ready to operate 100+ times per second. After drawing a frame, the CPU basically did nothing for a period of time (<10 ms), which was enough to drop into C1E, but it can't get out of it quickly for some reason. And of course the 8700 was even better at sucking at it, since it had more free time to fall asleep.

I understand that power saving is useful in general, but man, when Direct3D sees every other frame skipped, maybe it's time to turn the damn thing off for a while. I don't know how a regular consumer could deal with it. You basically spend a small fortune on a rig, which then stutters worse than an average IGP because of some stupid misconfiguration.


> but it can't get out of it quickly for some reason

As overclockers are aware, to reach higher frequencies while keeping the CPU stable, you need higher CPU voltage. It works the other way too: lowering the frequency allows lowering the voltage, and that's what mostly delivers the power saving in these low-power states.

These chips can't adjust voltage instantly because the wires inside them are rather thin and there's non-trivial capacitance everywhere. This means CPUs can drop frequency instantly and then decrease the voltage over time. However, if they raised the frequency instantly without first raising the voltage, the chip would glitch.

That's AFAIK the main reason why increasing the clock frequency takes time. The chip first raises the voltage, which takes time because of that capacitance, and only then can it raise the frequency, which is instant.


There are apps that disable C-states and core parking, and boost P-states.

I use QuickCPU and max everything out. Yes, it sounds like a sham, but it works wonders.

https://coderbag.com/product/quickcpu


I simply disabled C1E in the BIOS, because it's a desktop. But I still had to use the EmptyStandbyList technique afterwards, which helps with the rest of the issue (tested with and without for a few days, it really works).


I also recommend scheduling an ESL task.


What does ESL mean in this context? Google isn't particularly helpful on this one.


EmptyStandbyList. I fell into the same trap as you and tried searching 'ESL Task', but it's in the post they were responding to.


I strongly agree with you; the only difference is that I have an i7, but the experience was quite similar.


As someone who contributed to a (formerly) OpenGL-based video player[1], these issues with waiting for vblank and frame time variability on Windows are depressingly familiar. Dropping even one frame is unacceptable in a video player, but we seemed to drop them unavoidably. We fought a losing battle with frame timings in OpenGL for years, which eventually ended by just porting the renderer to Vulkan and Direct3D 11.

One thing that we noticed was that wakeups after wglSwapBuffers were just more jittery than wakeups after D3D9/D3D11 Present() with the same software on the same system. In windowed mode, this could be mitigated by blocking on DwmFlush() instead of wglSwapBuffers (it seems like GLFW does this too, but only in Vista and 7.)
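
For anyone curious, a minimal sketch of that pacing trick, assuming a plain WGL setup with the GL swap interval set to 0 (this is not mpv's actual code):

    // Present without blocking in wglSwapBuffers, then pace on the compositor
    // clock via DwmFlush (windowed mode, DWM composition enabled).
    #include <windows.h>
    #include <dwmapi.h>
    #pragma comment(lib, "dwmapi.lib")

    void PresentFrame(HDC hdc)
    {
        SwapBuffers(hdc); // returns quickly when wglSwapIntervalEXT(0) is set

        BOOL composited = FALSE;
        if (SUCCEEDED(DwmIsCompositionEnabled(&composited)) && composited)
            DwmFlush(); // blocks until the next DWM composition pass
    }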

The developer might also get some mileage from using ANGLE (a GLES 3.1 implementation on top of D3D11) or Microsoft's new GLon12.

[1]: https://mpv.io/


I used to work at a high-performance scientific computing company. In the mid-2000s they ran into a weird issue where performance would crater on customer PCs running windows, unless that PC were currently running Windows Media Player. Something to do with process scheduling priority. Don’t know whether this was a widely-disseminated old hand trick of the era or anything.


Probably timeBeginPeriod WinAPI called by that media player: https://docs.microsoft.com/en-us/windows/win32/api/timeapi/n...
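
For reference, a minimal sketch of that call (the workload in the middle is just a placeholder):

    #include <windows.h>
    #include <timeapi.h>
    #pragma comment(lib, "winmm.lib")

    int main()
    {
        // Ask for 1 ms timer/scheduler granularity, like media players do.
        if (timeBeginPeriod(1) == TIMERR_NOERROR)
        {
            // ... latency-sensitive work gets finer Sleep()/timer resolution here ...
            timeEndPeriod(1); // the docs ask you to pair every timeBeginPeriod
        }
    }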


Google Chrome had exactly that effect, and at least in the past, running Google Chrome made some software function correctly. (Although perhaps there's also software that timeBeginPeriod(1) affects negatively.)

Doesn't help when your testers run Google Chrome all the time...


It is astonishing to me that someone would want to use Windows for something HPC related. I'm not generally a Windows hater (actually I am, but I see that there are legitimate business reasons to use it), but the HPC ecosystem seems much more Linux-friendly.


It is, but there are a lot of applications that people like to use on Windows PCs (think CAD or data analysis stuff) that have computationally-intensive subroutines. In that company’s case it was GPU-accelerated electromagnetic wave simulations, seismic imaging reconstruction, and CT scan reconstruction. The company developed these libraries and licensed them for use in larger CAD or data analysis software packages.


The Windows team works closely with hardware makers to support new and upcoming specialized hardware. This enables hardware makers to focus on hardware bring-up and not worry about OS support.

There are many technologies that work first, or work better for a while, on Windows. For example (not necessarily HPC related): SMR drive support.


I definitely agree that if I had to get some random device working, Windows is probably a good first OS to try. But since Linux has such a large supercomputer/cluster/cloud presence, the situation is sort of flipped for HPC. At least as far as I've seen -- most numerical codes seem to target the Unix-verse first, and the only weird drivers you need are the GPU drivers (actually I haven't tried much GPGPU out, but I believe the Linux NVIDIA GPGPU drivers aren't the same horrorshow that their desktop counterparts are).


Speaking for the time I used to be at CERN, while the cluster is fully UNIX based, there is a big crowd of researchers running visualisation software, and other research related work on Windows e.g. Matlab/Tableau/Excel, and nowadays I assume macOS as well (it was early days for it 20 years ago).


I was thinking more of the number-crunching bits, rather than visualization, since the original issue was around performance. But I guess visualization can be computationally crunchy too.


Does Linux have any HDR support? I looked into recently and the answer seemed to be a firm “no”.


HPC = high-performance computing. I'm not sure that HDR would come up there. Maybe you are thinking of HTPC (home theater personal computer)?

I actually was just looking this up, quite randomly. I think it is actually a soft no, in the sense that support is essentially non-existent now, but it is being worked on.

It appears to be an annoying situation where HDCP (High-bandwidth Digital Content Protection) content has some sort of encryption baked in that requires proprietary driver/hardware support. Since the main use case for HDR is watching movies, and we can't do that without the proprietary stuff, there's not a ton of incremental value for the community to gain by working on it. But Intel, Nvidia, AMD, and Red Hat are all working on it.

https://www.phoronix.com/scan.php?page=news_item&px=Red-Hat-...

Also this recent version of libreelec

https://libreelec.tv/2021/11/03/libreelec-matrix-10-0-1/

advertises that it can process (decode?) 4K, 10-bit video on a Raspberry Pi 4, but apparently only 8-bit output is actually available. Not sure if this is a hardware or software limitation.

Anyway, the pieces are maybe coming together, just very slowly.



> by linking with PowrProf.dll, and then calling this function from powersetting.h as follows

> This function is part of User32.lib, and is defined in winuser.h which is included in Windows.h.

This is one reason I think Windows is such a mess of an OS. (Look at the contents of C:\Windows and tell me it's not, if you can do so with a straight face!)

To make what ought to be a system call you have to load some DLL, sys, or lib file at a random (but fixed) path and call a function on it.

That combined with COM, and the registry, and I don't want to touch it with a ten-foot pole.


This isn't especially different from Linux's dependency on /lib/ld.so. There's a design choice to not have syscalls and instead make you go through the libraries, to discourage people making themselves dependent on undocumented syscalls. Of course, there probably shouldn't be undocumented syscalls in the first place, since that's a bit suspicious.

> combined with COM, and the registry

And yet GNOME has dconf and CORBA, because in order to do certain things you converge on the same solutions.

(Now, if you want a mess, the attempts to retrofit secure containers onto this with UWP definitely count!)


> To make what ought to be a system call you have to load some DLL, sys, or lib file at a random (but fixed) path and call a function on it.

“Ought to be a system call” is a matter of opinion. Among OSes, Linux is an outlier in that it keeps its system call interface stable.

Many other OSes choose to provide a library with a stable interface through which system calls can (and, in some cases must. See https://lwn.net/Articles/806776/; discussed in https://news.ycombinator.com/item?id=21859612) be called. That allows them to change the system call ABI, for example to retire calls that have been superseded by other ones.

(ideally, IMO, that library should not be the C library. There should be two libraries, a “Kernel interface library” and a “C library”. That’s a different subject, though)


You can also see performance improvements in processes that do I/O by having a low priority process running that does nothing but run an infinite loop. This keeps the computer from switching to idle CPU states during the I/O. This was on Linux, there is probably an OS setting to accomplish the same thing, but it was pretty counter-intuitive.
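
A rough sketch of that trick (assuming you just want the lowest conventional priority, not any particular production setup):

    #include <unistd.h>

    int main()
    {
        nice(19); // lowest conventional priority: only runs when the CPU is otherwise idle

        volatile unsigned long spin = 0;
        for (;;) ++spin; // burn cycles so the core never drops into a deep idle state
    }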


> This was on Linux, there is probably an OS setting to accomplish the same thing, but it was pretty counter-intuitive.

On x86 processors, you can achieve this at the kernel level by adding `idle=poll` to the kernel command line.


PowerSetActiveScheme sets the system power plan; it's not something a game should be doing without telling the user first.


I've had games do this and found it annoying since I like my PC to run in balanced mode. Not so much to save power but to let the machine idle when I'm not using it. Found I could work around it by deleting the power plans other than balanced.

I've never played OP's game, so evidently there are a few games out there doing this.



Translation: "We didn't provide the necessary API support, so now we're going to whine about ad-hoc brute force solutions that developers would never have had to resort to if we'd done our jobs."

Why isn't there a function I can call that enforces full CPU power, but only while my application is running? I never wanted to change global system-level settings, but if that's the only affordance provided by the Win32 API, then so be it.


Because you're generally not supposed to overwrite the user's performance settings, even temporarily?


It would be nice if it were that simple. Unfortunately, power settings under Windows are incredibly (and unnecessarily) complex, and I doubt that one in twenty users even knows the options are available. Worse, the Windows power settings tend to revert magically to "energy saving" mode under various unspecified conditions. This phenomenon almost cost me an expensive session at an EMC test lab once, when data acquisition on the device under test repeatedly timed out due to CPU starvation.

It's entirely reasonable for performance-critical applications (not just games!) to be able to request maximum available performance from the hardware without resorting to stupid tricks like the measures described in this story, launching threads that do nothing but run endless loops, and so forth.

I do agree with those who point out that this should be a user-controlled option. On the application side, this could be as simple as a checkbox labeled "Enable maximum performance while running" or something similar. Ideally, the OS would then switch back to the system-level performance setting when the application terminates, rather than leaving it up to the application to do the right thing and restore it explicitly.
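
For illustration, a rough sketch of the save/switch/restore dance an application has to do today (this assumes the stock High Performance plan GUID; depending on the SDK you may need initguid.h so the power-plan GUIDs are defined in this translation unit). Note the restore never runs if the process crashes, which is part of the problem:

    #include <initguid.h>
    #include <windows.h>
    #include <powersetting.h>
    #pragma comment(lib, "PowrProf.lib")

    int main()
    {
        GUID *previous = nullptr;
        if (PowerGetActiveScheme(nullptr, &previous) != ERROR_SUCCESS)
            return 1;

        PowerSetActiveScheme(nullptr, &GUID_MIN_POWER_SAVINGS); // stock "High performance" plan

        // ... run the performance-critical work ...

        PowerSetActiveScheme(nullptr, previous); // put the user's plan back
        LocalFree(previous);
    }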


Sometimes those are the user’s performance settings, but more often the user has no idea what these performance settings you speak of are and they just don’t want to see your game stutter. It would be nice to be able to distinguish these cases, and this user would love if games could temporarily disable aggressive power saving automatically when I’m running a game and put it back the rest of the time.


Alternative translation: "Our documentation is weak and our engineering teams aren't held accountable for it, so we're blaming third-party developers instead of doing our jobs".


Perhaps if the user has a slow old CPU we could also order them a new one on Amazon for use only while the game is running too...


Or perhaps when I set "Disable power saving" for USB devices, Windows would actually do it. It's bad to be in a Teams meeting and have your USB Bluetooth adapter disconnect from time to time.


That's a good point--I'll look into whether Microsoft has any guidelines on this, and add a disclaimer to the article when I get a chance.


Interestingly, Garage Band on my G5 kicks power management to highest performance without asking, though it turns it back down when it quits. Guess Apple didn't have a problem with it.


It probably also won't reset the setting if the game crashes.


Yeah, this feels like really bad UX.


> As of April 5th 2017 with the release of Windows 10 Version 1703, SetProcessDpiAwarenessContext used above is the replacement for SetProcessDpiAwareness, which in turn was a replacement for SetProcessDPIAware. Love the clear naming scheme.

This is the kind of thing I hate about "New Windows". Once upon a time MS used to strive for backward compatibility. These days every few years there's a new function you need to call. You can't get optimal behavior just by writing good code from the start. You need to do that, and also call the YesIKnowHowPixelsWork api call, and set <yesIAmCompetent>true</yesIAmCompetent> in your manifest to get what should be the default behavior. It's a mess.


This is precisely the "Old Windows" way of doing things where there are legacy APIs still supported for that forever backwards compatibility and current APIs exist for ways you probably want to do things in a new app.

For reference, SetProcessDPIAware solidified over 15 years ago, whereas 15 years prior to that there wasn't even a taskbar. Of course it's going to be out of date from a UI API perspective, but that's what's needed if you also want to support apps from 15 years ago well.


Specific example from the good old days: EnableTraceEx2 "supersedes the EnableTrace and EnableTraceEx functions.". https://docs.microsoft.com/en-us/windows/win32/api/evntrace/...

Func -> FuncEx -> FuncExN was a common pattern. (Which I like more than Func -> Funcness -> FuncnessContext, despite the lack of creativity!) Another one was tagging structures with their own length as the first member variable, so if a later SDK creates a newer version of the struct, the callee can tell the difference. eg https://docs.microsoft.com/en-us/windows/win32/seccrypto/cry...
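
A tiny illustration of that size-tagging pattern, using a made-up struct and function (FOO_OPTIONS and FooConfigure are hypothetical, not real Win32 names):

    #include <windows.h>

    struct FOO_OPTIONS
    {
        DWORD cbSize; // caller sets this to sizeof() the struct version it compiled against
        DWORD flags;
        // A later SDK can append members here; the callee checks cbSize to know
        // which members it's allowed to read.
    };

    void Example()
    {
        FOO_OPTIONS opts = {};
        opts.cbSize = sizeof(opts);
        opts.flags = 0;
        // FooConfigure(&opts); // hypothetical call
    }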


The reason it's so complex is backwards compatibility. Non-DPI-aware applications from before DPI settings were a thing can't advertise that they're not DPI aware, so if an application doesn't announce which it is, Windows has to assume that it's not aware. A couple of years ago, Microsoft was able to make changes to the GDI libraries to automatically adjust the size of the elements it's rendering, which makes a lot of things sharper. But images, or anything on screen not rendered by GDI, won't magically become sharp.


     ASSERT(SetProcessDpiAwarenessContext(DPI_AWARENESS_CONTEXT_PER_MONITOR_AWARE_V2));
If ASSERT is a no-op in release mode then you're only getting your setting set here while in debug mode
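
i.e. something along these lines, so the call itself still happens in release builds:

    BOOL ok = SetProcessDpiAwarenessContext(DPI_AWARENESS_CONTEXT_PER_MONITOR_AWARE_V2);
    ASSERT(ok); // debug-only check; the call above runs regardless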


It's not, in my codebase, but I'll edit that when I have the chance so nobody blindly copy pastes it and ends up with something super broken


About switchable graphics: nVidia's APIs do work. The problem with them is that there's no API to switch to the faster GPU at runtime; they only have APIs to set up a profile for an application and ask for the faster GPU in that profile, and the change is applied the next time the app launches.

I had to do that a couple of times for Direct3D 11 or 12 apps with a frontend written in WPF. Microsoft doesn't support exporting DWORD variables from .NET executables.

Technical info there: https://stackoverflow.com/a/40915100


It's possible I'm misunderstanding the docs, but here's the line that led me to believe linking to one of their libraries alone would be enough (and led to my surprise when it didn't work):

(https://docs.nvidia.com/gameworks/content/technologies/deskt...)

> For any application without an existing application profile, there is a set of libraries which, when statically linked to a given application executable, will direct the Optimus driver to render the application using High Performance Graphics. As of Release 302, the current list of libraries are vcamp110.dll, vcamp110d.dll, nvapi.dll, nvapi64.dll, opencl.dll, nvcuda.dll, and cudart..


Can it be that you linked to one of these libraries, but never called any function from that DLL, so your linker dropped the unused DLL dependency?

However, I don't really like that method: the app will fail to launch on computers without nVidia drivers, complaining about the missing DLL. For languages like C++ or Rust, the exported DWORD variable is the best way to go. The only reason I bothered with custom installer actions was that that method wasn't available.
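
For reference, the exported-global variant for native executables looks roughly like this (NvOptimusEnablement is the documented NVIDIA export; AmdPowerXpressRequestHighPerformance is the analogous AMD one):

    // Exported globals the switchable-graphics drivers look for in the EXE.
    extern "C"
    {
        __declspec(dllexport) unsigned long NvOptimusEnablement = 0x00000001;
        __declspec(dllexport) int AmdPowerXpressRequestHighPerformance = 1;
    }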


Hmm. I think I tried calling into their API to rule that out--but it's been a while, so it's 100% possible I'm remembering incorrectly, which would explain why it didn't work!


> This isn’t often relevant for games, but, if you need to check how much things would have been scaled if you weren’t DPI aware, you can call GetDpiForWindow and divide the result by 96.
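
For reference, that query amounts to something like the following (96 DPI being the 100% baseline):

    float DpiScaleFor(HWND hwnd)
    {
        return GetDpiForWindow(hwnd) / 96.0f; // e.g. 144 DPI -> 1.5
    }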

If you aren't scaling up text and UI elements based on the DPI then it doesn't really sound like your application is truly DPI aware to me. I don't see why that applies any differently to games versus any other kind of application.


I think it's reasonably common for games to scale their text and UI elements by the overall screen or window size, in which case opting out of clever OS DPI tricks is the right choice. Using the actual DPI doesn't make much sense in general: the player could be sitting right in front of their laptop screen or feet away from a big TV, which obviously require very different font sizes in real-world units.


Yup, you hit the nail on the head (author of the article here). I guess I could've clarified that, I didn't expect people to assume I was advocating against scaling your UIs to fit the user's screen! Many games scale to fit the window by default, and even offer additional controls on top of that.


It's not as simple as just scaling the UI to the size of the screen though, because the UI elements should be bigger at the same screen size if the scaling is higher. That's why, like you mention in the article, you'll be able to tell when the setting has been changed simply by looking at the scale of the UI: because it will be wrongly too small once the setting is activated.


Yup! I'm aware of what DPI scale is for, I use it when I write game tools. I don't use it in game, though--that's an intentional tradeoff I'm making. It seems like a pretty common tradeoff for games though!

If you want to see why, try mocking up a typical shooter HUD. Now try scaling up/down all the elements by 50% and see what happens. Feel free to play with the anchoring, etc. Chances are you're not gonna like what you see! Things get even more complicated when you consider that players with controllers often change their view distance when gaming and don't wanna reconfigure their display all the time.

The typical solution is to fix the UI scale to the window size, and keep the text large enough that it's readable at a large range of DPIs and viewing distances. If you can't get 100% there that way you'll typically add an in-game UI scale option. (The key difference between that and the built in UI scaling in Windows being that it's specific to the game, so you'll set it to something milder than you'd set the Windows option, and it will only affect the game so you don't have to keep changing it back and forth.)

[EDIT] I think I came up with a way to explain this that saves you the trouble of drawing it out yourself. The fundamental issue, view distance changes aside, is that games are balancing a third variable most apps don't have to: how much of the background--the actual game--is the UI occluding?


If you’re scaling based on percentage of overall window dimension, the elements will be the same physical size on two monitors of the same physical size even if one is 1080p and one is 2160p or what have you. It won’t cause you to draw elements with a fixed physical pixel size and draw tiny letters like some accidentally DPI aware Windows applications do.

It is non-ideal for something like a traditional UI element where users may want to alter scaling settings to have more screen space if their monitor is close enough to their eyes, but for a game HUD that usually isn’t the desired effect anyway.


But in those cases you'd expect the user to manually adjust their scaling settings, which wouldn't be respected if following the author's advice here.


Unless the game engine is doing its own scaling, this does sound like lying to the operating system to get out of the way of those pesky user-friendly features to get more frames.

I think Microsoft made it this hard to enable the DPI-aware setting exactly because it forces developers to think about things like DPI. If everyone follows this guide and ignores it, then I predict that in a few years this setting will be ignored as well and a new DPI-awareness API will be released.


If you’re DPI aware you should never size your elements by physical pixels, but many game UI elements like a HUD or very simple menus scale better using “percentage of screen dimension” or similar heuristics. Like using fixed device independent pixel sizes this system won’t cause the elements to look tiny on a HiDPI display, and it will generally do a better job on large screens that are usually far away from the user (e.g. TVs).


Games should either be aware of the user's preferred scaling or at least offer their own UI scaling option. But they should always register as DPI aware so they don't render the 3D scene at a lower resolution than what's selected.


Two comments that kinda go against the flow:

1. Please add options to conserve battery too. An FPS limiter would be good. Messing with the system power management when the user doesn't want to be tethered to a wall plug is Not Nice(tm).

2. When you do UI scaling, especially if you're young with 20/20 eyesight, please allow scaling beyond what you think is big enough.


The other reply mentioned that vsync can save battery--on top of vsync, Way of Rhea supports syncing to every other vblank, halving the FPS. This should presumably save even more battery (though the intended use case is to prevent stuttering on computers that can't consistently hit the monitor's refresh rate). Ultimately, though, no matter what I do I don't think you're gonna be able to play very long on battery power.
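
As a general illustration, with a DXGI swap chain that's just a larger sync interval (a sketch; `swapChain` is assumed to already exist, and this isn't necessarily how my engine presents):

    swapChain->Present(1, 0); // wait for every vblank: 60 FPS cap on a 60 Hz display
    swapChain->Present(2, 0); // wait for every other vblank: 30 FPS cap, still tear-free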

For #2--my glasses prescription is so strong that I spent $330 on fancy lenses today in the hopes that they'll distort less around the edges. Wish me luck. (:


1. An FPS limit needs to be an integer divisor of the display refresh rate, or you introduce stuttering. That's why missing a vsync results in exactly 30fps on a 60hz monitor, and that isn't a good user experience.

Using vsync is actually conservative power-wise: without it you can always smash the GPU by rendering frames as fast as possible. That's one way of reducing input latency in FPS shooters, whereas using vsync guarantees at least one frame of latency even on a fast GPU.

2. Agreed 100%. All too frequently the UI is too small on high dpi monitors (see CIV6), or unreadable when viewed from a distance when playing on a TV (see Witcher 3)


> That’s why missing a vsync results in exactly 30fps on a 60hz monitor, and that isn’t a good user experience.

It depends on the game. What does Civ need 60 fps for? Not sure if they have a limiter because I'm not playing 6 due to some of their design decisions that tick me off.

On the other hand, I do keep my Minecraft limited to 30 fps.


Pardon my ignorance (I'm not a game developer).

I was surprised to find vertical blanking interval mentioned in the article as CRTs haven't been a common sight for years. Is it still a relevant concept when writing code for modern GPUs?


HDMI still has both vertical and horizontal "blanking", although efforts are under way to reduce it: https://blogs.synopsys.com/vip-central/2021/06/24/reduced-bl...

In any case, you want to know when the frame has been scanned out so you can swap the buffer to the next frame.
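
For example, DXGI still exposes that moment as an explicit wait (sketch; assumes an IDXGIOutput* obtained from the swap chain):

    output->WaitForVBlank(); // blocks until the next vertical blank on that output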


I suspect it's just the name that stuck for double/triple buffering to synchronize with the signal and avoid tearing.


To the video hardware, LCDs are raster displays, and receive information in line scans and frames over time just like CRTs, even if it's a digital signal. You can have V-sync on or off (complete with screen tearing) just the same.


It's not just laptops that have switchable graphics - I have a desktop with a GPU but I use graphics output on the motherboard.


Is it possible to employ any of those API calls in Java? How would the equivalents look here?


I don't have a full answer, but if it helps at all, these are all C APIs--so if you can find a way to call C code from your Java program you should be set.



