Why is Windows so slow? (greggman.com)
337 points by kristianp on Dec 19, 2011 | 153 comments



I'm disappointed HN! There is a lot of pontificating, but not much science here.

It takes all of 2 minutes to try this experiment yourself (plus ~8 minutes for the download).

1. Download chromium http://chromium-browser-source.commondatastorage.googleapis....

2. Unzip to a directory

3. Create this batch file in the src directory; I called mine "test.bat":

    echo start: %time% >> timing.txt
    dir /s > list.txt
    echo end: %time% >> timing.txt
4. Run test.bat from a command prompt, twice.

Paste your output in this thread. Here is mine:

    start: 12:00:41.30 
    end: 12:00:41.94 
    start: 12:00:50.66 
    end: 12:00:51.31 
    
First pass: 640ms; Second pass: 650ms

I can't replicate the OP's claim of 40000ms directory seek, even though I have WORSE hardware. Would be interested in other people's results. Like I said, it only takes 2 minutes.


Wow, you are awesome for posting facts! I have often wondered if it was some systemic thing about the centrally-managed setups of Windows at Google that was making the Chrome build suck so much (such as some aggressive antivirus, as people often suggest in these situations). My recollection is that people run into the same slow build problems on non-Google-owned Windows computers, but perhaps that "slow" is a different slow than the OP's problem.

(For comparison, I have an older checkout on this Linux laptop. It's around 145ms to list, and it's 123k files. The original post mentions 350k files, which is a significantly different number. It makes me wonder if he's got something else busted, like "git gc" turned off, creating lots of git objects in the same directory producing some O(n^2) directory listing problem. But it could just as well be something like build outputs from multiple configurations.)


Unfortunate that this comment doesn't get enough attention in this thread... A single person's experience will always be subjective (even with the provided technical detail). If it's not consistently repeatable, it can only be used as an anecdote.


I'm not sure what your point is. The post you're replying to is no less anecdotal. Obviously there are complicated performance interactions afoot here. In the OP's opinion (and mine) Windows is littered with these kinds of booby traps. Things usually work fast ... until they don't, and you have to dig hard to figure out why.

For the most part, Linux just doesn't do that. Obviously there are performance bugs, but they get discussed and squashed pretty quickly, and the process is transparent. On the occasions I'm really having trouble, I can generally just google to find a discussion on the kernel list about the problem. On Windows, we're all just stuck in this opaque cloud of odd behavior.


If something is slow with respect to the OS, you can often break out procmon and examine the stacks of the poorly performing requests; or if something is stalled, you can do the same with procexp. With dbghelp configured correctly, you get symbol resolution in the OS libraries, so you can see what's going on. Worst comes to worst, you can step through the disassembly with a debugger, but it's not often required.

When I have problems with Linux, I tend to have to fall back to strace or an equivalent, and I find it harder to figure out what's going on. On Solaris, if I can find the right incantations to use with dtrace, I can see where the problems are, but it's easy to get information overload.

My point is, how opaque your perspective is depends on your familiarity with the system. I have less trouble diagnosing holdups on Windows than I do on other systems. That's because I've been doing it for a long time.


Macbook Pro (i7) with Vertex 3 SSD running Win7x64

start: 16:53:40.21 end: 16:53:53.18

start: 16:53:56.27 end: 16:54:09.31

(windows shows 283,871 files)


Try 'fsutil disablelastaccess 0'. Not sure if that's the case for the OP, but lastaccess is horrible on performance.


I think you mean 'fsutil behavior set disablelastaccess 1' to disable last file access updates.


Yes, I did - thank you for pointing that out.


Interestingly enough Joel Spolsky mentioned something related to the directory listing problem more than 10 years ago. See:

http://www.joelonsoftware.com/articles/fog0000000319.html

In Joel's opinion it is an algorithm problem. He thinks that there is an O(n^2) algorithm in there somewhere causing trouble. And since one does not notice the O(n^2) unless there are hundreds of files in a directory it has not been fixed.

I believe that is probably the problem with Windows in general. Perhaps there are a lot of bad algorithms hidden in the enormous and incredibly complex Windows code base and they are not getting fixed because Microsoft has not devoted resources to fixing them.

Linux on the other hand benefits from the "many eyes" phenomenon of open source and when anyone smart enough notices slowness in Linux they can simply look in the code and find and remove any obviously slow algorithms. I am not sure all open source software benefits from this but if any open source software does, it must certainly be Linux as it is one of the most widely used and discussed pieces of OS software.

Now this is total guesswork on my part but it seems the most logical conclusion. And by the way, I am dual booting Windows and Linux and keep noticing all kinds of weird slowness in Windows. Windows keeps writing to disk all the time even though my 6 GB of RAM should be sufficient, while in Linux I barely hear the sound of the hard drive.


"Linux on the other hand benefits from the "many eyes" phenomenon of open source and when anyone smart enough notices slowness in Linux they can simply look in the code and find and remove any obviously slow algorithms."

More like Linux benefits from "many budgets and priorities". If someone at Microsoft spots an obviously slow algorithm, they may not be allowed to fix it instead of working on whatever they're supposed to be working on, which probably doesn't include "fixing shipped, mature code that pretty much works in most cases."

On the Linux side, someone can decide it's really freakin' important to them to fix a slow bit, and there's little risk of it being a career-limiting move.


Along the same line, I've never understood Windows' propensity to swap out the kernel. I might have tons of free memory and half my kernel is swapped out. In fact, I've never figured out how to get it not swapped out, no matter how much memory I put in.


You could try this (from an elevated command-prompt), but the general consensus is that it has a small (if any) effect:

reg add "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" -v DisablePagingExecutive -d 0x0 -t REG_DWORD -f

Disable it with: reg add "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" -v DisablePagingExecutive -d 0x1 -t REG_DWORD -f

More discussion here: http://serverfault.com/questions/12150/does-the-disablepagin...

The easiest way (if you have enough memory) is to just disable the paging file.


Since we're talking about optimization, if you're on a SSD you should disable prefetching--which is essentially a technique for older (read: shittier) computers.


I don't think there's an O(n^2) algorithm in there. I just created a directory with 100,000 entries. Listing it (from Cygwin, no less, using 'time ls | wc') takes 185 milliseconds. The directory is on a plain-jane 7.2k 1TB drive, though of course it's hot in cache from having been created. 'dir > nul', mind you, is quite a bit slower, at over a second.


You only did one test, so you have no idea what the complexity curve is. Do at least three tests, with 1000, 10,000 and 100,000 entries and graph the results. Three tests is still pretty skimpy to figure out what the curve is, so do tests at 10 different sizes.

Also, Joel's complaint was about the Windows Explorer GUI (specifically, opening a large recycle bin takes hours). Cygwin `ls` is using a completely different code path. Your experiment does suggest that Joel's problem is in the GUI code, though, and not the NTFS filesystem code.
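For anyone who wants to actually measure that curve, here is a minimal sketch (my own, with arbitrary sizes and paths, assuming a C++17 compiler - not the parent's exact methodology) that creates directories with 1,000/10,000/100,000 empty files and times a full enumeration of each:

    // Hypothetical scaling test: create directories holding N empty files and
    // time how long a full enumeration takes for each N.
    // Build with e.g.  cl /std:c++17 /EHsc listscale.cpp  or  g++ -std=c++17 listscale.cpp
    #include <chrono>
    #include <cstdio>
    #include <filesystem>
    #include <fstream>
    #include <string>

    namespace fs = std::filesystem;

    int main() {
        const fs::path base = fs::temp_directory_path() / "listing-scale-test";
        for (int n : {1000, 10000, 100000}) {
            fs::path dir = base / std::to_string(n);
            fs::create_directories(dir);
            for (int i = 0; i < n; ++i)                   // populate with empty files
                std::ofstream(dir / ("f" + std::to_string(i)));

            auto t0 = std::chrono::steady_clock::now();
            std::size_t count = 0;
            for (const auto& entry : fs::directory_iterator(dir))
                count += entry.is_regular_file();         // touch every entry
            auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                          std::chrono::steady_clock::now() - t0).count();

            std::printf("%d entries: listed %zu in %lld ms\n",
                        n, count, static_cast<long long>(ms));
        }
        return 0;
    }
Note that, as discussed above, the directories are hot in cache right after creation, so this measures the cached path.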


Oh, the OS treeview is dreadful, everyone who's seriously coded on Windows knows that.

As to the actual complexity curve (which, knowing what I do about NTFS, I'm fairly sure is O(n log n)), I don't really care about it; since it hasn't shown up in a serious way at n=100,000, it's unlikely to realistically affect anyone badly. Even if 1 million files (in a single directory!) took 18.5 seconds, it wouldn't be pathological. Other limits like disk bandwidth and FS cache size seem like they'd kick in sooner.


For all you know you're seeing the HDD cache and not any kind of filesystem caching. evmar mentions SSD making a difference for workloads that should fit in RAM, which means HDD caching also would, for a very modest workload.


I know it's not the HDD cache. I can monitor physical disk requests vs logical filesystem requests (diskmon vs procmon). But I repeated it anyway on my SSD, and the results are the same.


Which version of Windows? In my experience certain things will freeze up Explorer on XP/Server2k3 effectively indefinitely, while on Vista/7/Server2k8 there is a progress meter and everything is stable.


Windows 7 x64. But I am talking about the command-line, not the UI. The UI controls (whether in Explorer or in other apps) generally don't react well to non-human sized input.


I think the recursive search is the thing that is broken.


I think the fact that it is hot in cache may have influenced things here.


The OP specifically mentions doing it twice to make sure the cache is hot: "Do it twice and time how long it takes. The reason to time only the second run is to give the OS a chance to cache data in ram"


I just wish the "many eyes" phenomenon worked equally well for device drivers.


Unfortunately most of the drivers are in the long tail of unpopular devices, which greatly reduces the average number of eyes per driver.

This has a double effect: first, there are fewer developers who can attempt to fix problems with a driver, and second, the incentive seems smaller - make general I/O 0.5% faster and you're a hero of the public; fix a critical problem in an unpopular device driver and maybe 1 person notices.


Exactly. I'm afraid this isn't a solvable problem. Device makers must be responsible for device drivers, open source or not.


Generally, they are. They get their Windows compatibility certification and everything.

Oh, wait, you mean for Linux? Good luck with that. The sad thing is that either providing drivers for Linux or providing detailed enough specs that drivers can be written by the OSS community offers little or no payback for the effort. As a Linux user, I feel lucky that any hardware manufacturers even acknowledge people might be running something other than Microsoft.

When you're in bed with Microsoft, you probably have even more reasons to blow off the OSS community. The sweaty chair-chucker wouldn't like it, and you don't want to get him angry. You wouldn't like him when he's angry.


That's what they used to say about IE vs the open source community. But somehow, programmers' nature finds a way...


The problem as I understand it is that Windows's file metadata cache is broken. I remember reading many years ago a posting by Linus about this but I can't find it at the moment.

According to this document (http://i-web.i.u-tokyo.ac.jp/edu/training/ss/lecture/new-doc...) it would appear that directory entries have one extra level of indirection and share space with the page cache and hence can be pathologically evicted if you read in a large number of files; compiling/reading lots of files for example.

On Linux, however, the directory entry cache is a separate entity and is less likely to be evicted under readahead memory pressure. Also, it should be noted that Linus has spent a largish amount of effort to make sure that the directory entry cache is fast. Linux's inode cache has similar resistance to page cache memory pressure. Obviously if you have real memory pressure from user pages then things will slow down considerably.

I suspect that if Windows implemented a similar system, with a file metadata cache separate from the rest of the page cache, it would similarly speed up.

Edit: I should note, this probably wouldn't affect linking as much as it would affect git performance; git is heavily reliant on a speedy and reliable directory entry cache.


I don't know this poster, but I am pretty familiar with the problem he's encountering, as I am the person most responsible for the Chrome build for Linux.

I (and others) have put a lot of effort into making the Linux Chrome build fast. Some examples are multiple new implementations of the build system ( http://neugierig.org/software/chromium/notes/2011/02/ninja.h... ), experimentation with the gold linker (e.g. measuring and adjusting the still off-by-default thread flags https://groups.google.com/a/chromium.org/group/chromium-dev/... ) as well as digging into bugs in it, and other underdocumented things like 'thin' ar archives.

But it's also true that people who are more of Windows wizards than I am a Linux apprentice have worked on Chrome's Windows build. If you asked me the original question, I'd say the underlying problem is that on Windows all you have is what Microsoft gives you and you can't typically do better than that. For example, migrating the Chrome build off of Visual Studio would be a large undertaking, large enough that it's rarely considered. (Another way of phrasing this is it's the IDE problem: you get all of the IDE or you get nothing.)

When addressing the poor Windows performance, people first bought SSDs, something that never even occurred to me ("your system has enough RAM that the kernel cache of the file system should be in memory anyway!"). But for whatever reason, on the Linux side some Googlers saw fit to rewrite the Linux linker to make it twice as fast (this effort predated Chrome), and all Linux developers now get to benefit from that. Perhaps the difference is that when people write awesome tools for Windows or Mac they try to sell them rather than give them away.

Including new versions of Visual Studio, for that matter. I know that Chrome (and Firefox) use older versions of the Visual Studio suite (for technical reasons I don't quite understand, though I know people on the Chrome side have talked with Microsoft about the problems we've had with newer versions), and perhaps newer versions are better in some of these metrics.

But with all of that said, as best as I can tell Windows really is just really slow for file system operations, which especially kills file-system-heavy operations like recursive directory listings and git, even when you turn off all the AV crap. I don't know why; every time I look deeply into Windows I get more afraid ( http://neugierig.org/software/chromium/notes/2011/08/windows... ).


When developing on Windows it pays significant dividends to manage your include files. There are a number of files provided as part of Visual Studio and/or the Windows SDK that bring in a tremendous number of other large files.

Unlike with Linux, it's quite difficult to perform piecemeal inclusion of system header files because of the years of accumulated dependencies that exist. If you want to use the OS APIs for opening files, or creating/using critical sections, or managing pipes, you will either find yourself forward declaring everything under the sun or including Windows.h, which alone, even with symbols like VC_EXTRALEAN and WIN32_LEAN_AND_MEAN, will noticeably impact your build time.
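For illustration only (a minimal sketch, not from any particular codebase), the usual mitigation is to define the trimming macros before the first inclusion of Windows.h, typically in a shared precompiled header; the core OS primitives are still available afterwards:

    // Minimal sketch: trim Windows.h before anything includes it. The macros
    // must be defined before the first #include <windows.h> in the translation
    // unit (commonly in the precompiled header).
    #define WIN32_LEAN_AND_MEAN   // drops winsock, OLE, RPC and other rarely-needed pieces
    #define NOMINMAX              // stops the min/max macros clashing with <algorithm>
    #include <windows.h>

    int main() {
        CRITICAL_SECTION cs;      // core primitives still compile fine
        InitializeCriticalSection(&cs);
        EnterCriticalSection(&cs);
        LeaveCriticalSection(&cs);
        DeleteCriticalSection(&cs);
        return 0;
    }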

DirectX header files are similarly massive too. Even header files that seem relatively benign (the Dinkumware STL <iterator> file that MS uses, for example) end up bringing in a ton of code. Try this -- create a file that contains only:

    #include <vector>
Preprocess it with GCC (g++ -E foo.cpp -o foo.i) and MSVC (cl -P foo.cpp) and compare the results -- the MSVC 2010 version is seven times the size of the GCC 4.6 (Ubuntu 11.10) version!


I came here to say this. Anytime Unix people complain about slow builds "on Windows," 9 times out of 10 it's because they're ignoring precompiled headers. The other 1 time out of 10 it's because they're using configure scripts under cygwin and finding out how expensive cygwin's implementation of fork() can be.


You can build almost any Visual Studio project without using Visual Studio at all. Visual Studio project files are also MSBuild files. I've set up lots of build machines sans Visual Studio; projects build just fine without it.

MSBuild does suck in that there is little implicit parallelism, but you can hack around it. I have a feeling that the Windows build slowness probably comes from that lack of parallelism in msbuild.

As for directory listings it may help to turn off atime, and if it's a laptop enable write caching to main memory. I'm not quite sure why Windows file system calls are so slow, I do know that NTFS supports a lot of neat features that are lacking on ext file systems, like auditing.

As for the bug mentioned, it's perfectly simple to load the wrong version of libc on Linux, or hook kernel calls the wrong way. People hook calls on Windows because the kernel is not modifiable and has a strict ABI. That's a disadvantage if you want to modify the behavior of Win32 / kernel functions, but a huge advantage if you want to write, say, graphics drivers and have them work after a system update.

Microsoft doesn't recommend hooking Win32 calls for the exact reasons outlined in the bug: if you do it wrong you screw stuff up. On the other hand, rubyists seem to love the idea that you can change what a function does at any time. I think they call it 'dynamic programming'. I can make lots of things crash on Linux by patching ld.so.conf so that a malware version of libc is loaded. I'd hardly blame the design of Windows when malware has been installed.

Every OS/Kernel involves design trade offs, not every trade off will be productive given a specific use case.


I agree with your msbuild points; in fact, I don't think it's that bad of a build system, just under-utilized (hidden, for the most part/most users, behind the shiny GUI tools).

Access times: According to the comments on that blog entry (and according to all search hits that I could find, see for example [1]) atime is already disabled by default on Windows 7, at least for new/clean installs.

1: http://superuser.com/questions/200126/are-there-any-negative...


Regarding MSBuild, the biggest problem I had with it is that if you built projects with Visual Studio, using most of the standard tooling for adding references and dependencies, you'd often be left with a project that built fine with Visual Studio, but had errors with MSBuild.

The reverse, incidentally, was usually okay. If you could build it with MSBuild, it usually worked in Visual Studio unless you used a lot of custom tasks to move files around.

I personally believe the fact that Visual Studio is all but required to build on Windows is one of the single most common reasons you don't see much OSS that is Windows friendly aside from those that are Java based.


> I personally believe the fact that Visual Studio is all but required to build on Windows is one of the single most common reasons you don't see much OSS that is Windows friendly aside from those that are Java based

You don't necessarily have to use VS to develop on Windows. MinGW works quite well for a lot of cross-platform things; it is gcc, and it works with GNU make.

My experience with porting OSS between Windows and Linux (both ways) has been that very few developers take the time out to encapsulate OS specific routines in a way that allows easy(ier) porting. You end up having to #ifdef a bunch of stuff in order to avoid a full rewrite.

This is not a claim that porting is trivial. You do run into subtle and not-so-subtle issues anyway. But careful design can help a lot. Then again, this requires that you start out with portability in mind.


I like to make multi-platform code, and I do it with CMake, Boost, and Qt. My target platforms are Linux/g++ and Visual Studio (not mingw). It usually works OK after a little tweaking, but you have to maintain discipline on whichever system you're coding on, and not use nonportable pragmas etc.


I used to build wxWidgets and all my personally made Windows software using MinGW and gcc. Sadly, gcc is far, far slower than other compilers on Windows (and once or twice I have found compiler bugs in gcc).

I also used Digital Mars for the same, but DMC sometimes fails with big builds.

I use Visual Studio now because I'm using DirectX and I just want something that works out of the box.


> MSBuild does suck in that there is little implicit parallelism

/m enables parallelism on the .NET side. Perhaps the same thing exists for the native compiler?


What is preventing you from using MinGW? That way, you could use the GNU toolchain (Make, GCC, Binutils etc.) and still have full access to the Win32 API. You could reuse almost all of your Unix build scripts, and the rest usually boils down to making your scripts aware of different file extensions (.exe/.dll instead of no extension/.so).

Even better, you can do cross compiling with MinGW. So if your toolchain doesn't perform well on Windows, just use GCC as a cross compiler and build your stuff on a Linux or BSD machine. Then use Windows for testing the executable. (On smaller projects, you usually don't even need Windows for that, since Wine does the job as well.)

(Full disclosure: I'm the maintainer of a Free Software project that makes cross compiling via MinGW very handy: http://mingw-cross-env.nongnu.org/)


VC++ generates significantly better code than GCC. Enough so that performance-minded projects usually wouldn't consider MinGW/GCC for Windows code.


Does it matter during development though? You could always develop on GNU toolchain and then make a final build in VC once the feature code is complete.


I actually don't understand the original poster's slowness with Windows. We use Perforce and VC++ .slns, sometimes with apps split into DLLs and get none of the slowness the poster talks about. Actually we get significantly better performance with this than under Unix with GCC. No-change or small change rebuilds take a few seconds, with the time being dominated by the link and proportional to the link size.


Only if you are careful to only use the subset of the C++ spec supported by both compilers and avoid all gcc specific features.


In my experience, you don't have to be careful. I've written lots of C++ that compiles fine on Windows (using mingw/msys) or Linux/Mac using gcc. Can you provide an example of where gcc specific features are included w/o the developer explicitly doing so?


This is pretty common and not particularly difficult. Most large cross-platform C++ projects (ie most browsers and game engines) compile in both gcc and msvc. It is easy to naively write code in one compiler that won't build in another, but it's also easy to fix said code once you try to build it with another compiler.


I mentioned that in the context of an application which is cross-platform already: when something is being developed, it affects (in most cases anyway) all platforms in the same way - so this limitation is "built-in" already.


Which might be a good choice for working on Chromium anyways, what with it being cross-platform :)

(Granted, there might be windows-specific stuff. I haven't checked the windows code)


They would be doing this anyway.


Please show me something to back this nonsense up.


How about you back up calling it nonsense?


We've done compiler tests between VC express 2010 and GCC (Mingw gcc 4.6.x branch) at work with GCC beating VC express at '-O3 -ffast-math -march=corei7' vs '/O2 /arch:SSE2' for our code on Intel Core i7 nehalem. GCC even beat ICC on certain tests on that same Intel hardware.

What we weren't able to compare between the compilers was link-time optimization and profile-guided optimization since Microsoft crippled VC express by removing these optimizations.

So when someone claims that 'VC++ generates significantly better code than GCC' I want to see something backing that up. Had I made a blanket statement that 'GCC generates significantly better code than VC++', someone would call on me to back that up as well, and rightly so.


So when you didn't use the two most important perf features in MSC, its performance was underwhelming. This is no surprise.

Also, if you were doing anything heavily floating-point, MSC 2010 would be a bad choice because it doesn't vectorize. Even internally at Microsoft, we were using ICC for building some math stuff. The next release of MSC fixes this.


Well we obviously didn't enable PGO/LTO for GCC either when doing these tests as that would have been pointless.

It would have been interesting to compare the quality of the respective compiler's PGO/LTO optimizations (particularly PGO given that for GCC and ICC code is sometimes up to 20% faster with that optimization) but not interesting enough for us to purchase a Visual Studio licence.

And yes, we use floating point math in most of our code, and if MSC doesn't vectorize then that would certainly explain worse performance. However, this further undercuts the blanket statement of 'VC++ generates significantly better code than GCC' which I was responding to.


I believe that at least one of the projects the original blog post mentioned - Chromium - can't be compiled with LTO or PGO enabled. Apparently the linker just runs out of memory with it and most large projects.


Well it makes sense that LTO would have high memory requirements given that the point of the optimization is to look at the entire program as one entity rather than on a file by file scope and I have no doubt this can cause problems with very large projects.

PGO on the other hand seems very unlikely to fail due to memory constraints; at least I've never come across that happening. The resulting code for the profiling stage will of course be bigger since it contains profiling code, but I doubt the compilation stage requires a lot more memory even though it examines the generated profiling data when optimizing the final binary.

It seems weird that PGO would not work with Chromium given that it's used in Firefox (which is not exactly a small project) to give a very noticeable speed boost (remember the 'Firefox runs faster with the Windows Firefox binary under Wine than the native Linux binary' debacle? That was back when Linux Firefox builds didn't use PGO while the Windows builds did.)


With respect, this is not how claims work. You can't make a claim and then expect your opponents to have the burden of proof to refute it.

If you make a claim such as 'GCC produces significantly worse code than alternate compiler A' then it's completely reasonable to ask for something to support it. Tone wise perhaps the post could have been improved, but the principle stands.


Chrome on Linux is awesome. At some point in the past year I stopped having to use command line options to keep it in memory as it defaults to that. Brilliant and slick, I love it. Chrome's performance and syncing is the reason I was able to transition almost entirely over to Linux from Mac this year with very little workflow disruption.


Perhaps the difference is that when people write awesome tools for Windows or Mac they try to sell them rather than give them away.

Well, if that was true, then you could just buy the better tool and use it, right? I suspect they don't exist because

a) On Linux, you have the existing linker to build your better one on. On Windows, you'd have to write your own from scratch, making it less appealing for anyone (the only people who'd buy it would be those running huge projects like Chrome)

b) What you said about the file system itself just being plain slow.

PS: (Long time follower of evan_tech - nice to see you popup around here :) )


A large part of Windows slowness is NTFS, as you allude to in your last sentence. There are a myriad ways to slow the damn thing down, and none to make it significantly faster.

There's also the issue that it seems to slow down exponentially with directory size. In short, worst FS for building large software.

As for the OP's build time complaint about XCode - don't do that. Use the make build instead. Not only does XCode take forever to determine if it needs to build anything at all, it caches dependency computations. So if you do modifications outside of XCode, too, welcome to a world of pain :) (I know evmar knows, but for the OP: Use GYP_GENERATORS=make && gclient runhooks to generate makefiles for Chromium on OSX)


The make build didn't exist at the time I posted that, AFAIK. The make build is oodles faster than the xcode build and I use it now of course, but it's still the slowest platform to build Chromium on by a long margin.


"Perhaps the difference is that when people write awesome tools for Windows or Mac they try to sell them rather than give them away."

Apple's switching from gcc to clang/llvm, and doing a lot of work on the latter, which is open source.


On Windows, SysInternals's RamMap is your friend. Also the Windows Internals book (5th edition, I think).

Every file on Windows keeps ~1024 bytes of info in the file cache. The more files, the more cache is used.

A recent finding, which sped up our timestamp check over 300,000+ files from 15s to 3s, was to move from _stat to GetFileAttributesEx.

One would not think of doing such things; after all, the C API is nice - open, access, read, _stat are almost all there - but some of these functions do a lot of CPU-intensive work (and one is not aware of it until just a little bit of disassembly is done).

For example, _stat does lots of divisions, a dozen kernel calls, strcmp, strchr, and a few other things. If you have Visual Studio (or even Express), the CRT source code is included for one to see.

access(), for example, is relatively "fast", in the sense that there is just a mutex (I think - I'm on OSX right now) and then a call to GetFileAttributes.
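For illustration, a rough timing harness along those lines might look like the sketch below (the hard-coded paths and the choice of GetFileAttributesExA are my assumptions, not the commenter's actual tool):

    // Rough sketch: compare _stat vs GetFileAttributesEx over the same paths.
    #include <windows.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <chrono>
    #include <cstdio>
    #include <string>
    #include <vector>

    using Clock = std::chrono::steady_clock;

    int main() {
        // Hypothetical input: a list of files whose timestamps we want to check.
        std::vector<std::string> paths = { "C:\\Windows\\notepad.exe",
                                           "C:\\Windows\\explorer.exe" };

        auto t0 = Clock::now();
        for (const auto& p : paths) {
            struct _stat st;
            _stat(p.c_str(), &st);             // CRT path: conversions plus several kernel calls
        }
        auto stat_us = std::chrono::duration_cast<std::chrono::microseconds>(Clock::now() - t0).count();

        auto t1 = Clock::now();
        for (const auto& p : paths) {
            WIN32_FILE_ATTRIBUTE_DATA fad;
            GetFileAttributesExA(p.c_str(), GetFileExInfoStandard, &fad);
            // fad.ftLastWriteTime holds the raw FILETIME, no CRT conversion needed
        }
        auto api_us = std::chrono::duration_cast<std::chrono::microseconds>(Clock::now() - t1).count();

        std::printf("_stat: %lld us, GetFileAttributesEx: %lld us\n",
                    (long long)stat_us, (long long)api_us);
        return 0;
    }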

And back to RamMap - it's very useful in the sense that it shows you which files are in the cache, and what portion of them. Also very useful is that it can flush the cache, so one can run several tests and optimize for hot-cache and cold-cache situations.

A few months ago, I came up with a scheme, borrowing an idea from Mozilla - they would just pre-read, in a second thread, certain DLLs that would eventually come to be loaded.

I did the same for one of our tools. The tool is single threaded: it reads, operates, then writes, and it usually reads 10x more than it writes. So while the processing was able to get multi-threaded through OpenMP, reading was not; instead I had a list of what to read ahead in a second thread, so that when the first thread came to read something, it was taking it from the cache. If the pre-fetcher fell behind, it skipped ahead. There was no need to even keep the contents in memory - just enough to read it, and that's it.
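A stripped-down sketch of that pre-read idea (file names and buffer sizes are made up; the real tool is obviously more involved):

    // Sketch of a background pre-fetcher: a second thread reads files into a
    // throwaway buffer so the OS file cache is warm when the worker thread
    // gets to them. Nothing is kept; the read itself is the point.
    #include <atomic>
    #include <cstdio>
    #include <string>
    #include <thread>
    #include <vector>

    static void prefetch(const std::vector<std::string>& files,
                         const std::atomic<size_t>& consumer_pos) {
        std::vector<char> sink(1 << 20);                 // 1 MiB scratch buffer
        for (size_t i = 0; i < files.size(); ++i) {
            if (i < consumer_pos.load()) continue;       // consumer got ahead: skip forward
            if (std::FILE* f = std::fopen(files[i].c_str(), "rb")) {
                while (std::fread(sink.data(), 1, sink.size(), f) == sink.size()) {}
                std::fclose(f);
            }
        }
    }

    int main() {
        std::vector<std::string> files = { /* ...list of inputs, known up front... */ };
        std::atomic<size_t> pos{0};

        std::thread warm(prefetch, std::cref(files), std::cref(pos));
        for (size_t i = 0; i < files.size(); ++i) {
            pos.store(i);
            // ...open files[i], process, write results (single-threaded I/O)...
        }
        warm.join();
        return 0;
    }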

For another tool, where reading patterns cannot be found easily (deep-tree hierarchy), I made something else instead - saving in a binary file what was read before for the given set of command-line arguments (filtering some). Later that was reused. It cut down certain operations by 25-50%.

One lesson I've learned, though, is to let the I/O do its job from one thread... unless everyone has some proper OS, with some proper drivers, with some proper HW, with....

Tim Bray's Wide Finder and Wide Finder 2 competitions also had good information. The guy who won it has some good analysis of multi-threaded I/O on his site (can't find the site now)... But the analysis was basically that it's too erratic - sometimes you get speedups, but sometimes you get slowdowns (especially with writes).


Thanks for the GetFileAttributesEx tip -- https://github.com/martine/ninja/commit/93c78361e30e33f950ee...


Firefox uses VS2005 for builds in order to support XP pre-SP3.


tl;dr: no one at Google uses Windows; a lot of time was spent optimizing the Linux build, zero time was spent optimizing the Windows build.


NTFS is a slower file system; that's probably the main reason why. Also, console I/O is much better on Linux than on Windows.

Our software builds every day on FreeBSD, Linux and Windows on servers that are identical.

The windows build takes 14 minutes. The FreeBSD and Linux build take 10 minutes (they run at almost identical speed).

Checkout is more than twice as slow on Windows (we use git).

Debug build time is comparable: 5 minutes for Windows, 4 minutes 35 seconds on Linux.

Release build time is almost 7 minutes on Windows and half that on Linux.

VS compiles more slowly than gcc but overall it's a better compiler. It handles static variables better and is not super demanding about typenames like gcc is. Also, gcc is extremely demanding in terms of memory. gcc is a 64-bit executable; Visual Studio is still a 32-bit executable. We hope Microsoft will fix that in Visual Studio 2011.

It's easier to parallelize gmake than Visual Studio, which also explains the better Linux build time. Visual Studio has got some weird "double level" multithreading which is eventually less efficient than just running the make steps in parallel as you go through your makefile.

However, our tests run at comparable speed on Linux and Windows, and Windows builds the archive ten times faster than Linux.


There's a 64-bit version of the MSVC toolchain and MSBuild, so if you build outside of Visual Studio you won't be so constrained. This is how we do our builds here at work (a mix of C# and C++). We still edit code in VS, but local builds and continuous integration are done entirely using MSBuild. As of VS2010, C++ project files are MSBuild projects, and no longer need to use VCBuild.exe.


I don't think this is true, at least for the C++ compiler. There's a 32-bit version producing 32-bit code, a 32-bit version producing 64-bit code and a 64-bit version producing 64-bit code.

Disclaimer: I work at Microsoft, but this is my hazy recollection rather than some kind of informed statement.


The %VSROOT%\VC\bin\amd64 is indeed a native x64 binary.


Yes, but it can only produce x64 code; it can't produce x86 code. Mind you, I'd love to be wrong about that...


Sorry, I didn't read your entire comment. The 64-bit linker can produce 32-bit code with /MACHINE:X86.


I didn't know about the 64-bit toolchain, where can you get it?


It's an optional install package when you install Visual Studio.


We have it installed, do you have any reference about how to use it from a build process?


If you run the Visual Studio x64 tools command prompt from the start menu, it will set up the environment to have the 64-bit toolchain in your path.


Thanks, but we've been struggling with building using the amd64 toolchain from Visual Studio.


What is this archive step you mention in passing, and what makes Linux slow at it?


I don't know why Linux is so slow at it, it's when we build the tgz containing all the binaries before uploading it to the distribution server.


Possibly Linux isn't using the same compression settings?


I think it's more a caching thing, it's as if Windows isn't rereading the files from disk.


Someone posted the question on StackOverflow and it got closed as "not constructive". Is there a way to browse the "not constructive" questions on SO? They seem to be all the best ones.


Ref: http://stackoverflow.com/questions/6916011/how-do-i-get-wind... (By the author)

Top Answer:

Unless a hardcore windows systems hacker comes along, you're not going to get more than partisan comments (which I won't do) and speculation (which is what I'm going to try).

1. File system - You should try the same operations (including the dir) on the same filesystem. I came across this which benchmarks a few filesystems for various parameters.

2. Caching. I once tried to run a compilation on Linux on a ramdisk and found that it was slower than running it on disk thanks to the way the kernel takes care of caching. This is a solid selling point for Linux and might be the reason why the performance is so different.

3. Bad dependency specifications on windows. Maybe the chromium dependency specifications for Windows are not as correct as for Linux. This might result in unnecessary compilations when you make a small change. You might be able to validate this using the same compiler toolchain on Windows.


Why should he run the ls command on NTFS rather than a native file system? In all, it was a "Windows vs Linux" test and not a filesystem test. Testing the same filesystem wouldn't make sense here.


Presumably to find out whether the difference lies with the filesystem or somewhere else?

If Linux is still much faster, even with the same filesystem, you have eliminated one variable.


Doing some profiling and system/kernel level analysis would be much saner, imo. What's the sense in measuring how some non-native filesystem behaves? In the end you'll be benchmarking fuse-ntfs vs. in-kernel ext4 and figuring out it's slower... I say, profile some code and see how much time is spent in filesystem calls.


But then you'd be comparing NTFS implementations. The codebases are completely different.


http://data.stackexchange.com/

It's an online SQL tool for analyzing the stackoverflow data dump (last update was Sept 2011; they're quarterly http://blog.stackoverflow.com/category/cc-wiki-dump/). It's very cool, but curiously hard to find. There's a "[data-explorer]" tag at meta.stackoverflow: http://meta.stackoverflow.com/questions/tagged/data-explorer

The PostHistory table probably records why posts were closed.


And here's your answer: top closed SO questions! http://data.stackexchange.com/stackoverflow/s/2305/top-close... (link includes hyperlinks to all the questions)

  score id post
  1416 1711 What is the single most influential book every programmer should read?
  1409 9033 Hidden Features of C#?
  1181 101268 Hidden features of Python
  979 1995113 Strangest language feature
  736 500607 What are the lesser known but cool data structures?
  708 6163683 Cycles in family tree software
  671 315911 Git for beginners: The definitive practical guide
  653 662956 Most useful free .NET libraries?
  597 891643 Twitter image encoding challenge
  583 83073 Why not use tables for layout in HTML?
  579 2349378 New programming jargon you coined?
  549 621884 Database development mistakes made by application developers
  537 1218390 What is your most productive shortcut with Vim?
  532 309300 What makes PHP a good language?
  505 1133581 Is 23,148,855,308,184,500 a magic number, or sheer chance?
  488 114342 What are Code Smells? What is the best way to correct them?
  481 432922 Significant new inventions in computing since 1980
  479 3550556 I've found my software as cracked download on Internet, what to do?
  479 380819 Common programming mistakes for .NET developers to avoid?
  473 182630 jQuery Tips and Tricks
The query (also at http://data.stackexchange.com/stackoverflow/s/2305/top-close... ) - have a play.

  SELECT top 20
    p.score, p.id, p.id as [Post Link]
  FROM Posthistory h
    INNER JOIN PosthistoryTypes t ON h.posthistorytypeid = t.id
    INNER JOIN Posts p ON h.postid = p.id
  WHERE
    t.name = 'Post Closed'
  GROUP BY p.score, p.id
  ORDER BY p.score DESC


That's excellent. Thank you! As they say on Reddit, have an upboat! (Do they still say that on Reddit? No? Oh, never mind then.)


Interesting. I wonder how many "Linux-bashing threads" (that contained valid questions which made perfect sense) were closed as not constructive.


No need to invent a new conspiracy - closing and reopening happens regularly and for a lot of reasons/subjects.

Just - add a vote to reopen it, if you feel that the question is relevant. I just did.


Plus stackexchange has so many different subsites, the odds are any given question is off topic for where it was posted nowadays.


Closed questions are still there. Only the deleted ones are gone (but accessible to those with more than 10k reputation).


Well, this article seems to be asking "why is the Windows file system so slow?" and generally complaining. A better name for the article might be "Windows FS performance sucks" or similarly polemical title.

There are parts of Windows that are implemented just the same as Linux, and parts that are faster. Some parts are slower, notably the file system. But there's more to Windows than just the file system.

So, I'd say that's why it's not constructive.


Well that's the question isn't it? Is it just the filesystem, or is the speed of CL.EXE an issue too? What about the VS build system?

It sounds like you simply don't know enough about the issue to judge whether it's constructive.


1) Windows FS operations are slower than Linux in general, and when you add 'realtime' antivirus on top it gets worse.

2) Linux forks significantly faster than anything else I know. For something like Chromium the compiler is forked a bazillion times, and so are the linker and nmake and so on and so forth.

3) Linux, the kernel, is heavily optimized for building stuff, as that's what the kernel developers do day in and day out - there are threads on LKML that I can't be bothered to dig out right now, but a lot of effort goes into optimizing for the kernel build workload - maybe that helps.

4) Linker - the stock one is slower and did not do the more costly optimizations until now, so it might be faster because it does less than the MS linker, which does incremental linking, WPO and what not. Gold is even faster, and I may be wrong but I don't think it does what the MS linker does either.

5) Layers - I don't know if Cygwin tools are involved, but they add their own slowness.


I suspect it has something to do with NTFS updating access times by default. So every time you do anything with a file, it gets its access time updated (not modification time, access time). I don't have windows to test on, but you could try the suggestions [1][2] below.

[1] http://msdn.microsoft.com/en-us/library/ms940846(v=winembedd...

[2] http://oreilly.com/pub/a/windows/2005/02/08/NTFS_Hacks.html (#8)


The updating of last access time has been disabled since the release of Windows Vista (although you can turn it back on):

http://blogs.technet.com/b/filecab/archive/2006/11/07/disabl...


Mac OS X does appear to mount with atime enabled, but I happen to use this:

    $ cat /Library/LaunchDaemons/com.nullvision.noatime.plist    
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" 
            "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
        <dict>
            <key>Label</key>
            <string>com.nullvision.noatime</string>
            <key>ProgramArguments</key>
            <array>
                <string>mount</string>
                <string>-vuwo</string>
                <string>noatime</string>
                <string>/</string>
            </array>
            <key>RunAtLoad</key>
            <true/>
        </dict>
    </plist>
I don't know if there's anything like relatime.


As posted elsewhere and as someone pointed out in the very comments of the article: That doesn't seem to be the case for Windows 7 anymore.


But most Linux distros do this by default as well. You have to mount with noatime to get rid of it.


In fact nowadays most (all?) linux filesystems use relatime as default, which carries most of the advantages of both atime and noatime. See http://kerneltrap.org/node/14148

I don't know if something similar exists under windows (I suppose it doesn't).


Linux has defaulted to relatime since 2009.


The author doesn't mention whether he is using cygwin git, or msys git. msys is faster. But even with msys, UAC virtualization is a common cause of slowness with git: http://stackoverflow.com/questions/2835775/msysgit-bash-is-h...

More details here: http://code.google.com/p/msysgit/issues/detail?id=320


Sure, Cygwin does a bit of extra wrapping. The official site and just about everybody link to msysgit, however.


FWIW, try disabling your AV


I have found Windows a bit slower but never to the extent the author suggests. But then I have generally tested on clean systems without antivirus. So I suspect that this is a huge factor, esp. because it intercepts all FS calls and checks them. That's really not what you want when compiling code.


This.

And anything else you can find that registers with the kernel for file system calls. Logging mechanisms, etc. Anything the kernel has to talk to about file i/o is going to slow it down.


Probably a big reason for him seeing slowdowns in incremental builds with MSVC is link-time code generation. What seems to be link time is actually code generation time, and it's delayed so that inter-procedural optimizations can be run. This kills off a lot of the benefit of incremental building - you're basically only saving parsing and type analysis - and redoing a lot of code generation work for every modification.

NTFS also fragments very badly when free space is fragmented. If you don't liberally use SetFilePointer / SetEndOfFile, it's very common to see large files created incrementally to have thousands, or tens of thousands, of fragments. Lookup (rather than listing) on massive directories can be fairly good though - btrees are used behind the scenes - presuming that the backing storage is not fragmented, again not a trivial assumption without continuously running a semi-decent defragmenter, like Diskeeper.
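To make the preallocation point concrete, here is a minimal sketch (hypothetical path and size, error handling mostly elided) of reserving the final size with SetFilePointerEx / SetEndOfFile before writing incrementally:

    // Sketch: reserve the final size up front so NTFS can allocate one (or few)
    // contiguous runs instead of growing the file write by write.
    #include <windows.h>

    int main() {
        HANDLE h = CreateFileA("C:\\temp\\big-output.bin",    // hypothetical path
                               GENERIC_WRITE, 0, nullptr,
                               CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (h == INVALID_HANDLE_VALUE) return 1;

        LARGE_INTEGER size;
        size.QuadPart = 512ll * 1024 * 1024;                  // expected final size: 512 MiB
        SetFilePointerEx(h, size, nullptr, FILE_BEGIN);       // move the file pointer to the target size...
        SetEndOfFile(h);                                      // ...and commit the allocation there

        SetFilePointerEx(h, LARGE_INTEGER{}, nullptr, FILE_BEGIN); // rewind
        // ... WriteFile(...) the real data incrementally ...

        CloseHandle(h);
        return 0;
    }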


I'm not sure if it is related, but the fact is that file system operations on Windows are much slower than on Linux. I remember that copying a large ISO image from one Windows partition to another Windows partition using Total Commander under Wine on Linux was faster than doing it directly on Windows.

I also remember that I was able to create a file copy utility in assembly as a homework assignment that was a couple of times faster than the Windows/DOS copy command.

The only two reasons I can think of that explain this are: 1 - no one cares about Windows filesystem performance; 2 - someone decided that it shouldn't be too fast.


In Total Commander you can configure the buffer sizes used while copying. Maybe in your homework you chose the right buffer size too (and, of course, asm is fast, but hard to write, I'm sure you didn't bother too much with error checking and other "small" problems).

Moreover, the optimal buffer size is different for small and large files; maybe Windows is not optimized for large files like a DVD image.
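To make the buffer-size point concrete, here is a bare-bones copy loop (a sketch, with arbitrary file names); the throughput you measure can change a lot just by varying buf_size:

    // Sketch: a plain read/write copy whose speed depends heavily on buffer size.
    #include <cstdio>
    #include <vector>

    static bool copy_with_buffer(const char* src, const char* dst, size_t buf_size) {
        std::FILE* in = std::fopen(src, "rb");
        std::FILE* out = std::fopen(dst, "wb");
        if (!in || !out) { if (in) std::fclose(in); if (out) std::fclose(out); return false; }

        std::vector<char> buf(buf_size);
        size_t n;
        while ((n = std::fread(buf.data(), 1, buf.size(), in)) > 0)
            std::fwrite(buf.data(), 1, n, out);

        std::fclose(in);
        std::fclose(out);
        return true;
    }

    int main() {
        // Try e.g. 4 KB vs 1 MB and compare wall-clock time on a large file.
        copy_with_buffer("dvd.iso", "dvd-copy.iso", 4 * 1024);
        copy_with_buffer("dvd.iso", "dvd-copy2.iso", 1024 * 1024);
        return 0;
    }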


As for the Total Commander example, that was an out-of-the-box experience, without any tweaks. I just wanted to point out that even when using an emulation layer to access Windows' native filesystem type, Linux was significantly faster on file system operations.

As for my homework "copy" command, I know that it is not fully replacement to file copy windows command, but if copy operations takes >10min, all those checks and additional tasks shouldn't make IO bound operation take couple time longer than what some student implemented as homework.


Here is a link from the comments:

NTFS Performance Hacks - http://oreilly.com/pub/a/windows/2005/02/08/NTFS_Hacks.html


Not sure about the other things, but this

The default cluster size on NTFS volumes is 4K, which is fine if your files are typically small and generally remain the same size. But if your files are generally much larger or tend to grow over time as applications modify them, try increasing the cluster size on your drives to 16K or even 32K to compensate. That will reduce the amount of space you are wasting on your drives and will allow files to open slightly faster.

is wrong. When you increase cluster size you will definitely not "reduce the amount of space you are wasting". A 100 B file will still occupy a whole 16 KB cluster (so you will waste 15.9 KB on it instead of 3.9 KB with 4 KB clusters).

Also, I would be very careful with taking advice like that from an article which is 6 years old (from before the introduction of Win7 or even XP SP3!).


That's not the point he's trying to make. Smaller cluster sizes leads to larger amounts of file-system metadata keeping track of where those clusters are laid out on disk, as well as the overhead of generating, accessing, and updating those data structures.


NTFS uses run lists (comparable to extents in ext4) so you would only have more metadata if your drive gets fragmented. (Which I hope, for the sake of comparability, is not the case here.)


If files are typically bigger than 16KB, 16KB clusters can potentially save space vs 4KB clusters, by needing smaller cluster indices. Not sure if that's the case for NTFS though.


How is a 100 byte file a valid refutation of what you quoted? It quite clearly states that you should leave the cluster size alone if your files are typically small.


Well, OK, let's take a guess at a median filesize of 4KB [1]. Then a 4KB cluster wastes some space only for half of the files, whereas 16KB clusters waste space for a bigger proportion, plus they waste an additional 12KB for 50% of the files.

On the other hand, the paper also shows how ridiculous it is to talk about the wasted space - look at figure 14: files smaller than 16KB occupy ~1% of all space. Even if we waste space 4:1, it's still a ridiculously small amount of space.

[1] http://www.usenix.org/events/fast11/tech/full_papers/Meyer.p...


  "But if your files are generally much larger or tend to grow over time..."
No numbers are given in what you quoted, but when I think about "much larger" I'm thinking of file sizes in the megabyte to gigabyte range, such as a dedicated data drive for a media server or a database server. A 4KB median file size doesn't fit my mental model of "much larger".


I haven't used Windows as my primary OS for almost a decade and haven't even booted it on bare metal for the past couple years, so my guesses must be taken with a grain of salt. I always found directory fragmentation (much more than file fragmentation) to be a huge performance problem on Windows. If increasing cluster size effectively reduces directory fragmentation, your life under Windows will be much better.


Git under cygwin is so painfully slow, and gets exponentially slower as the number of tracked files grows. Even an SSD can't fully smooth out the difference :(


What I can't wrap my head around is the amazingly slow file search (I'm using Vista). Searching for a filename I know exists in a small directory (say, 100 files) often leads to Windows searching for several minutes and then NOT FINDING THE FILE. How can that happen when Windows is able to list the contents of the directory (including the file I'm looking for) instantly?


I am also frustrated by the indexed searching in Windows. This should have been a brilliant signature feature, and it's just execrable.

If you look at the indexing options in the Vista control panel and click the "Advanced" button, you'll find a dialog box with a "File Types" tab. This horrible dialog may show you (it did for me) that some file types (i.e., filename extensions) are deliberately excluded from indexing. For some reason. You know, because you may not want to find certain things when you look for them. I guess.

You'll also find the world's worst interface for specifying what kinds of file should be indexed by content. But never mind.

If searching by filename and/or path is all you're after, check out Everything:

http://www.voidtools.com/

If you're not using Windows as an Administrator, (and you shouldn't be) Everything won't seem very polished. But it is terrifyingly fast, and it's baffling that Microsoft's built-in search is this bad if something like Everything is possible.


Maybe this has something to do with the file indexer? 2 years ago I heard a lot of XP users complain that Windows was suddenly getting very slow. After some digging around I noticed that an update had turned on the file indexer by default. Since then I always turn it off (properties of your disk) and shut down the service (Indexing Service).


Don't forget forking.

To benchmark the maximum shell script performance (in terms of calls to other tools per second), try this micro-benchmark:

    while true; do date; done | uniq -c
Unix shells under Windows (with cygwin, etc.) run about 25 times slower than OS X.


Well, Unix shells under Windows don't even implement forking, so this test is kinda meaningless isn't it?

It's like racing cars where one of the cars has its wheels taken off.


Cygwin has fork. The problem is that Cygwin's fork has to copy the entire address space into the new process, whereas Linux uses copy-on-write to make forking much faster (you only need to copy the page tables).


A possibly faster way to read directories with one kernel call (okay, a few at most) is to use GetFileInformationByHandleEx.

Here is an example:

https://gist.github.com/1487388
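For reference, a rough sketch of that approach (my own, independent of the gist above), enumerating with FileIdBothDirectoryInfo and a large buffer so each kernel call returns many entries:

    // Sketch: enumerate a directory with GetFileInformationByHandleEx, pulling
    // many entries per call instead of one FindNextFile round trip per file.
    #include <windows.h>
    #include <cstdio>
    #include <vector>

    int main() {
        HANDLE dir = CreateFileW(L"C:\\Windows",             // hypothetical directory
                                 FILE_LIST_DIRECTORY,
                                 FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                                 nullptr, OPEN_EXISTING,
                                 FILE_FLAG_BACKUP_SEMANTICS, // required to open a directory
                                 nullptr);
        if (dir == INVALID_HANDLE_VALUE) return 1;

        std::vector<char> buf(64 * 1024);                    // many entries per kernel call
        size_t count = 0;
        while (GetFileInformationByHandleEx(dir, FileIdBothDirectoryInfo,
                                            buf.data(), (DWORD)buf.size())) {
            auto* info = reinterpret_cast<FILE_ID_BOTH_DIR_INFO*>(buf.data());
            for (;;) {
                ++count;                                     // info->FileName / FileNameLength hold the name
                if (info->NextEntryOffset == 0) break;
                info = reinterpret_cast<FILE_ID_BOTH_DIR_INFO*>(
                           reinterpret_cast<char*>(info) + info->NextEntryOffset);
            }
        }
        // The loop ends once GetLastError() reports ERROR_NO_MORE_FILES.
        std::printf("%zu entries\n", count);
        CloseHandle(dir);
        return 0;
    }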


separate point:

While Photoshop isn't on Linux, there are plenty of replacements for it unless he's doing print work, which I don't think is the case, as Photoshop isn't the beginning and end for print. (Actually, TBH, Photoshop is pretty shit for pixel work.)

Also, Maya is available for Linux; Autodesk just doesn't offer a free trial like they do with Windows/Mac OS. (Including the 2012 edition.)

No offence intended to the 3dsmax crew, as it has its merits, but a sufficiently competent Maya user won't find much use for 3dsmax.


Hard drive swap file usage: my Windows machines always have a huge swap file going; my Linux and OS X machines almost never do.


I'm pretty sure the "dir" takes longer than "ls" because "dir /s c:\list.txt" sorts the entire c:\ drive before looking for "list.txt". "ls -R c:\list.txt" first checks if "list.txt" exists, and fails if it doesn't. Just take out the "list.txt" and run both commands again.


This is a comparison of file systems, not operating systems.


I run Windows 7 in Boot Camp every day and it easily outperforms OS X on the same exact hardware for most common tasks (browsing files, the web, starting up apps, etc).

The Windows desktop GUI system is more stable than anything else out there (meaning that it's not going to change drastically AND that it's a solid piece of software that just works) and it's as flexible as I need it to be, so that's why I stick with Windows. With virtual machines, WinSCP, Cygwin and other similar utilities, I have all the access to *nix that I need.


> The Windows desktop GUI system is more stable than anything else out there (meaning that it's not going to change drastically AND that it's a solid piece of software that just works)

So you assume that Windows8 metro-mode won't really catch on? Also comparing OSX to Windows over the past 10 years, it's Windows that has changed more drastically, so both future and past evidence to the contrary...


How has Windows changed more drastically than OS X since 2001? To me drastic means that something very basic has changed and/or compatibility has been lost. (Things like resizable windows, full screen apps, broken finder plugin compatibility, changes to expose, addition of mission control, etc.)

I don't assume, know or care to know anything about Windows Metro. It's not replacing the desktop system that I use.


Wow, so if you want to test file system speeds, you do it by listing files - I know this is just an example, but perhaps it has something to do with the speed of the terminal?

There are a plethora of disk benchmarking tools - I doubt that they consistently show 40x differences.

Hooves -> horses, and all that.


Maybe I'm misunderstanding (or he's since changed his post), but:

    dir /s > c:\list.txt
is piping it into a file. Where does the speed of the terminal affect that (in any significant fashion)? I know what you're getting at - tar --verbose can slow things down for me by sometimes a factor of 2 (for huge tarballs), but I don't think it's an issue in this situation.


Yes, I saw the redirection. But "dir" is a built-in command in the Windows shell; is the speed of that command a benchmark-able number? Is the point to compare the speed of "ls" vs "dir", or the underlying OS/file-systems (i.e., posix vs. win32 / Ext3 vs. NTFS)? If someone tells me that "dir" is slow, I'd agree -- but that -- in itself -- doesn't imply that the filesystem is slow.


True, I agree it's not necessarily a good way to test the filesystem, but it's only the shell that's being hit, nothing really to do with the terminal.

I pointed it out mainly because terminals can have a significant impact on performance, because dumping millions of lines a second isn't their intended purpose,[1] whilst the shell can be reasonably expected to do that.

Having it entirely as a shell built-in is possibly actually better than the equivalent '/bin/ls > somefile', since it doesn't need to context switch back and forth as the stdout buffer fills up and the shell has to write it.

[1] I recall there being a Gentoo-related thread about why "Gentoo-TV" -- having the output of gcc scroll past as your background with a transparent xterm -- was actually slowing down package builds significantly.


If it's unreasonable to assume that a 30-year-old command extending directly back to MS-DOS 1.0 has never had anybody once take a look at making sure it isn't unreasonably slow, well, I think there are some reasonable conclusions we can draw from that with regards to performance on Microsoft products.


Huh? He redirected the output to a file, so terminal speed doesn't really factor in here.


It might be, actually, since dir is a command built into cmd.exe on Windows, not a separate .exe that gets run. It may be smarter to use "ls" from cygwin for the test.


There are indeed marked differences between Unix terminals - xterm is blazingly fast. Do the same test the author did with xterm and with another terminal and you'll see significant differences. cmd may just be a slow terminal masking actual Windows speed.


I will never, ever understand the downmodding on this site. There is no particular opinion in this statement - it's an informative statement that the speed of your terminal can matter. I find it strange that on a site claiming to be for hackers that a fairly opinion-free comment on terminal speed differences gets downmodded.

It's hard to keep a thick skin about having your voice diminished when even your informative, unopinionated stuff gets shut down.


"Why is Windows so slow? I’m a fan of Windows, specifically Windows 7."

????


Overall, the author's argument is somewhat dependent on a premise that Windows 7 should be optimized for edge cases such as compiling code written for multiplatform implementation (e.g. Chrome) rather than using the managed code model around which Microsoft's development of Windows has been centered for many years.

If one were optimizing Windows performance, none of the specific areas used as examples would receive much attention given user demographics. What percentage of Windows users use the command line, much less compile C programs, never mind using "cmd" shells to do so?

Windows command line gurus will be using Powershell these days, not the legacy encumbered "cmd" - elsewise they are not gurus.


Hmm... I've had the opposite experience on an Atom netbook. I tried Windows Vista and Ubuntu on it. The Windows netbook worked great, while Ubuntu would regularly crash; Ubuntu on the netbook was unusable. Now, probably I did something wrong? But I just installed the latest version with default settings. Anyway, I returned the netbook.


The author has an issue with speed, not stability.


Were you running compiles on it??


No I was running a web browser and a chat program.



