Coverity has been reporting bugs it finds in the Linux kernel since ~2000.[1]
That makes this comparison complete nonsense, surely? "Bugs found by static analyser X" is only useful as a metric for comparing software projects insofar as it's representative of wider code quality. Which may well be true normally, but doesn't work if you report those bugs, then do the analysis again after they're fixed to compare with the results from software projects you didn't do that with!
Even worse, not all lines of code are equal. Across languages, but also within them. C/C++ can be terse or verbose depending on style and the subset of features employed.
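A toy example of my own (not from the article): the same bounds check can be one line or seven, which directly changes the per-1000-lines denominator any defect-density figure divides by.

    /* Toy illustration: the same check written two ways. Both behave
     * identically, but the second inflates the line count that a
     * per-1000-lines defect density is divided by. */
    #include <stdio.h>

    int clamp_terse(int v, int lo, int hi) { return v < lo ? lo : v > hi ? hi : v; }

    int clamp_verbose(int v, int lo, int hi)
    {
        if (v < lo)
            return lo;
        if (v > hi)
            return hi;
        return v;
    }

    int main(void)
    {
        printf("%d %d\n", clamp_terse(42, 0, 10), clamp_verbose(42, 0, 10)); /* both print 10 */
        return 0;
    }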
I'm a big fan of Coverity but it takes a lot of babying to make it useful. The code bases where I've brought Coverity bugs down to an acceptable ratio included a lot of markups and comments like
"// doing it like this to make Coverity happy"
The other issue is that the high-quality closed source codebases are probably inaccessible precisely because the amount of investment it takes to get the defect count low is also the reason they are closed.
Ok, so my problem with this is what qualifies as a "bug" in their scans. If their scans are so good at finding these bugs then we need to pay them all tons of money and make our software bug free by using their scanning tool.
My experience with their free open source scan is that of the 50 or so "defects" found, one was an outright bug, one was a false-positive, one was intentional (a deliberate crash in some debug code), and the rest were basically style complaints.
I've often wondered what the defect rate is for various projects as measured by hours spent. I've often seen commercial code written quite quickly, while open source code, being written more for self-actualization, artistry, and social proof, tends to be written more slowly (and carefully). On the other hand, good engineers tend to write code more quickly than bad ones.
Linux! Ha! I am sure the device driver for that 256mb Hello Kitty USB thumb-drive is poetry. Linux is not the first place I'd look for high-quality code. I think I'd start with Minix.
True. And I wonder if Minix is higher. But from a security point of view, you're only as strong as your weakest link. Linux runs device drivers in kernel space, so your OS is only as strong as the Hello Kitty driver. Think this is hard to exploit? You can create a USB device that searches a Linux host for exploitable device drivers, then imitates that device. [1]
Suddenly, the Hello Kitty USB drive matters. That code is running in kernel space.
Minix on the other hand runs device drivers in user-land. [2]
Given that device drivers contain 3-7 times as many bugs as other kernel code,[3] a conclusion you may reach is that Linux contains more bugs per line than Minix.
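Rough arithmetic to make that concrete; the driver share of the code and the 5x multiplier below are assumptions picked for illustration, not measurements:

    /* Back-of-the-envelope only. Assumed numbers: non-driver code at 0.5
     * defects/KLOC, drivers 5x buggier (midpoint of the 3-7x range above),
     * and drivers making up 60% of kernel source. None of these are measured. */
    #include <stdio.h>

    int main(void)
    {
        double base = 0.5;   /* assumed defects/KLOC for non-driver code */
        double mult = 5.0;   /* assumed driver defect multiplier */
        double frac = 0.6;   /* assumed fraction of kernel LOC that is drivers */

        double monolithic  = frac * base * mult + (1.0 - frac) * base; /* drivers in kernel space */
        double microkernel = base;                                     /* drivers pushed to userland */

        printf("drivers in kernel space: %.2f defects/KLOC running with kernel privileges\n", monolithic);
        printf("drivers in user space:   %.2f defects/KLOC running with kernel privileges\n", microkernel);
        return 0;
    }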
Interesting, though I guess there is a non-trivial performance cost to userspace drivers; it seems hard to believe you could reasonably drive a GPU from userland. I remember John Carmack saying something about how driver overhead was one of the biggest bottlenecks when developing modern games.
Driver quality is of course something which will always significantly rock the boat when it comes to stability but that is going to be the same with any operating system. To an extent driver quality should be a factor when choosing hardware. If you don't build your kernel with Hello Kitty support you never have to worry about that code.
I guess that is one of the reasons Apple has a better reputation for software reliability: for the most part they get to choose the hardware that will be used with the OS.
Most GPU drivers on Linux are largely userspace based. The kernel bit has a verifier for the command buffer generated by userland, to ensure it's not making the GPU read from memory that the process shouldn't be able to access.
On embedded there's less verification going on, but the drivers are still almost all in userland.
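Roughly, the verifier's job is a bounds check over the command stream. A minimal sketch of the idea, with made-up struct and function names rather than any real kernel interface:

    /* Illustrative sketch only -- not a real kernel API. The idea: refuse to
     * submit a command buffer if any command touches GPU memory outside the
     * ranges this process has been allowed to map. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    struct gpu_mapping { uint64_t start, len; };   /* a range the process may access */
    struct gpu_cmd     { uint64_t addr, len; };    /* the range one command touches */

    bool range_allowed(const struct gpu_mapping *maps, size_t nmaps,
                       uint64_t addr, uint64_t len)
    {
        for (size_t i = 0; i < nmaps; i++)
            if (addr >= maps[i].start && len <= maps[i].len &&
                addr - maps[i].start <= maps[i].len - len)
                return true;   /* fully inside this mapping, written to avoid overflow */
        return false;
    }

    bool verify_cmdbuf(const struct gpu_cmd *cmds, size_t ncmds,
                       const struct gpu_mapping *maps, size_t nmaps)
    {
        for (size_t i = 0; i < ncmds; i++)
            if (!range_allowed(maps, nmaps, cmds[i].addr, cmds[i].len))
                return false;  /* reject the whole submission */
        return true;
    }

    int main(void)
    {
        struct gpu_mapping maps[] = { { 0x1000, 0x1000 } };
        struct gpu_cmd ok  = { 0x1800, 0x100 };
        struct gpu_cmd bad = { 0x3000, 0x100 };
        printf("%d %d\n", verify_cmdbuf(&ok, 1, maps, 1),
                          verify_cmdbuf(&bad, 1, maps, 1)); /* prints 1 0 */
        return 0;
    }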
I guess that would make sense, since USB peripherals are likely less performance sensitive in terms of latency and are also the place where you are going to get the widest variety of devices.
Yeah, it always bugs me: people bitch and moan about "getting into mainline" and having to "satisfy every use case of Linux", then they claim Linux is shitty code. They don't seem to connect that being discriminating about which patches are accepted is no guarantee of quality, but it is a prerequisite.
If by benchmark of quality you mean code with the lowest defect density ("Defect density refers to the number of defects per 1000 lines of software code.").
It's by no means the definition of "code quality", but other parameters (readability, efficiency, encapsulation, etc.) are in good part subjective and very hard to translate into numbers, I think.
Benchmark of open source quality perhaps. Over everything I'd say NASA would probably take that prize. Of course we don't have their sources to analyze but their practices are well known and the results seem pretty strong.
Nonsense. I work long hours debugging various kernels, including Windows, Linux and the *BSDs. The quality of the OpenBSD kernel is amazing. Maybe Minix is better, but that's because it's educational code.
It's a relatively sophisticated static analyser. Nothing new, but quite useful.
Open source projects can register and get reports for free; commercial companies have to pay. Coverity uses e.g. Linux to test and compare their product against, and writes various marketing pieces such as this one to raise awareness of their product.
Defect is too broad a term in this study. The headline is a sensationalization of one metric, which is itself too broad. And does it really measure what it states?
I've often felt that more developers (or even interested power users) should be running with things like MALLOC_CHECK_=3 (http://www.novell.com/support/kb/doc.php?id=3113982) enabled by default for everything.

On top of that, when we have plenty of FLOSS static analysis tools (https://news.ycombinator.com/item?id=4545188), plus things like valgrind, gprof and gcov, I don't understand why more people don't use them.

As for compiler flags, if we can build a whole distro around optimization (Gentoo), why can't we build a whole distro around debugging (-fstack-protector-all, -D_FORTIFY_SOURCE=2, -g3, etc.)? I realize some distros already enable things like this, but usually they are looking to harden things, not necessarily diagnose bugs.
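For what it's worth, here's a tiny, deliberately broken example of the sort of bug those settings surface (the file name and exact flag combination are just my illustration):

    /* toy.c -- deliberately writes one byte past a heap allocation. Build
     * with the flags mentioned above (fortify needs optimisation enabled
     * to do anything):
     *   gcc -O2 -g3 -fstack-protector-all -D_FORTIFY_SOURCE=2 toy.c -o toy
     * then run with glibc's allocator checks on, or under valgrind:
     *   MALLOC_CHECK_=3 ./toy
     *   valgrind ./toy
     */
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char *buf = malloc(8);
        if (!buf)
            return 1;
        memset(buf, 'A', 9);   /* bug: 9 bytes into an 8-byte block */
        free(buf);             /* glibc's checks should abort by here */
        return 0;
    }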
"The finding is based on an analysis by the Coverity Scan Service, which for more than seven years analyzed 850 million lines of code from more than 300 open-source projects, including those written in Linux, PHP and Apache."
"In general, Coverity found the average quality of open-source software was virtually equal to that of proprietary software. Open-source projects showed an average defect density of .69, the study found, a dead heat with the .68 for proprietary code developed by enterprise customers of the service.
Although the average rates of defects in the two types of code are nearly identical, researchers did find a difference in quality trends based on the size of the development project.
For instance, as proprietary software coding projects passed 1 million lines of code, defect density dropped from .98 to .66, a sign that software quality rises in proprietary projects of that size.
That trend reversed itself in the case of open-source code, researchers found. Open source projects between 500,000 and 1 million lines of code had a defect density of .44, which grew to .75 when those projects went over the 1 million line mark."
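To make those densities concrete: defect density times lines of code, divided by a thousand, gives an expected defect count. A quick sketch using the quoted figures (the 1M-line project size is just for illustration):

    /* Defect density = defects per 1000 lines of code, so the expected
     * defect count is density * LOC / 1000. Applying the quoted figures
     * to a hypothetical 1,000,000-line project: */
    #include <stdio.h>

    int main(void)
    {
        double loc = 1e6;
        printf("open source average   (0.69/KLOC): ~%.0f defects\n", 0.69 * loc / 1000);
        printf("proprietary average   (0.68/KLOC): ~%.0f defects\n", 0.68 * loc / 1000);
        printf("OSS 0.5M-1M LOC       (0.44/KLOC): ~%.0f defects\n", 0.44 * loc / 1000);
        printf("OSS over 1M LOC       (0.75/KLOC): ~%.0f defects\n", 0.75 * loc / 1000);
        return 0;
    }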
Could it be that over-1m-LOC proprietary projects are, in fact, fossilised? Once a project is large enough, deep changes are discouraged because their cost (and risk) to the business gets too high.
Meanwhile, open source projects like to refactor (somebody would say reinvent the wheel) forever and ever, constantly ripping out old code for new, so defect density is stable and simply rises in line with overall complexity (which obviously rises with project size).
I'd also be curious to look at developer turnover rates: once you leave a company you can't keep hacking on its code, which is something you can actually do with open source. As old developers leave, their code lies untouched for fear of breaking anything, and again gets fossilised.
You could probably also speculate about the impact of the corporate projects. For example, if the project is over 1M LOC, can we surmise it is very likely that project is their bread-and-butter (and thus gets much more attention and resources)?
I'd make some snarky comment about the quality of one small but important part of Linux noted for ongoing consternation - sound - but seems that'd be poking a hornet's nest again. https://news.ycombinator.com/item?id=5664202
Intended relevance was raising the issue of what constitutes defects, the range of effects, and longevity thereof. As others noted, some of what were counted as "defects" were little more than semantic inconsistencies or obscure flaws rarely seen (if ever); counting each of those as "1 defect" on the same scale as something that pesters the heck out of a large percentage of users (or drives away many prospective users) isn't quite right.
My poor wording was an attempt to raise the point without eliciting a similar hundred+ responses as the last time it came up.
[1] See http://www.coverity.com/library/pdf/linux_report.pdf . At one point all the Linux bugs found were listed at http://linuxbugs.coverity.com/ . Example bug report on lkml from last month: https://lkml.org/lkml/2013/4/5/297