Coverity has been reporting bugs it finds in the Linux kernel since ~2000.[1]
That makes this comparison complete nonsense, surely? "Bugs found by static analyser X" is only useful as a metric for comparing software projects insofar as it's representative of wider code quality. Which may well be true normally, but doesn't work if you report those bugs, then do the analysis again after they're fixed to compare with the results from software projects you didn't do that with!
Even worse, not all lines of code are equal. Across languages, but also within them. C/C++ can be terse or verbose depending on style and the subset of features employed.
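A toy example of my own (not from the article): the same bounds check can be one line or seven, which directly changes the per-1000-lines denominator any defect-density figure divides by.

    /* Toy illustration: the same check written two ways. Both behave
     * identically, but the second inflates the line count that a
     * per-1000-lines defect density is divided by. */
    #include <stdio.h>

    int clamp_terse(int v, int lo, int hi) { return v < lo ? lo : v > hi ? hi : v; }

    int clamp_verbose(int v, int lo, int hi)
    {
        if (v < lo)
            return lo;
        if (v > hi)
            return hi;
        return v;
    }

    int main(void)
    {
        printf("%d %d\n", clamp_terse(42, 0, 10), clamp_verbose(42, 0, 10)); /* both print 10 */
        return 0;
    }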
I'm a big fan of Coverity but it takes a lot of babying to make it useful. The code bases where I've brought Coverity bugs down to an acceptable ratio included a lot of markups and comments like
"// doing it like this to make Coverity happy"
The other issue is that the high-quality closed source codebases are probably inaccessible precisely because the amount of investment it takes to get the defect count low is also the reason they are closed.
Ok, so my problem with this is what qualifies as a "bug" in their scans. If their scans are so good at finding these bugs then we need to pay them all tons of money and make our software bug free by using their scanning tool.
My experience with their free open source scan is that of the 50 or so "defects" found, one was an outright bug, one was a false-positive, one was intentional (a deliberate crash in some debug code), and the rest were basically style complaints.
I've often wondered what the defect rate is for various projects as measured by hours spent. I've often seen commercial code written quite quickly, while open source code, being written more for self-actualization, artistry, and social proof, tends to be written more slowly (and carefully). On the other hand, good engineers tend to write code more quickly than bad ones.
Linux! Ha! I am sure the device driver for that 256mb Hello Kitty USB thumb-drive is poetry. Linux is not the first place I'd look for high-quality code. I think I'd start with Minix.
True. And I wonder if Minix is higher. But from a security point of view, you're only as strong as your weakest link. Linux runs device drivers in kernel space, so your OS is only as strong as the Hello Kitty driver. Think this is hard to exploit? You can create a USB device that searches a Linux host for exploitable device drivers, then imitates that device. [1]
Suddenly, the Hello Kitty USB drive matters. That code is running in kernel space.
Minix on the other hand runs device drivers in user-land. [2]
Given that device drivers contain 3-7 times as many bugs as other kernel code,[3] a conclusion you may reach is that Linux contains more bugs per line than Minix.
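Rough arithmetic to make that concrete; the driver share of the code and the 5x multiplier below are assumptions picked for illustration, not measurements:

    /* Back-of-the-envelope only. Assumed numbers: non-driver code at 0.5
     * defects/KLOC, drivers 5x buggier (midpoint of the 3-7x range above),
     * and drivers making up 60% of kernel source. None of these are measured. */
    #include <stdio.h>

    int main(void)
    {
        double base = 0.5;   /* assumed defects/KLOC for non-driver code */
        double mult = 5.0;   /* assumed driver defect multiplier */
        double frac = 0.6;   /* assumed fraction of kernel LOC that is drivers */

        double monolithic  = frac * base * mult + (1.0 - frac) * base; /* drivers in kernel space */
        double microkernel = base;                                     /* drivers pushed to userland */

        printf("drivers in kernel space: %.2f defects/KLOC running with kernel privileges\n", monolithic);
        printf("drivers in user space:   %.2f defects/KLOC running with kernel privileges\n", microkernel);
        return 0;
    }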
Interesting, though I guess there is a non-trivial performance cost to userspace drivers; it seems hard to believe you could reasonably drive a GPU from userland. I remember John Carmack saying something about how driver overhead was one of the biggest bottlenecks when developing modern games.
Driver quality is of course something which will always significantly rock the boat when it comes to stability but that is going to be the same with any operating system. To an extent driver quality should be a factor when choosing hardware. If you don't build your kernel with Hello Kitty support you never have to worry about that code.
I guess that is one of the reasons Apple has a better reputation for software reliability: for the most part they get to choose the hardware that will be used with the OS.
Most GPU drivers on Linux are largely userspace based. The kernel bit has a verifier for the command buffer generated by userland, to ensure it's not making the GPU read from memory that the process shouldn't be able to access.
On embedded there's less verification going on, but the drivers are still almost all in userland.
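Roughly, the verifier's job is a bounds check over the command stream. A minimal sketch of the idea, with made-up struct and function names rather than any real kernel interface:

    /* Illustrative sketch only -- not a real kernel API. The idea: refuse to
     * submit a command buffer if any command touches GPU memory outside the
     * ranges this process has been allowed to map. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    struct gpu_mapping { uint64_t start, len; };   /* a range the process may access */
    struct gpu_cmd     { uint64_t addr, len; };    /* the range one command touches */

    bool range_allowed(const struct gpu_mapping *maps, size_t nmaps,
                       uint64_t addr, uint64_t len)
    {
        for (size_t i = 0; i < nmaps; i++)
            if (addr >= maps[i].start && len <= maps[i].len &&
                addr - maps[i].start <= maps[i].len - len)
                return true;   /* fully inside this mapping, written to avoid overflow */
        return false;
    }

    bool verify_cmdbuf(const struct gpu_cmd *cmds, size_t ncmds,
                       const struct gpu_mapping *maps, size_t nmaps)
    {
        for (size_t i = 0; i < ncmds; i++)
            if (!range_allowed(maps, nmaps, cmds[i].addr, cmds[i].len))
                return false;  /* reject the whole submission */
        return true;
    }

    int main(void)
    {
        struct gpu_mapping maps[] = { { 0x1000, 0x1000 } };
        struct gpu_cmd ok  = { 0x1800, 0x100 };
        struct gpu_cmd bad = { 0x3000, 0x100 };
        printf("%d %d\n", verify_cmdbuf(&ok, 1, maps, 1),
                          verify_cmdbuf(&bad, 1, maps, 1)); /* prints 1 0 */
        return 0;
    }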
I guess that would make sense, since USB peripherals are likely less performance sensitive in terms of latency and are also the place where you are going to get the widest variety of devices.
Yeah, it always bugs me: people bitch and moan about "getting into mainline" and having to "satisfy every use case of Linux", then they claim Linux is shitty code. They don't seem to connect that being discriminating about which patches are accepted is no guarantee of quality, but it is a prerequisite.
If by benchmark of quality you mean code with the lowest defect density ("Defect density refers to the number of defects per 1000 lines of software code.").
It's by no means the definition of "code quality", but other parameters (readability, efficiency, encapsulation, etc.) are in good part subjective and very hard to translate into numbers, I think.
Benchmark of open source quality perhaps. Over everything I'd say NASA would probably take that prize. Of course we don't have their sources to analyze but their practices are well known and the results seem pretty strong.
Nonsense. I work long hours debugging various kernels, including Windows, Linux and the *BSDs. The quality of the OpenBSD kernel is amazing. Maybe Minix is better, but that's because it's educational code.
It's a relatively sophisticated static analyser. Nothing new, but quite useful.
Open source projects can register and get reports for free; commercial companies have to pay. Coverity uses e.g. Linux to test and compare their product against, and writes various marketing pieces such as this one to raise awareness of their product.
Defect is too broad a term in this study. The headline is a sensationalization of one metric, which is itself too broad. And does it really measure what it states?
I've often felt that more developers (or even interested power users) should be running with things like MALLOC_CHECK_=3 (http://www.novell.com/support/kb/doc.php?id=3113982) enabled by default for everything.

On top of that, when we have plenty of FLOSS static analysis tools (https://news.ycombinator.com/item?id=4545188), plus things like valgrind, gprof and gcov, I don't understand why more people don't use them.

As for compiler flags, if we can build a whole distro around optimization (Gentoo), why can't we build a whole distro around debugging (-fstack-protector-all, -D_FORTIFY_SOURCE=2, -g3, etc.)? I realize some distros already enable things like this, but usually they are looking to harden things, not necessarily diagnose bugs.
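For what it's worth, here's a tiny, deliberately broken example of the sort of bug those settings surface (the file name and exact flag combination are just my illustration):

    /* toy.c -- deliberately writes one byte past a heap allocation. Build
     * with the flags mentioned above (fortify needs optimisation enabled
     * to do anything):
     *   gcc -O2 -g3 -fstack-protector-all -D_FORTIFY_SOURCE=2 toy.c -o toy
     * then run with glibc's allocator checks on, or under valgrind:
     *   MALLOC_CHECK_=3 ./toy
     *   valgrind ./toy
     */
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char *buf = malloc(8);
        if (!buf)
            return 1;
        memset(buf, 'A', 9);   /* bug: 9 bytes into an 8-byte block */
        free(buf);             /* glibc's checks should abort by here */
        return 0;
    }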
"The finding is based on an analysis by the Coverity Scan Service, which for more than seven years analyzed 850 million lines of code from more than 300 open-source projects, including those written in Linux, PHP and Apache."
"In general, Coverity found the average quality of open-source software was virtually equal to that of proprietary software. Open-source projects showed an average defect density of .69, the study found, a dead heat with the .68 for proprietary code developed by enterprise customers of the service.
Although the average rates of defects in the two types of code are nearly identical, researchers did find a difference in quality trends based on the size of the development project.
For instance, as proprietary software coding projects passed 1 million lines of code, defect density dropped from .98 to .66, a sign that software quality rises in proprietary projects of that size.
That trend reversed itself in the case of open-source code, researchers found. Open source projects between 500,000 and 1 million lines of code had a defect density of .44, which grew to .75 when those projects went over the 1 million line mark."
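To make those densities concrete: defect density times lines of code, divided by a thousand, gives an expected defect count. A quick sketch using the quoted figures (the 1M-line project size is just for illustration):

    /* Defect density = defects per 1000 lines of code, so the expected
     * defect count is density * LOC / 1000. Applying the quoted figures
     * to a hypothetical 1,000,000-line project: */
    #include <stdio.h>

    int main(void)
    {
        double loc = 1e6;
        printf("open source average   (0.69/KLOC): ~%.0f defects\n", 0.69 * loc / 1000);
        printf("proprietary average   (0.68/KLOC): ~%.0f defects\n", 0.68 * loc / 1000);
        printf("OSS 0.5M-1M LOC       (0.44/KLOC): ~%.0f defects\n", 0.44 * loc / 1000);
        printf("OSS over 1M LOC       (0.75/KLOC): ~%.0f defects\n", 0.75 * loc / 1000);
        return 0;
    }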
Could it be that over-1m-LOC proprietary projects are, in fact, fossilised? Once a project is large enough, deep changes are discouraged because their cost (and risk) to the business gets too high.
Meanwhile, open source projects like to refactor (somebody would say reinvent the wheel) forever and ever, constantly ripping out old code for new, so defect density is stable and simply rises in line with overall complexity (which obviously rises with project size).
I'd also be curious to look at developer turnover rates: once you leave a company you can't keep hacking on its code, which is something you can actually do with open source. As old developers leave, their code lies untouched for fear of breaking anything, and again gets fossilised.
You could probably also speculate about the impact of the corporate projects. For example, if the project is over 1M LOC, can we surmise it is very likely that project is their bread-and-butter (and thus gets much more attention and resources)?
I'd make some snarky comment about the quality of one small but important part of Linux noted for ongoing consternation - sound - but seems that'd be poking a hornet's nest again. https://news.ycombinator.com/item?id=5664202
Intended relevance was raising the issue of what constitutes defects, the range of effects, and longevity thereof. As others noted, some of what were counted as "defects" were little more than semantic inconsistencies or obscure flaws rarely seen (if ever); counting each of those as "1 defect" on the same scale as something that pesters the heck out of a large percentage of users (or drives away many prospective users) isn't quite right.
My poor wording was an attempt to raise the point without eliciting a similar hundred+ responses as the last time it came up.
[1] See http://www.coverity.com/library/pdf/linux_report.pdf . At one point all the Linux bugs found were listed at http://linuxbugs.coverity.com/ . Example bug report on lkml from last month: https://lkml.org/lkml/2013/4/5/297