Linux code is the 'benchmark of quality,' study concludes (pcworld.in)
132 points by jeffreyfox on May 15, 2013 | 46 comments



Coverity has been reporting bugs it finds in the Linux kernel since ~2000.[1]

That makes this comparison complete nonsense, surely? "Bugs found by static analyser X" is only useful as a metric for comparing software projects insofar as it's representative of wider code quality. Which may well be true normally, but doesn't work if you report those bugs, then do the analysis again after they're fixed to compare with the results from software projects you didn't do that with!

[1] See http://www.coverity.com/library/pdf/linux_report.pdf . At one point it listed all Linux bugs found at http://linuxbugs.coverity.com/ . Example bug report on lkml from last month: https://lkml.org/lkml/2013/4/5/297


Props for saying this.


Then there's the seL4 OS kernel, which was formally verified: its implementation was mathematically proven to match its specification, i.e. free of implementation bugs. http://www.theengineer.co.uk/news/safer-software/312631.arti...


This is fairly meaningless. What is classed as a "defect", for example? Not all bugs are equal.

What about comparing code that has similar requirements and similar numbers of users?

Of course Linux is going to fare well against BS corporate software that was made primarily to satisfy some middle manager.

Likewise open source is going to include a lot of stuff written by college students that nobody actually uses.

Would be more interesting to compare Linux with similar parts of the NT core for example.


Even worse, not all lines of code are equal. Across languages, but also within them. C/C++ can be terse or verbose depending on style and the subset of features employed.


I'm a big fan of Coverity but it takes a lot of babying to make it useful. The code bases where I've brought Coverity bugs down to an acceptable ratio included a lot of markups and comments like

"// doing it like this to make Coverity happy"
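To illustrate the kind of markup being described (a hypothetical example, not from any real codebase): Coverity's null-return style checkers tend to force explicit defensive checks like this, even where the caller already guarantees the precondition.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example of analyser-appeasing markup: malloc can
 * return NULL, so spell the check out explicitly. */
char *dup_name(const char *name)
{
    /* doing it like this to make Coverity happy */
    char *copy = malloc(strlen(name) + 1);
    if (copy == NULL)
        return NULL;
    strcpy(copy, name);
    return copy;
}
```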

The other issue is that the high-quality closed source codebases are probably inaccessible precisely because the amount of investment it takes to get the defect count low is also the reason they are closed.


Ok, so my problem with this is what qualifies as a "bug" in their scans. If their scans are so good at finding these bugs, then we should pay them tons of money and make our software bug-free by using their scanning tool.


My experience with their free open source scan is that of the 50 or so "defects" found, one was an outright bug, one was a false-positive, one was intentional (a deliberate crash in some debug code), and the rest were basically style complaints.


The OpenBSD devs have been very critical of it. That may just be sour grapes, but they may have some valid criticisms as well.


I've often wondered what the defect rate is for various projects as measured by hours spent. I've often seen commercial code written quite quickly, while open source code, being written more for self-actualization, artistry, and social proof, tends to be written more slowly (and carefully). On the other hand, good engineers tend to write code more quickly than bad ones.


I'd rather suggest PostgreSQL's sources:

* it's user space code

* they are much more uniform

* they are much more readable


Or how about sqlite? It's the epitome of ruthless testing.


Linux! Ha! I am sure the device driver for that 256 MB Hello Kitty USB thumb-drive is poetry. Linux is not the first place I'd look for high-quality code. I think I'd start with Minix.


I'd start with code that people actually use..


The standards for admission into the mainline Linux kernel tree are pretty high.


True. And I wonder if Minix is higher. But from a security point of view, you're only as strong as your weakest link. Linux runs device drivers in kernel space, so your OS is only as strong as the Hello Kitty driver. Think this is hard to exploit? You can create a USB device that searches a Linux host for exploitable device drivers, then imitates the device each one expects. [1]

Suddenly, the Hello Kitty USB drive matters. That code is running in kernel space.

Minix on the other hand runs device drivers in user-land. [2]

Given that device drivers contain 3-7 times as many bugs as other kernel code,[3] one conclusion you might draw is that Linux contains more bugs per line than Minix.

[1] https://www.youtube.com/watch?v=D8Im0_KUEf8

[2] http://www.minix3.org/other/reliability.html

[3] http://www.osnews.com/story/15960

ps. Sure as hell I can't code to the standard of getting a non-trivial patch accepted to the Kernel :-)


Interesting, though I guess there is a non-trivial performance cost to userspace drivers; it seems hard to believe you could reasonably drive a GPU from userland. I remember John Carmack saying something about how driver overhead was one of the biggest bottlenecks when developing modern games.

Driver quality is of course something which will always significantly rock the boat when it comes to stability but that is going to be the same with any operating system. To an extent driver quality should be a factor when choosing hardware. If you don't build your kernel with Hello Kitty support you never have to worry about that code.

I guess that is one of the reasons that Apple has a better reputation for software reliability in that for the most part they get to choose the hardware that will be used with the OS.


Most GPU drivers on Linux are largely userspace based. The kernel bit has a verifier for the command buffer generated by userland, to ensure it's not making the GPU read memory that the process shouldn't touch.
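A toy sketch of the verification idea (the command layout is an assumption for illustration; real DRM command-buffer checkers are far more involved): the kernel walks the userspace-built buffer and rejects any command whose GPU address range falls outside what the process may touch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, not real DRM code: each command names a GPU address
 * range, and this process may only touch [lo, hi). */
struct cmd {
    uint32_t opcode;
    uint64_t gpu_addr;
    uint64_t len;
};

bool verify_cmds(const struct cmd *cmds, size_t n,
                 uint64_t lo, uint64_t hi)
{
    for (size_t i = 0; i < n; i++) {
        if (cmds[i].gpu_addr < lo ||
            cmds[i].gpu_addr >= hi ||
            cmds[i].len > hi - cmds[i].gpu_addr)
            return false;  /* command reaches outside the allowed range */
    }
    return true;
}
```

The bounds test is ordered so the subtraction `hi - gpu_addr` can't underflow: the address is confirmed to be inside [lo, hi) before the length is checked against the remaining space.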

On embedded there's less verification going on, but the drivers are still almost all in userland.


I think Apple USB drivers are userspace. USB was slow until 3.0.


I guess that would make sense, since USB peripherals are likely less performance sensitive in terms of latency and are also the place where you are going to get the widest variety of devices.


If you have physical access to a computer (so you can put Hello Kitty USB in it) you can have root on it anyway, no matter the OS.


This would be a fun game: CTF with physical access :-)


Yeah, it always bugs me: people bitch and moan about "getting into mainline" and having to "satisfy every use case of Linux", then they claim Linux is shitty code. They don't seem to see that being selective about which patches are accepted is no guarantee of quality, but it is a prerequisite.


If by benchmark of quality you mean code with the lowest defect density ("Defect density refers to the number of defects per 1000 lines of software code.").
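The arithmetic behind the metric is trivial; a minimal sketch (illustrative function, not from the study):

```c
/* Defect density as Coverity defines it: defects per 1000 lines of code. */
double defect_density(long defects, long lines_of_code)
{
    return (double)defects / ((double)lines_of_code / 1000.0);
}
```

So 690 defects flagged across a 1,000,000-line project gives 690 / 1000 = 0.69, the open-source average the article quotes.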


and if by 'defect' you mean 'thing flagged by coverity analysis tools'.


It's a quantifiable index.

It's by no means the definition of "code quality", but others parameters (readability, efficiency, encapsulation, etc.) are in good part subjective and very hard to translate into numbers, I think.


If that is the benchmark, then Minix beats Linux in virtue of not containing device drivers in kernel space. [1]

Device drivers in Linux have 3-7 times as many bugs as the rest of the kernel code. [2]

Linux can do better and I am sure Linus would admit this.

[1] http://www.minix3.org/other/reliability.html

[2] http://www.osnews.com/story/15960


This guy at Microsoft Research studies code defects in the context of Microsoft software. http://research.microsoft.com/apps/mobile/showpage.aspx?page...


Benchmark of open source quality, perhaps. Across all software I'd say NASA would probably take that prize. Of course we don't have their sources to analyze, but their practices are well known and the results seem pretty strong.


Nonsense. I work long hours debugging various kernels, including Windows, Linux and the *BSDs. The quality of the OpenBSD kernel is amazing. Maybe Minix is better, but that's because it's educational code.


How does this scan service work? Is it just BS?

If software can automatically find code defects... why are there any code defects at all anymore? Just fix whatever their scan says to fix.


It's a relatively sophisticated static analyser. Nothing new, but quite useful.

Open source projects can register and get reports for free; commercial companies have to pay. Coverity uses e.g. Linux to test and compare their product against, and writes various marketing pieces such as this one to raise awareness of their product.

http://scan.coverity.com/


"Defect" is too broad a term in this study. The headline is a sensationalization of one metric, which is itself too broad. And does it really measure what it claims to?


Presumably, this is only kernel space code... That's only a fraction of a modern Linux OS.

Good study though... not the greatest qualification of facts in the article.


I've often felt that more developers (or even interested power users) should be running with things like MALLOC_CHECK_=3 (http://www.novell.com/support/kb/doc.php?id=3113982) enabled by default for everything. On top of that, when we have plenty of FLOSS static analysis tools (https://news.ycombinator.com/item?id=4545188), plus things like valgrind, gprof and gcov, I don't understand why more people don't use them.

As for compiler flags, if we can build a whole distro around optimization (Gentoo), why can't we build a whole distro around debugging (-fstack-protector-all, -D_FORTIFY_SOURCE=2, -g3, etc.)? I realize some distros already enable things like this, but usually they are looking to harden things, not necessarily diagnose bugs.


You may want to look into Hardened Gentoo which does things along the lines you suggest, amongst other hardening techniques.

http://www.gentoo.org/proj/en/hardened/


It would be very interesting to see how the quality of BSD code is compared to Linux.


I would love to know how Linux fares against any of the Windows OSs in this scenario.


We would only hear about it if Linux lost.


Compared to what? Proprietary corporate CRUD code? How about comparing to BSD, Hurd, Haiku, Mach etc.?

Edit: This article has better details. http://gcn.com/blogs/pulse/2013/05/linux-leads-in-open-sourc...

"The finding is based on an analysis by the Coverity Scan Service, which for more than seven years analyzed 850 million lines of code from more than 300 open-source projects, including those written in Linux, PHP and Apache."

"In general, Coverity found the average quality of open-source software was virtually equal to that of proprietary software. Open-source projects showed an average defect density of .69, the study found, a dead heat with the .68 for proprietary code developed by enterprise customers of the service.

Although the average rates of defects in the two types of code are nearly identical, researchers did find a difference in quality trends based on the size of the development project.

For instance, as proprietary software coding projects passed 1 million lines of code, defect density dropped from .98 to .66, a sign that software quality rises in proprietary projects of that size.

That trend reversed itself in the case of open-source code, researchers found. Open source projects between 500,000 and 1 million lines of code had a defect density of .44, which grew to .75 when those projects went over the 1 million line mark."


Could it be that over-1m-LOC proprietary projects are, in fact, fossilised? Once a project is large enough, deep changes are discouraged because their cost (and risk) to the business gets too high.

Meanwhile, open source projects like to refactor (somebody would say reinvent the wheel) forever and ever, constantly ripping out old code for new, so defect density is stable and simply rises in line with overall complexity (which obviously rises with project size).

I'd be curious to look also at developer turnover rates: once you leave a company you can't keep hacking on their code, which is something you can actually do with open source. As old developers leave, their code lies untouched for fear of breaking anything, and again gets fossilised.


You could probably also speculate about the impact of the corporate projects. For example, if the project is over 1M LOC, can we surmise it is very likely that project is their bread-and-butter (and thus gets much more attention and resources)?


Well, Haiku makes use of Coverity, so they should be part of the comparison. I know FreeBSD used it in the past, but I'm not sure about nowadays.


I'd make some snarky comment about the quality of one small but important part of Linux noted for ongoing consternation - sound - but seems that'd be poking a hornet's nest again. https://news.ycombinator.com/item?id=5664202


Is this really relevant here?

(I'm sure the sound code has few statically-detectable defects... even if it fails to produce anything audible for most people.)


Intended relevance was raising the issue of what constitutes defects, the range of effects, and longevity thereof. As others noted, some of what were counted as "defects" were little more than semantic inconsistencies or obscure flaws rarely seen (if ever); counting each of those as "1 defect" on the same scale as something that pesters the heck out of a large percentage of users (or drives away many prospective users) isn't quite right.

My poor wording was an attempt to raise the point without eliciting a similar hundred+ responses as the last time it came up.



