Although this is interesting, we should perhaps not hold a high degree of confidence in the results, because the methodology relies on the content of commit messages and the number of commits (if I read this correctly).
- The content of commit messages varies widely in expressiveness and meaningfulness, from 'Fix bug' to a detailed explanation. This confounds the classification of a commit.
- The number of commits can be very misleading depending on the committer's workflow. Some committers merge topic branches to include all their intermediate work-in-progress commits, which could overrepresent commits flagged as errors. Other committers rebase their topic branches into fewer, or even single, commits before merging. Or, some commits may fix multiple defects.
This kind of analysis is conceptually a worthy endeavor; it would be more meaningful if the metrics it employed were more strongly correlated with the attributes it was trying to analyze.
Also, the way one works varies across languages. Some people are more likely to push barely working code to GitHub because of the language's culture.
Perhaps timing should be considered as well - how long it takes to implement a feature including fixing its associated bugs.
Figure 2 suggests another possible bias favoring functional, managed languages: a lot of errors for C/C++ are related to concurrency and performance. But those are mostly non-bugs for other languages, since when concurrency or performance is a requirement, most of the studied languages would not be considered anyway.
It seems similar to the paradox that makes the best medicine appear to have a lower survival rate just because it's given to most serious patients.
Rephrasing to make my point clearer: you open a performance bug against, say, a C program more often than against, say, a Python program, because performance is more likely to be a requirement of a C program than of a Python program.
Similarly, again for performance reasons, your average C program will have more concurrency than your average Python program, and therefore also more bugs.
Another way to put it: you use C when you have a complex problem to solve and Python when you have a simple problem to solve (unless you are a masochist or a purist, I suppose). So one reason those languages have more bugs may just be that the programs themselves are more prone to errors (which might be only slightly related to size, if at all).
>when concurrency or performance are a requirement
I see what you're getting at, but this is an irksome way to put it. We're clearly unwilling to wait for the heat death of the universe for our programs to terminate. Performance is always a requirement.
"When the performance requirements can only be met by C/C++" might be a more accurate formulation, but then it's just tautological.
Java, Go, Obj-C, Erlang, and Scala are all certainly in the running when concurrency is required, and fit within many latency budgets just fine. The managed and dynamic languages on the list are typically used in contexts where latency is dominated by network and disk I/O, so marginal CPU efficiency isn't worth much. That doesn't mean performance isn't a requirement, it means the most effective ways to increase performance are different. Adding indexes, optimizing queries, caching, etc.
I think it's more that a certain level of performance is a requirement. Once you need a higher level of performance, C/C++ becomes one of only a few tools you can use. If you need to go higher still, you either go to Fortran or ASM.
Where do you see this in the paper? They say concurrency errors are mostly the usual things like deadlocks and race conditions, but those absolutely do exist in every language.
Also, what do you mean most of these languages wouldn't be considered when concurrency is required? Concurrency is bog standard everywhere.
It seems like, the way they define a bug, a performance bug would be a bug relative to expectations, per project, so you can definitely have a performance bug in Go or Haskell, for example, if something works slower than developers think it should (as opposed to being slower than some external reference code or something). So maybe it's closer to something like "developer control over unexpected underperformance"?
Not even every language in that study supports concurrency, as the study itself points out. I hear a lot of praise for Go because of how much people like doing concurrency with it. The fact that they observed a higher rate of concurrency bugs in Go could just as easily support the interpretation that Go is good for concurrency as the interpretation that it is bad for concurrency.
Since Go makes concurrency easier and encourages its use, there are going to be a lot more concurrency bugs. By contrast, languages like Python don't even have proper parallel threads, so fewer people will write concurrent Python programs and fewer bugs will arise. This is a confounding factor, of one sort or another, found throughout the survey.
It’s good that they did this research but unfortunately they couldn’t account for everything.
They talk about this, and they did do some things to account for it a little. That's why they conclude that languages are more strongly correlated with categories of defects than with overall defects.
Some languages (like Clojure) very significantly reduce the possibility of thread-related bugs. Clojure in particular was designed with multi-threading in mind, so I think it is a fair point that some languages will have more trouble with this than others.
Try multithreading in C++ vs Clojure and the difference in amount of effort is well beyond trivial.
I think so too! But the operative question is not how well suited they are to the task. It's how often they're used for the task in the corpus. And on that point I suspect that the person you're replying to has the right of it. I suspect that reaching for concurrency is correlated with a desire for high performance, which in turn I suspect causes people also to reach for these lower level languages.
Uh, where is it shown how "software quality" or "code quality" is measured or determined? Can anyone provide a succinct definition of quality which the paper uses?
As best I can tell, they use commit messages to identify bugfixes, and later they jump to "defective commits". Presumably the bugfix commit is not the defective commit. There is no explanation I can find that shows how they arrive at a defective commit from a bugfix commit.
This specific methodology seems rife with weaknesses, all of which should be explained clearly and admitted up front.
curl is an extremely successful tool/library; I would consider it high quality without knowing the ratio of normal patches to bug fixes. Skyrim is well known to crash and corrupt save games, but it is regarded as one of the greatest RPGs ever made; the game's programmers obviously did a lot of things right to produce such a hit. I'm not saying you need to be a worldwide success to write quality code, just that a low bug count doesn't always mean high quality, and vice versa. Quality is measured by user experience.
In this case, it's not really about how useful/appealing the software is, but whether there's some kind of correlation between languages and defects. So if Skyrim were rewritten in Haskell, then perhaps it would have fewer defects.
But looking at the data and their analysis, while there are some interesting stats saying typed and functional languages correlate with fewer defects, there are just too many variables at play.
Skyrim actually runs in a VM. So it's already futuristic. And it still crashes like a piece of crap.
I think the 2nd or 3rd STALKER games (or all of them) run in a VM as well and... crash and corrupt saves like crazy.
And for those who don't get how it's related. A VM is "supposed" to be pretty damn crash-proof and "safer" much like functional languages. But that doesn't stop deadlines and bad coding practices from creating broken products.
Every Java program runs on a VM. It does not magically prevent sloppy Java code from crashing. It does prevent a lot of memory errors typical for sloppy C code, though.
I’m mildly surprised by how well Clojure performs here. It isn’t statically typed yet fares much better than Haskell/Scala! From my experience Clojure is also a joy to write, sometimes even more than Haskell.
I have read elsewhere (in many places, in fact, such as the interesting read "Out of the Tar Pit") that the frequency of bugs in a project scales proportionally to the code size. And that this has a greater influence on bugs than features of any particular language.
Clojure is an incredibly succinct language. It uses about half as many lines as Elm, 5% as many lines as C++. I love other languages, but nothing rivals Clojure in elegance. I believe this is the key reason why Clojure projects are so low on bugs -- they are much simpler to maintain, refactor, or rewrite entirely than in most other languages, so fixing problems is not the chore it can be elsewhere.
Oh come on. First, 20 vs 4 lines, which is 20%. Second, not exactly the most compact C++ version. You can easily make one in 10 lines even in a clean C style. Third, these code-golf comparisons are beyond silly.
It really does just boil down (in my opinion) to code size. A small program is going to have fewer bugs than a large program. And Clojure programs can be quite small compared to the same programs written in other languages.
While I too am attracted to the static typing in many ways, I have found in my own experience of working in C++ full time for a few years (with some non-trivial work in Swift as well, which has quite a strict type system), and then Clojure full time for a few years, that when you measure all the other features of a language, typing alone does not make a big difference.
It's kind of like the argument some people make about how the cost of your rent or mortgage is the best indicator of your cost of living. There are so many other factors, and some of the lowest cost of living I've personally experienced has been in places with considerably higher rent than average, due to the offset of other factors.
Why is that hard to believe? Static typing always adds some code size, even in languages with inference.
On top of that, the pervasive use of macros in Lisp tends to lead to a kind of code reuse that languages without macros lack.
Not to say that Haskell can't be concise also, just not to the same degree. And I've seen very long lines in Scala that were due to pleasing the compiler's typing.
Elm and ClojureScript offer a good contrast. Elm is written in Haskell and is not too unlike Haskell in syntax and general language use. I've written in both Elm and ClojureScript, and there is nearly a 2X difference in code size, Elm over ClojureScript, for accomplishing similar non-trivial tasks.
Code size isn't everything, but it is important.
If you still find it that hard to believe, just try Clojure for a while, and I think it will be clear. It's a tradeoff. When you don't have to satisfy a type checker for everything, you do write less code; that's just a matter of fact most of the time.
The question is whether what you gain from a statically typed language makes up for the increased code size over a Lisp, the extra developer resources, and the potential for bloat-related bugs. But that is of course the core of the typing debate.
Elm is nowhere near Haskell or Scala in expressiveness. You would have to compare Purescript/GHCJS/ScalaJS. In native Haskell/Scala land I usually don't have to do much ceremony, and the functional code is succinct and a pleasure to write. They may lack the all-powerful macros of Lisps, but the need to reach for a macro seldom arises.
The original point wasn't about code size it was about bugs and I was surprised to see Clojure doing better than the very strongly typed ones. Hence the comment on the smartness of Clojure people ;)
I will agree that there are a lot of really smart people in the Clojure community, but that is also true in other communities. Some of the most brilliant developers I know are hard-core C++ guys.
The original point about bugs is in my opinion, regarding my comment you replied to, very similar to a discussion about code size. There are some great reads [0] that go into further depth on this issue, if you are curious.
I think it's a good starting point to look at a large number of open source projects in the wild. The individual differences in skill, size, etc. average out between them. It's important to establish whether any statistically significant trends exist before anything further can be discussed meaningfully.
If we see empirical evidence that projects written in certain types of languages consistently perform better in a particular area, such as reduction in defects, we can then make a hypothesis as to why that is.
For example, if there was statistical evidence to indicate that using Haskell reduces defects, a hypothesis could be made that the Haskell type system plays a role here. That hypothesis could then be further tested, and that would tell us whether it's correct or not.
However, this is pretty much the opposite of what happens in discussions about features such as static typing. People state that static typing has benefits and then try to fit the evidence to that claim. Even the authors of this study fall into this trap: they read preconceived notions into their results that are not supported by the data. The differences they found are so small that it's reasonable to say that the impact of the language is negligible.
Perhaps you should actually read the conclusion before getting too excited:
>One should take care not to overestimate the impact of language on defects. While the observed relationships are statistically significant, the effects are quite small. Analysis of deviance reveals that language accounts for less than 1% of the total explained deviance.
Nor do these most powerful statically typed languages appear to perform any better than dynamically typed Clojure and Erlang.
> For example, in languages like Perl, JavaScript, and CoffeeScript adding a string to a number is permissible (e.g., "5" + 2 yields "52"). The same operation yields 7 in Php. Such an operation is not permitted in languages such as Java and Python as they do not allow implicit conversion.
Regarding Perl, the quoted statement is wrong:
$ perl -E 'say "5" + 2'
7
Furthermore, this is not an implicit conversion. The + operator is an explicit numeric conversion. Here's a more detailed description:
Although no conversion is requested explicitly in the function definition, a conversion may take place depending on the types of the arguments passed in:
> add(1,2)
3
> add("1",2)
"12"
The article in question defines implicit conversion in this way, and in my experience it's a fairly common term.
I was pointing out that per this definition, the article is wrong in saying that perl's + operator may perform an implicit conversion. In perl the + operator always performs a numeric conversion of both its operands, regardless of types. By writing + you are explicitly requesting numeric conversion of both arguments.
In general perl doesn't perform implicit conversion (of course there are some exceptions -- it is perl after all). It does this by not overloading operators like + for different operations such as addition and concatenation.
This also has the nice property that you can count on a+b == b+a, unlike Python for instance. (However, in Python's PEP 465, non-commutativity was a stated advantage of adopting @ for matrix multiplication instead of overloading *, go figure.)
The + operator is not an explicit numeric conversion. What numeric type is it converting to? In Python you can also combine two lists using the + operator.
Haskell, for example, requires you to be explicit about this; in GHCi the difference looks roughly like the following (the exact error wording varies by GHC version):
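ghci> "5" + 2
<interactive>: error: No instance for (Num [Char]) arising from a use of '+'
ghci> read "5" + 2 :: Int
7
ghci> show 2 ++ "5"
"25"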
No, not necessarily. It's standard practice to pick a p-value significance cut-off (0.05), but report the smallest such standard cut-off that any particular value meets. So "p < 0.001" is reported for values that meet that threshold. Anything over the cut-off is just not reported as significant.
That seems dishonest to me. They're saying that some results are more significant after the fact. Is there any mathematical justification for why this is OK?
Fascinating study, but I think a lot of the conclusions in this study are self-evident. For example:
"However when compared to the average, as a group, languages that do not allow implicit type conversion are less error-prone while those that do are more error-prone."
A lot of the conclusions are along these lines: languages with explicit type conversion have less [type conversion] errors. Well, of course...
Still worth a read though, and makes a strong case for functional, statically-typed languages.
> Still worth a read though, and makes a strong case for functional, statically-typed languages.
The thing is, it really doesn't. There are too many inexplicable results. TypeScript does significantly worse than JavaScript, for example. There's also no really good explanation of why the results for Ruby and Python are basically diametrically opposite (the languages are more alike than different). And Clojure has the best result of them all.
I suspect that there are simply too many confounding variables that are not accounted for (such as the typical application domains for those languages, average programmer skill, or complexity of the problems being targeted by these projects).
Yes, I think after reading to the end I agree with your summary.
I still think there is value in using languages that eliminate entire classes of bugs though, for example using a language that has automatic memory management is a no-brainer except for certain specific domains where you need to do memory management yourself. Likewise with static typing: it eliminates type bugs. There have definitely been times recently, working with a dynamic language like JavaScript, when there's been a bug in our code base that would not have happened had we been using TypeScript. Some of these bugs also had significant business impacts.
There is of course a trade off, typed languages can be more challenging to develop with: I've had a number of fights with the Scala compiler. Typically it's libraries rather than the base language, but it still costs time I wouldn't have spent if using a dynamic language. Also, the Scala compiler itself is very slow, to the point where the XKCD comic about "code's compiling" has been true. On modern Macbook Pros, this shouldn't be a thing anymore, but it still is :)
> I still think there is value in using languages that eliminate entire classes of bugs though, for example using a language that has automatic memory management is a no-brainer except for certain specific domains where you need to do memory management yourself. Likewise with static typing: it eliminates type bugs.
In practice, it's not so simple. It probably holds for garbage collection (assuming GC is suitable for your application domain), but static type systems come with costs. There's plenty of evidence (via studies) that type annotations are really valuable as documentation, but the argument for bug prevention is less clear. Most bugs that are caught by static type systems are also generally prevented by other approaches (because they're basically clerical errors that you hit fast even during basic testing). Conversely, there are plenty of real bugs that aren't caught by type systems (or only by type systems that are essentially full-fledged specification languages, or by putting lots of work into types).
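To make that concrete, here's a minimal Haskell sketch (a made-up discount function, not taken from the study or any real codebase): both definitions have the same type and compile cleanly, yet only one applies the intended 10% discount, and the type checker is equally happy with either.

-- Both type-check as Double -> Double; only the first is correct.
applyDiscount :: Double -> Double
applyDiscount price = price * 0.9        -- intended: charge 90% of the price

applyDiscountWrong :: Double -> Double
applyDiscountWrong price = price * 0.1   -- same type, compiles fine, wrong logic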
While there's a huge difference between having a validation strategy for your code and not having a validation strategy at all (or winging it), it's much more difficult to assess the relative value of different validation strategies, especially once you take costs into account.
> A lot of the conclusions are along these lines: languages with explicit type conversion have less [type conversion] errors. Well, of course...
Well, of course, indeed... There is a bogus argument here, and it is not in that part of the study which is being ridiculed. Modifying the statement you are arguing against is often an indication that something isn't quite right.
Of course, what matters here is the overall error rate, possibly weighted for severity (though trying to do that is itself problematic), on comparable tasks. In a rational world, anything that can eliminate an important class of error, without making corresponding increases elsewhere, would be regarded as a success.
> The data indicates that functional languages are better than procedural languages; it suggests that disallowing implicit type conversion is better than allowing it; that static typing is better than dynamic; and that managed memory usage is better than unmanaged. Further, that the defect proneness of languages in general is not associated with software domains. Additionally, languages are more related to individual bug categories than bugs overall.
> It is worth noting that these modest effects arising from language design are overwhelmingly dominated by the process factors such as project size, team size, and commit size.
This article is dated October 2017 and claims to be the first large-scale evidentiary study. But I have definitely seen either this exact study or another one nearly identical, also using GitHub and also having similar results for the languages, and that was at least 1 year ago. So perhaps this article is a re-print of a prior study?
That's how CACM research highlights work. A published paper is invited to be featured as a highlight. The paper is edited somewhat, generally it's made smaller with some details removed. The edited paper will then appear some time later along with a discussion of the paper written by someone not associated with the paper. If the paper is controversial, as this one is, a long time may pass between the original date of publication and the publication of the CACM research highlight.
Interesting that the social element isn't mentioned.
Smarter programmers are likely to be able to get their head around the strict requirements of functional languages and they are the ones using the languages at the moment.
Java, on the other hand, is pretty much the COBOL of this generation.
> Interesting that the social element isn't mentioned.
It's right there in the abstract: There might be "other, intangible process factors, for example, the preference of certain personality types for functional, static languages that disallow type confusion."
http://web.cs.ucdavis.edu/~filkov/papers/lang_github.pdf