Some years ago I wrote "Is Java 10x more verbose than Python (LOC)? A modest empiric approach" [1]. The most interesting comment quoted "PROGRAMMING LANGUAGES ARE LIKE GIRLFRIENDS: THE NEW ONE IS BETTER BECAUSE YOU ARE BETTER" from [2].
With regard to your first link, I really have to take issue with code like the following as a measure of Java and Python's relative verbosity:
class Artist:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name
There may be an argument to, as you say, "Never, never, never use String in Java" (or other statically typed languages), but a class like this in a duck-typed language such as Python is nonsensical.
I'm not even arguing that Python's dynamically typed approach is necessarily better overall than Java, but if you're going to evaluate the two in terms of verbosity then you really have to compare idiomatic code in both languages. Whereas if you continue writing Java in Python, of course you'll only see a 2x improvement.
Sorry, my Python days were in the 90s; I dropped it for Java back then, so my skills - even in 2008 - were not the best. As I said in the post: "Sorry that my Python is rusty, all correcting comments or comments on how to do it better are welcome."
Not sure what your argument has to do with dynamically typed references? How would one create a constructor for an object that takes an argument (name)?
" but a class like this in a duck-typed language such as Python is nonsensical."
Does this mean people do not use classes in Python (e.g. for DDD)?
No, it means people don't usually create completely useless data-holder classes in Python. Instead they use standard collections, depending on their hierarchy of needs: tuples, dicts or namedtuples.
All three of the classes in the example are basically worthless. Including SongList, which is the one with the most logic (a single line) and which you didn't even use at the end. Here are the three classes, at their most future-proof:
from collections import namedtuple
Song = namedtuple('Song', 'name duration artist')
Artist = namedtuple('Artist', 'name')
class SongList(list): pass
and as a side-note, this:
for song in (song for song in songList if song.duration < 10):
    print "%s by %s (%d)" % (song.name, song.artist, song.duration)
is needlessly complex. Why the comprehension? Just write a loop and a conditional:
for song in songList:
    if song.duration < 10:
        print "{0.name} by {0.artist.name} ({0.duration})".format(song)
It probably depends on the work being done - matrix computations likely wouldn't be significantly shorter in Python, even with numpy - but considering the Java styles I know of (rather defensive and comparatively low-level) and the Python styles I also know of, as long as your Python isn't Java I'd say it should remain significantly shorter. Not as short as replacing 20 LOC by a single namedtuple - that's a bit extreme - but 2x is what I expect of (mostly) translating Java code into Python; that's the rough overhead of the language itself when saying the same thing the same way.
"[..] as long as your python isn't java I'd say it should remain significantly shorter."
My intent with the post was to create empirical knowledge; I'm not a follower of the idea, pervasive in our industry, that anecdotal 'evidence' trumps all. So thanks for your opinion, but I'm not interested in the least in anecdotes when comparing languages.
In truth, I wouldn't use anything other than a plain Python string to represent an artist's name, if that's the only thing about the artist you're interested in. But it's hard to have a meaningful conversation about such a contrived example as in that link.
To answer your question, though, yes: all other arguments about Python vs. Java aside, the LOC ratio would change dramatically in Python's favor once you start to add actual program logic. (Unless you're stuck with an architecture astronaut for a Python programmer.)
As they say on another part of the internet, "pics or it didn't happen". For an example this short it would be so easy to show the code, and see how much the scala version could actually be improved.
That is one thing I don't like about the shootout: even when you don't want it to be about performance, it still is. When they measure code size they only measure the best-performing app. They don't have a separate set of measurements for concise code. So most of the FP entries are C in an FP skin.
The program got shorter because the author learned more and improved it each time, not mainly because of the languages.
Mainstream programming languages do not seem to vary much in lines of code of programs: the range is maybe a factor of 2 or 3. Here is evidence: http://www.hxa.name/minilight/#comparison
This stands to reason. Look at everyday languages: they all have the same features -- they are perhaps surprisingly similar in basic structure. Control flow, operations, data primitives, data compounds -- all are very similar. One of the bigger outliers is C: its lack of common higher-level amenities like exceptions, nice data structures, and particularly storage management can expand code significantly.
These kinds of comparisons are worthless, because they are too short to show the advantages of large-scale application of language-specific idioms. LOC is not really a good measure of code size. The number of symbols is better. A single line of Java can be way more verbose than the equivalent line in Python.
More than a decade ago I worked on a compiler written in C++ and did an exercise to see what the savings would be to implement it in Python. IIRC it was something like 80%. There were huge savings in the size of static data structures. A lot was due to the fact that Python lent itself to being an ad hoc DSL.
To be fair, the languages that I think lend themselves most to low LOC are Haskell, Common Lisp, and Clojure. None of these are covered in the minilight comparison.
Your list misses some pretty significant features that don't exist in Java, including lexical closures, list comprehensions, and macros.
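As a rough sketch of the first two in Scala (the names here, like makeAdder, are made up purely for illustration; Java at the time of this thread had none of the three):

// A lexical closure: the returned function captures n from its enclosing scope.
def makeAdder(n: Int): Int => Int = x => x + n
val addTen = makeAdder(10)
addTen(32)  // 42

// Comprehension syntax: filter and transform in one expression,
// roughly what a Python list comprehension expresses.
val doubledEvens = for (n <- List(1, 2, 3, 4, 5) if n % 2 == 0) yield n * 2
// List(4, 8)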
Try spending some time on 4Clojure. For a while I was routinely solving problems in half a dozen lines or more, only to find that other people had done them in one line.
He probably just used standard language constructs (hashmaps, vectors, sets) instead of custom classes like in Java or Scala. In Clojure it's "idiomatic" to do that, and I believe it can make a 10x difference in short programs.
Yeah, some of the improvement could be because of better understanding of the problem.
"I've written plenty of lisp (Clojure and non-Clojure) and I haven't observed it to be significantly shorter than Scala/Haskell."
Same here. Although I've found Haskell to be noticeably shorter than Scala. But there are trade-offs. Java reads sequentially, top-to-bottom, so when you are new to a piece of code, you trace through it in this linear fashion. And as Paul Graham says in On Lisp (p30), such code 'looks solid and blockish'. But it's approachable, if somewhat long and cluttered. On the other end of the scale is Haskell, which is anything but linear. And with a healthy dose of library-specific operators, it can be quite alien looking. Until you are accustomed to the syntactic patterns, it's pretty hard going. But after that point, it is considerably faster to grok the meaning of the program, as it is entirely clutter free. Clojure (and Lisps in general) is closer to Haskell in its non-linear nature, but without more syntactic help (eg infix operators), it never gets to the compactness of Haskell. And the indentation gets a bit heavier.
The increased LOC in Java is the price you pay for this linearity. But with program complexity and scale, this price can get to be pretty heavy. The same can be said for indiscriminate mutability and side-effects.
"The increased LOC in Java is the price you pay for this linearity. But with program complexity and scale, this price can get to be pretty heavy."
It's not just that as the program gets larger, you have more of that linear code. It's that the program necessarily stops being linear. There's no linear way to lay out the logic of a sufficiently complex system. The pieces may be linear in the small, but the way they interact is not. The same is true of spaghetti.
For some reason when people talk about readable code they tend to consider just the pieces and not the interactions between them. But you can't understand a system without the latter and it's the latter where most complexity lurks (in any language; function calls are also a kind of goto).
With or without side effects, a function call says "go to this other place and do this other thing and bring back the result". That's a kind of goto – a structured kind that's always paired with a return. This is higher level than the old-fashioned GOTO but that doesn't make it complexity-free. It's still possible to create spaghetti with it.
Maybe that sounds pedantic, but I don't think it is, because interaction complexity needs to be minimized just like other kinds of complexity do. When you call a function you are implicitly drawing a line between the caller and the callee. Imagine a picture of your program with all those lines drawn at once. That picture ought to have some order to it.
Side effects are a different kind of complexity. If pure function calls are like going camping and leaving nothing behind, then function calls with side effects are like going camping and burning a fire or littering or what have you.
I don't know scala, but that CSV parser looks wrong. You will fail on any CSV file that contains commas in data items. It's the same reason why you can't use awk's field separators to parse CSV files.
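To make the failure mode concrete, here is a hypothetical naive parser in Scala (not the code from the article) that just splits on commas:

// Hypothetical naive approach: split each record on commas.
def naiveParse(line: String): Array[String] = line.split(",")

naiveParse("Blue Train,John Coltrane,1957")
// -> three fields, as expected

naiveParse("\"Mingus, Charles\",Mingus Ah Um,1959")
// -> four fields: the comma inside the quoted artist name is treated as a
//    separator, so a correct parser (or a CSV library) has to track quoting state.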
My only problem with Clojure is the terrible JVM start up time.
While I agree that for typical long-running applications like web servers/web applications this isn't really an issue, the problem manifests itself when you want to write small scripts or small command-line applications that you'd expect to perform similarly to their Java equivalents.
If you write a CLI app in Java, the application will execute almost instantly as soon as you run it, but with languages like Clojure/Scala you have to wait a long time.
"If you write a CLI app in Java, the application will execute almost instantly as soon as you run it, but with languages like Clojure/Scala you have to wait a long time."
I assume this means you are running the uncompiled version as the script? For Scala at least that is fairly uncommon in situations outside development.
(Note that to do the same in Java is very hard - you'd have to write a custom compile & execute script - and then it would perform similarly to the Scala/Clojure app anyway).
Also, I think it is great to see the startup time for a Java application praised. How things have changed!
ClojureScript is a potential way around the JVM, but there's the disadvantage of fewer libraries (since cljs tends to be used more in the browser), and you can't easily edit scripts in place since they have to be run through the (JVM) compiler first.
> My only problem with Clojure is the terrible JVM start up time.
Can't find it anymore, sadly, but I remember a guy had done some profiling (on leiningen maybe?) and it turned out more than 90% of the startup time was loading core.clj, not the JVM startup, which only took a small fraction of the total.
(Now, part of clojure.core being so slow goes back to the JVM: a lot of resources are expended JITing stuff which, for leiningen, will just be thrown out. Hence the quite frequent recommendation to run the JVM in -client mode if possible. The -server VM also allocates way more memory on startup.)
The damning point against Scala there is that the Scala program is only about 30% the length of the Java one.
That's not enough savings to make up for the complexity of a language that seems like an unholy union of Perl, C++ and F#.
For me the breakthrough in Scala was understanding enough about how it works that I could interface with Scala code easily in Java. At that point I could copy any of those patterns in Java. They were a little more verbose than Scala, but more reliable because I wasn't worrying that some piece of magic was going to cause something extraordinary to happen.
The author states that the program was so long because he was new to the language when he wrote it. Scala can be very, very terse if you know what you are doing with it. I have had projects that went from 1K+ lines of C++ to 300 lines of Scala to 10 lines of Scala. The language didn't change to give me that decrease in LOC, I did. The more I moved from a translation of the C++ to a functional design in the program, the better I was able to express the code.
As someone who's done a lot of functional and OO programming over the years, I find it very hard to believe that you have multiple projects that went from 1K+ lines of C++ to 10 lines of Scala.
Claims like this are the sort of thing that make it hard to convince people to try functional programming, because they think that FP evangelists are full of shit.
In C++ I implemented a lot of functionality by hand, using mostly glibc and the STL. In Scala there are libraries to help; in my case it was a combination of threading (Akka), network I/O (again Akka) and HTTPS communication (dispatch-http). So yes, this was a combination of a better programming style and better library support (via SBT).
A little more verbose? Let's see, Scala vs Java in a promise-based async server:
Java (real code - this is a mild example, imagine this 8 indents deep):
return users.verifyAccess(userId, oldPassword).flatMap(
    new Callable1<Promise<DataObject>, UserProfile>() {
        @Override
        public Promise<DataObject> call(UserProfile profile) {
            return replyProfile(profile.setPassword(newPassword));
        }
    });
Scala (translated):
for {
  profile <- users.verifyAccess(userId, oldPassword)
} replyProfile(profile.setPassword(newPassword))
(observe how you can actually read the code)
Oh man, I wish we had started with Scala (or written everything sync)... My conclusion from all of this was that Java makes certain abstractions impractical from a readability standpoint.
From my - some years, but rather limited - experience with Scala (compared to a decade of Java), the largest point for Scala is how it works with monads, e.g. Option, eliminating a lot of "if" statements and making APIs much clearer. And I did not expect this to be the main difference from Java when I started working with Scala some years ago; I thought it would be closures (which are a nice thing, I especially like the _ syntax).
Don't you have to have closures to make monads work in practice? (I only really have experience with monads in Haskell, and something like (a >>= \x -> b >>= \y -> g x y) is a very basic thing to do with monads there.)
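For what it's worth, here is a sketch of both points in Scala (User, users and findUser are made up for illustration): a for-comprehension over Option replaces the nested "is it there?" checks, and the compiler desugars it into flatMap/map calls whose arguments are exactly the closures you mention.

case class User(name: String, managerId: Option[Long])

// Hypothetical lookup; a real application would hit a database.
val users = Map(1L -> User("Ada", Some(2L)), 2L -> User("Grace", None))
def findUser(id: Long): Option[User] = users.get(id)

// Instead of nested "if found" checks:
def managerName(id: Long): Option[String] =
  for {
    user    <- findUser(id)
    mgrId   <- user.managerId
    manager <- findUser(mgrId)
  } yield s"${manager.name} manages ${user.name}"

// The compiler rewrites the for-comprehension into flatMap/map,
// and the function literals passed to them are closures:
def managerNameDesugared(id: Long): Option[String] =
  findUser(id).flatMap(user =>
    user.managerId.flatMap(mgrId =>
      // the innermost function closes over `user` from two scopes up,
      // much like `g x y` in the Haskell example
      findUser(mgrId).map(manager => s"${manager.name} manages ${user.name}")))

managerName(1L)  // Some("Grace manages Ada")
managerName(2L)  // None, and not a single "if"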
I like what Clojure is trying to do, but comparing languages based purely on LOC is crazy.
It's almost like choosing one car over another because the first one looks better, even if it does not have an engine.
At the end of the day, languages help you write code. Comparisons of how much time it would take to write the code would be much closer to being useful and would be interesting (even though they would also have their own challenges).
This would be a really hard test to do in the real world, but I would like to see the code generated by 2 teams over a year - each having 4-5 people and each with similar experience (ideally 10 years of coding background).
This is both true and false. LoC is a volume metric. More volume = more things to comprehend.
A great example of imperative vs. less-imperative is looping vs mapping.
Mapping in Python:
foo = map(function, list)
vs looping in C++:
for(int i = 0; i < list.size(); i++)
{
    foo[i] = function(list[i]);
}
Note that the C++ version has about 6 choice points where errors can be made (or customized, too - it's more flexible than a map): initialization, comparison, upper bound, increment, the lvalue index, and the rvalue index. So to grasp the entirety of the loop, you have to examine all 6 things, whereas the map is far simpler. After a while, this simplicity starts to add up.
However, LoC is a truly bad metric (other measures are, imo, just as bad), and doesn't really capture what I just said at all. What LoC does capture (as I said above) is volume - the extrinsic complexity of the solution. If you can hone your code so that it simply expresses the intrinsic complexity of the solution, your LoC will decrease to that point. And language helps a great deal.
Your point is really well made though: for industrial use you have to have a head-to-head of expert teams over a reasonable time frame for a reasonably sized system, in reasonably comparable conditions. And yeah, that's the big joke in SW engineering research. It's never really been done and published in any meaningful way. Some companies did do work in that area in the '80s, as I recall - but they didn't publish meaningful datasets as far as my reading shows. Most SW engineering research in this area is very much "small problems" and "students in a class".
I disagree. It is not that simple. If volume is similar code, then sure. But if you are talking about different styles of coding then you have a lot of other things to consider. On top of this, if you have a language that has very few underlying primitives (such as parentheses in Lisp as opposed to [,;.] in other languages) then you actually have a higher comprehension barrier, as the reader will need to examine the role of the primitive as well. Basically, the terser the language, the harder it is to comprehend a LOC in that language.
On the research note, I do agree that most SW engineering research has focused on 'small problems'.
I disagree. :-) It is that simple, at base! More stuff = more things to wrap your head around.
I posit that a well-written terse solution should say exactly the intrinsic needs of the problem at hand, and no more. The ideal language abstraction would explain exactly what needs to happen and no more. The key forms of the language (macros, functions, etc) should be built up to that point so that it is clear what is going on.
Both say (to me) exactly the same thing at approximately the same level of abstraction; one does it with Lisp syntax and one in traditional C++ syntax. The question starts to arise after a while - what abstraction tools does the language make available. At what point do you need functions, classes, closures, m4 macros, lisp macros? Well, at the point when you need them. :-) These are all just tools to say exactly what you mean in as much code as you need.
If you have a social problem of a bunch of cowboys all whooping it up in your codebase, I don't think any technical solution is going to help you. :-)
Code Complete referenced studies saying that programmers write about the same number of lines per day, no matter what the language is. It follows that more concise languages really do translate into faster development.
Thanks for the pointer. I appreciate it. I am going to try digging up the reference.
I find it hard to believe, though. Beyond the code-reading challenges that developers have had, I have seen highly varying productivity in codebases that have different amounts of "brittleness" (and code cycles). On some codebases, even after understanding what is going on and what change needs to happen, developers can very easily spend hours making sure a line of code will actually work in different situations.
Idiomatic Clojure does several things to improve matters, like minimizing mutable state, wrapping the remaining mutable state in good concurrency structures, and emphasizing constructs like map and reduce instead of manually looping.
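In Scala terms (a sketch only, since the comment is about Clojure, but the map/reduce point carries over), replacing a mutating loop with map and a reduction over immutable data looks like this:

// Mutable-state version: an accumulator updated inside a loop.
var total = 0
for (x <- List(1, 2, 3, 4)) total += x * x

// Same computation expressed with map and a reduction, no mutation.
val total2 = List(1, 2, 3, 4).map(x => x * x).sum
// total == total2 == 30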
[1] http://codemonkeyism.com/comparing-java-and-python-is-java-1...
[2] http://www.oreillynet.com/ruby/blog/2007/09/7_reasons_i_swit...