Some years ago I wrote "Is Java 10x more verbose than Python (LOC)? A modest empiric approach" [1]. The most interesting comment quoted "PROGRAMMING LANGUAGES ARE LIKE GIRLFRIENDS: THE NEW ONE IS BETTER BECAUSE YOU ARE BETTER" from [2].
With regard to your first link, I really have to take issue with code like the following as a measure of Java and Python's relative verbosity:
class Artist:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name
There may be an argument to, as you say, "Never, never, never use String in Java" (or other statically typed languages), but a class like this in a duck-typed language such as Python is nonsensical.
I'm not even arguing that Python's dynamically typed approach is necessarily better overall than Java, but if you're going to evaluate the two in terms of verbosity then you really have to compare idiomatic code in both languages. Whereas if you continue writing Java in Python, of course you'll only see a 2x improvement.
Sorry, my Python days were in the 90s; I dropped it for Java back then, so my skills - even in 2008 - were not the best. As I said in the post: "Sorry that my Python is rusty, all correcting comments or comments on how to do it better are welcome."
Not sure what your argument has to do with dynamically typed references? How would one create a constructor for an object that takes an argument (name)?
" but a class like this in a duck-typed language such as Python is nonsensical."
Does this mean people do not use classes in Python (e.g. for DDD)?
No, it means people don't usually create completely useless data-holder classes in Python. Instead they use standard collections, depending on their hierarchy of needs: tuples, dicts or namedtuples.
All three of the classes in the example are basically worthless. Including SongList, which is the one with the most logic (a single line) and which you didn't even use at the end. Here are the three classes, at their most future-proof:
from collections import namedtuple
Song = namedtuple('Song', 'name duration artist')
Artist = namedtuple('Artist', 'name')
class SongList(list): pass
and as a side-note, this:
for song in (song for song in songList if song.duration < 10):
    print "%s by %s (%d)" % (song.name, song.artist, song.duration)
is needlessly complex. Why the comprehension? Just write a loop and a conditional:
for song in songList:
    if song.duration < 10:
        print "{0.name} by {0.artist.name} ({0.duration})".format(song)
It probably depends on the work being done - matrix computations likely wouldn't be significantly shorter in Python, even with numpy - but considering the Java styles I know of (rather defensive and comparatively low-level) and the Python styles I also know of, as long as your Python isn't Java I'd say it should remain significantly shorter. Not as short as replacing 20 LOC by a single namedtuple - that's a bit extreme - but 2x is what I expect of (mostly) translating Java code into Python; that's the rough overhead of the language itself when saying the same thing the same way.
"[..] as long as your python isn't java I'd say it should remain significantly shorter."
My intent with the post was to create empirical knowledge; I'm not a follower of the idea, pervasive in our industry, that anecdotal 'evidence' trumps all. So thanks for your opinion, but I'm not interested in the least in anecdotes when comparing languages.
In truth, I wouldn't use anything other than a plain Python string to represent an artist's name, if that's the only thing about the artist you're interested in. But it's hard to have a meaningful conversation about such a contrived example as in that link.
To answer your question, though, yes: all other arguments about Python vs. Java aside, the LOC ratio would change dramatically in Python's favor once you start to add actual program logic. (Unless you're stuck with an architecture astronaut for a Python programmer.)
As they say on another part of the internet, "pics or it didn't happen". For an example this short it would be so easy to show the code, and see how much the scala version could actually be improved.
That is one thing I don't like about the shootout: even when you don't want it to be about performance, it still is. When they measure code size they only measure the best-performing app. They don't have a separate set of measurements for concise code. So most of the FP entries are C in an FP skin.
The program got shorter because the author learned more and improved it each time, not mainly because of the languages.
Mainstream programming languages do not seem to vary much in lines of code of programs: the range is maybe a factor of 2 or 3. Here is evidence: http://www.hxa.name/minilight/#comparison
This stands to reason. Look at everyday languages: they all have the same features -- they are perhaps surprisingly similar in basic structure. Control flow, operations, data primitives, data compounds -- all are very similar. One of the bigger outliers is C: its lack of common higher-level amenities like exceptions, nice data structures, and particularly storage management can expand code significantly.
These kinds of comparisons are worthless, because they are too short to show the advantages of large-scale application of language-specific idioms. LOC is not really a good measure of code size. The number of symbols is better. A single line of Java can be way more verbose than the equivalent line in Python.
More than a decade ago I worked on a compiler written in C++ and did an exercise to see what the savings would be to implement it in Python. IIRC it was something like 80%. There were huge savings in the size of static data structures. A lot was due to the fact that Python lent itself to being an ad hoc DSL.
To be fair, the languages that I think lend themselves most to low LOC are Haskell, Common Lisp, and Clojure. None of these are covered in the minilight comparison.
Your list misses some pretty significant features that don't exist in Java, including lexical closures, list comprehensions, and macros.
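As a rough sketch of the first two in Scala (the names here, like makeAdder, are made up purely for illustration; Java at the time of this thread had none of the three):

// A lexical closure: the returned function captures n from its enclosing scope.
def makeAdder(n: Int): Int => Int = x => x + n
val addTen = makeAdder(10)
addTen(32)  // 42

// Comprehension syntax: filter and transform in one expression,
// roughly what a Python list comprehension expresses.
val doubledEvens = for (n <- List(1, 2, 3, 4, 5) if n % 2 == 0) yield n * 2
// List(4, 8)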
Try spending some time on 4Clojure. For a while I was routinely solving problems in half a dozen lines or more, only to find that other people had done them in one line.
He probably just used standard language constructs (hashmaps, vectors, sets) instead of custom classes like in Java or Scala. In Clojure it's "idiomatic" to do that, and I believe it can make a 10x difference in short programs.
Yeah, some of the improvement could be because of better understanding of the problem.
"I've written plenty of lisp (Clojure and non-Clojure) and I haven't observed it to be significantly shorter than Scala/Haskell."
Same here. Although I've found Haskell to be noticeably shorter than Scala. But there are trade-offs. Java reads sequentially, top-to-bottom, so when you are new to a piece of code, you trace through it in this linear fashion. And as Paul Graham says in On Lisp (p30), such code 'looks solid and blockish'. But it's approachable, if somewhat long and cluttered. On the other end of the scale is Haskell, which is anything but linear. And with a healthy dose of library-specific operators, it can be quite alien looking. Until you are accustomed to the syntactic patterns, it's pretty hard going. But after that point, it is considerably faster to grok the meaning of the program, as it is entirely clutter free. Clojure (and Lisps in general) is closer to Haskell in its non-linear nature, but without more syntactic help (eg infix operators), it never gets to the compactness of Haskell. And the indentation gets a bit heavier.
The increased LOC in Java is the price you pay for this linearity. But with program complexity and scale, this price can get to be pretty heavy. The same can be said for indiscriminate mutability and side-effects.
"The increased LOC in Java is the price you pay for this linearity. But with program complexity and scale, this price can get to be pretty heavy."
It's not just that as the program gets larger, you have more of that linear code. It's that the program necessarily stops being linear. There's no linear way to lay out the logic of a sufficiently complex system. The pieces may be linear in the small, but the way they interact is not. The same is true of spaghetti.
For some reason when people talk about readable code they tend to consider just the pieces and not the interactions between them. But you can't understand a system without the latter and it's the latter where most complexity lurks (in any language; function calls are also a kind of goto).
With or without side effects, a function call says "go to this other place and do this other thing and bring back the result". That's a kind of goto – a structured kind that's always paired with a return. This is higher level than the old-fashioned GOTO but that doesn't make it complexity-free. It's still possible to create spaghetti with it.
Maybe that sounds pedantic, but I don't think it is, because interaction complexity needs to be minimized just like other kinds of complexity do. When you call a function you are implicitly drawing a line between the caller and the callee. Imagine a picture of your program with all those lines drawn at once. That picture ought to have some order to it.
Side effects are a different kind of complexity. If pure function calls are like going camping and leaving nothing behind, then function calls with side effects are like going camping and burning a fire or littering or what have you.
I don't know scala, but that CSV parser looks wrong. You will fail on any CSV file that contains commas in data items. It's the same reason why you can't use awk's field separators to parse CSV files.
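To make the failure mode concrete, here is a hypothetical naive parser in Scala (not the code from the article) that just splits on commas:

// Hypothetical naive approach: split each record on commas.
def naiveParse(line: String): Array[String] = line.split(",")

naiveParse("Blue Train,John Coltrane,1957")
// -> three fields, as expected

naiveParse("\"Mingus, Charles\",Mingus Ah Um,1959")
// -> four fields: the comma inside the quoted artist name is treated as a
//    separator, so a correct parser (or a CSV library) has to track quoting state.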
My only problem with Clojure is the terrible JVM start up time.
While I agree that for typical long-running applications like web servers/web applications this isn't really an issue, the problem manifests itself when you want to write small scripts or small command-line applications that you'd expect to perform similarly to their Java equivalents.
If you write a CLI app in Java, the application will execute almost instantly as soon as you run it, but with languages like Clojure/Scala you have to wait a long time.
"If you write a CLI app in Java, the application will execute almost instantly as soon as you run it, but with languages like Clojure/Scala you have to wait a long time."
I assume this means you are running the uncompiled version as the script? For Scala at least that is fairly uncommon in situations outside development.
(Note that to do the same in Java is very hard - you'd have to write a custom compile & execute script - and then it would perform similarly to the Scala/Clojure app anyway).
Also, I think it is great to see the startup time for a Java application praised. How things have changed!
ClojureScript is a potential way around the JVM, but there's the disadvantage of fewer libraries (since cljs tends to be used more in the browser), and you can't easily edit scripts in place since they have to be run through the (JVM) compiler first.
> My only problem with Clojure is the terrible JVM start up time.
Can't find it anymore, sadly, but I remember a guy had done some profiling (on leiningen maybe?) and it turned out more than 90% of the startup time was loading core.clj, not the JVM startup, which only took a small fraction of the total.
(Now, part of clojure.core being so slow goes back to the JVM: a lot of resources are expended JITing stuff which, for leiningen, will just be thrown out. Hence the quite frequent recommendation to run the JVM in -client mode if possible. The -server VM also allocates way more memory on startup.)
The damning point against Scala there is that the Scala program is only about 30% the length of the Java one.
That's not enough savings to make up for the complexity of a language that seems like an unholy union of Perl, C++ and F#.
For me the breakthrough in Scala was understanding enough about how it works that I could interface with Scala code easily in Java. At that point I could copy any of those patterns in Java. They were a little more verbose than Scala, but more reliable because I wasn't worrying that some piece of magic was going to cause something extraordinary to happen.
The author states that the program was so long because he was new to the language when he wrote it. Scala can be very, very terse if you know what you are doing with it. I have had projects that went from 1K+ lines of C++ to 300 lines of Scala to 10 lines of Scala. The language didn't change to give me that decrease in LOC, I did. The more I moved from a translation of the C++ to a functional design in the program, the better I was able to express the code.
As someone who's done a lot of functional and OO programming over the years, I find it very hard to believe that you have multiple projects that went from 1K+ lines of C++ to 10 lines of Scala.
Claims like this are the sort of thing that make it hard to convince people to try functional programming, because they think that FP evangelists are full of shit.
In C++ I implemented a lot of functionality by hand, using mostly glibc and the STL. In Scala there are libraries to help; in my case it was a combination of threading (Akka), network I/O (again Akka) and HTTPS communication (dispatch-http). So yes, this was a combination of a better programming style and better library support (via SBT).
A little more verbose? Let's see, Scala vs Java in a promise-based async server:
Java (real code - this is a mild example, imagine this 8 indents deep):
return users.verifyAccess(userId, oldPassword).flatMap(
    new Callable1<Promise<DataObject>, UserProfile>() {
        @Override
        public Promise<DataObject> call(UserProfile profile) {
            return replyProfile(profile.setPassword(newPassword));
        }
    });
Scala (translated):
for {
  profile <- users.verifyAccess(userId, oldPassword)
} replyProfile(profile.setPassword(newPassword))
(observe how you can actually read the code)
Oh man, I wish we had started with Scala (or written everything sync)... My conclusion from all of this was that Java makes certain abstractions impractical from a readability standpoint.
From my - some years, but rather limited - experience with Scala (compared to a decade of Java), the largest point for Scala is how it works with monads, e.g. Option, eliminating a lot of "if" statements and making APIs much clearer. And I did not expect this to be the main difference from Java when I started working with Scala some years ago; I thought it would be closures (which are a nice thing, I especially like the _ syntax).
Don't you have to have closures to make monads work in practice? (I only really have experience with monads in Haskell, and something like (a >>= \x -> b >>= \y -> g x y) is a very basic thing to do with monads there.)
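For what it's worth, here is a sketch of both points in Scala (User, users and findUser are made up for illustration): a for-comprehension over Option replaces the nested "is it there?" checks, and the compiler desugars it into flatMap/map calls whose arguments are exactly the closures you mention.

case class User(name: String, managerId: Option[Long])

// Hypothetical lookup; a real application would hit a database.
val users = Map(1L -> User("Ada", Some(2L)), 2L -> User("Grace", None))
def findUser(id: Long): Option[User] = users.get(id)

// Instead of nested "if found" checks:
def managerName(id: Long): Option[String] =
  for {
    user    <- findUser(id)
    mgrId   <- user.managerId
    manager <- findUser(mgrId)
  } yield s"${manager.name} manages ${user.name}"

// The compiler rewrites the for-comprehension into flatMap/map,
// and the function literals passed to them are closures:
def managerNameDesugared(id: Long): Option[String] =
  findUser(id).flatMap(user =>
    user.managerId.flatMap(mgrId =>
      // the innermost function closes over `user` from two scopes up,
      // much like `g x y` in the Haskell example
      findUser(mgrId).map(manager => s"${manager.name} manages ${user.name}")))

managerName(1L)  // Some("Grace manages Ada")
managerName(2L)  // None, and not a single "if"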
I like what Clojure is trying to do, but comparing languages based purely on LOC is crazy.
It's almost like choosing one car over another because the first one looks better, even if it does not have an engine.
At the end of the day, languages help you write code. Comparisons of how much time it would take to write the code would be much closer to being useful and would be interesting (even though they would also have their own challenges).
This would be a really hard test to do in the real world, but I would like to see the code generated by 2 teams over a year - each having 4-5 people and each with similar experience (ideally 10 years of coding background).
This is both true and false. LoC is a volume metric. More volume = more things to comprehend.
A great example of imperative vs. less-imperative is looping vs mapping.
Mapping in Python:
foo = map(function, list)
vs looping in C++:
for(int i = 0; i < list.size(); i++)
{
    foo[i] = function(list[i]);
}
Note that the C++ version has about 6 choice points where errors can be made (or customized, too - it's more flexible than a map): initialization, comparison, upper bound, increment, the lvalue index, and the rvalue index. So to grasp the entirety of the loop, you have to examine all 6 things, whereas the map is far simpler. After a while, this simplicity starts to add up.
However, LoC is a truly bad metric (other measures are, imo, just as bad), and doesn't really capture what I just said at all. What LoC does capture (as I said above) is volume - the extrinsic complexity of the solution. If you can hone your code so that it simply expresses the intrinsic complexity of the solution, your LoC will decrease to that point. And language helps a great deal.
Your point is really well made though: for industrial use you have to have a head-to-head of expert teams over a reasonable time frame for a reasonably sized system, in reasonably comparable conditions. And yeah, that's the big joke in SW engineering research. It's never really been done and published in any meaningful way. Some companies did do work in that area in the '80s, as I recall - but they didn't publish meaningful datasets as far as my reading shows. Most SW engineering research in this area is very much "small problems" and "students in a class".
I disagree. It is not that simple. If volume is similar code, then sure. But if you are talking about different styles of coding then you have a lot of other things to consider. On top of this, if you have a language that has very few underlying primitives (such as parentheses in Lisp as opposed to [,;.] in other languages) then you actually have a higher comprehension barrier, as the reader will need to examine the role of the primitive as well. Basically, the terser the language, the harder it is to comprehend a LOC in that language.
On the research note, I do agree that most SW engineering research has focused on 'small problems'.
I disagree. :-) It is that simple, at base! More stuff = more things to wrap your head around.
I posit that a well-written terse solution should say exactly the intrinsic needs of the problem at hand, and no more. The ideal language abstraction would explain exactly what needs to happen and no more. The key forms of the language (macros, functions, etc) should be built up to that point so that it is clear what is going on.
Both say (to me) exactly the same thing at approximately the same level of abstraction; one does it with Lisp syntax and one in traditional C++ syntax. The question starts to arise after a while - what abstraction tools does the language make available. At what point do you need functions, classes, closures, m4 macros, lisp macros? Well, at the point when you need them. :-) These are all just tools to say exactly what you mean in as much code as you need.
If you have a social problem of a bunch of cowboys all whooping it up in your codebase, I don't think any technical solution is going to help you. :-)
Code Complete referenced studies saying that programmers write about the same number of lines per day, no matter what the language is. It follows that more concise languages really do translate into faster development.
Thanks for the pointer. I appreciate it. I am going to try digging up the reference.
I find it hard to believe, though. Beyond the code-reading challenges that developers have had, I have seen highly varying productivity in codebases that have different amounts of "brittleness" (and code cycles). On some codebases, even after understanding what is going on and what change needs to happen, developers can very easily spend hours making sure a line of code will actually work in different situations.
Idiomatic Clojure does several things to improve matters, like minimizing mutable state, wrapping the remaining mutable state in good concurrency structures, and emphasizing constructs like map and reduce instead of manually looping.
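In Scala terms (a sketch only, since the comment is about Clojure, but the map/reduce point carries over), replacing a mutating loop with map and a reduction over immutable data looks like this:

// Mutable-state version: an accumulator updated inside a loop.
var total = 0
for (x <- List(1, 2, 3, 4)) total += x * x

// Same computation expressed with map and a reduction, no mutation.
val total2 = List(1, 2, 3, 4).map(x => x * x).sum
// total == total2 == 30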
[1] http://codemonkeyism.com/comparing-java-and-python-is-java-1...
[2] http://www.oreillynet.com/ruby/blog/2007/09/7_reasons_i_swit...