Literate Programming Matters

jashkenas · on Feb 10, 2012

Direct link to the Cafe au Life source under discussion, for those who haven't already seen it: http://raganwald.github.com/cafeaulife/docs/cafeaulife.html

And in terms of the gist of Reg's essay, I think the single line that best cuts to the heart of it is: "[...] while David presents the concepts of literate programming and elegant programming as a dichotomy, I think they're orthogonal." Bingo.

@raganwald: I'd be curious to hear more about what sort of "literate programming tool that transforms the source directly" you were hankering for. Not CWEB style?

swannodette · on Feb 10, 2012

I didn't actually quite present them as a dichotomy. My point was a bit subtler than that in my opinion. In fact it was less about the practice of literate programming than problems of context in many programming languages.

An aspect I didn't discuss is that when we write in English the usefulness of the text is then subject to the ability of the author to write clear, logical, concise (hopefully beautiful & elegant) English. So it's not surprising, to me at least, that the best literate programs I've read are not online but in books.

raganwald · on Feb 10, 2012

I’m not entirely sure. But just looking at Cafe au Life, I’d like to be able to look at the “tangled” code where all of the code for a Square is in the Square class, and a ‘cross-cutting-concern’ view where all the code for something like import and export is kept together, whether it applies to Cell or Square.

Comes down to traditional OOP being a tree formed by belongs-to relationships between entities and responsibilities, but Cafe au Life having a many-to-many relationship.
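To make the many-to-many point concrete, here is a minimal sketch, assuming each method is tagged with both the class it belongs to and the concern it serves. The `Fragment` type, the `group_by` helper, and the method names are hypothetical illustrations, not code from Cafe au Life. The same flat list can then be read either as the per-class ("tangled") view or as the cross-cutting view.

```python
from collections import defaultdict
from typing import NamedTuple


class Fragment(NamedTuple):
    owner: str    # class the method belongs to, e.g. "Square"
    concern: str  # feature the method serves, e.g. "import/export"
    name: str     # method name


# Hypothetical fragments; the names are illustrative only.
FRAGMENTS = [
    Fragment("Cell", "core", "value"),
    Fragment("Cell", "import/export", "to_json"),
    Fragment("Square", "core", "quadrants"),
    Fragment("Square", "import/export", "to_json"),
]


def group_by(fragments, key):
    """Group qualified method names under the chosen attribute ('owner' or 'concern')."""
    view = defaultdict(list)
    for fragment in fragments:
        view[getattr(fragment, key)].append(f"{fragment.owner}.{fragment.name}")
    return dict(view)


by_class = group_by(FRAGMENTS, "owner")      # everything for Square in one place
by_concern = group_by(FRAGMENTS, "concern")  # everything for import/export in one place
```

Neither view is the "real" one here; both are projections of the same tagged fragments, which is roughly what a tree-shaped class hierarchy cannot express on its own.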

p.s. What are your thoughts on Cafe au Life? Is this (In Your Humble Opinion) idiomatic CoffeeScript? Is this how you envisioned Docco being used? Feedback most welcome...

mhd · on Feb 10, 2012

One thing I really like about literate programming even beyond the extensive "write about what you code" aspect is the availability of sub-procedural refinements. Quite often you write code where everything in a "paragraph" pertains to a specific matter, often preceded by a comment line saying so (e.g. "// Initializing database"). With literate tools I can just move that to a separate block and have a better overview. As this is mostly out of a poor need for visual organization, factoring everything out into small sub-routines and then taking care of scoping and passing needed parameters often seems like a crutch to me (especially in procedural code). Folding editors provide another option for this, of course. (Well, you also could use gotos. But no one ever did that. Especially not me. No, officer, never.)

If I remember correctly, I once had two C preprocessors that did just that, without the rest of the literate programming features. (Found a paper for one of these: http://page.mi.fu-berlin.de/prechelt/Biblio/refinement.pdf)
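A rough sketch of what such a refinement preprocessor might do, assuming a noweb-like `<<name>>` notation (the notation, the `expand` function, and the sample chunks are all assumptions for illustration, not the syntax of the tools mentioned above). Each named refinement is pasted in place at the indentation of its reference, so it shares the enclosing scope and needs no parameters.

```python
import re

# Assumed notation: a line consisting only of <<name>> refers to a named refinement.
REFERENCE = re.compile(r"^(\s*)<<(.+?)>>\s*$")


def expand(name, chunks, seen=()):
    """Return the lines of chunk `name` with every <<reference>> expanded in place."""
    if name in seen:
        raise ValueError(f"circular refinement: {name}")
    lines = []
    for line in chunks[name].splitlines():
        match = REFERENCE.match(line)
        if match:
            indent, ref = match.groups()
            # Inline the refinement at the indentation of the reference.
            lines.extend(indent + inner for inner in expand(ref, chunks, seen + (name,)))
        else:
            lines.append(line)
    return lines


# Hypothetical program: the top level reads like an outline.
CHUNKS = {
    "main": "def main():\n    <<initialize database>>\n    <<process requests>>",
    "initialize database": "db = {}\ndb['users'] = []",
    "process requests": "print(len(db['users']))",
}

program = "\n".join(expand("main", CHUNKS))
```

Note that "process requests" uses `db` without receiving it as an argument, which is the scoping convenience that separate sub-routines would not give you.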

joshwa · on Feb 11, 2012

Rant: GitHub and Google Plus are not blog platforms!

Here, even when I have the userscript[0] to show the full domain names so I know when something is being posted by a user of a company instead of the company itself, it still just shows "github.com", leading me to think it's GitHub and crew talking about literate programming.

(I'll admit, I saw it was raganwald and that gave it away, but still).

[0] https://github.com/johngibb/Hacker-News--Show-Subdomains

chj · on Feb 11, 2012

I am always thinking about how to make my code readable: if not as fun as a novel, it should at least read like prose telling a story. Of the literate programming solutions on the web, none feels right. For me, it is ugly to see the comments when you work with the code daily, especially when debugging. In the end, I decided to keep a single file documenting everything: architecture, bugs, decisions, stories, ... The document grows over time, and I try my best to keep the code short. Of course there are inline comments, but most are just small ones. If you are going to print the code on paper, masses of comments are not going to help.

javadyan · on Feb 10, 2012

I do wonder how hard it is to debug literate programs.

almost · on Feb 10, 2012

Why would it be harder?

raganwald · on Feb 10, 2012

(Revised):

As presented in the essay, an organization of code for the purpose of explaining the code to a new programmer might differ from an organization for the purpose of maintaining the program by people familiar with its design.

The premise of the original Literate Programming was to use meta-annotations to write documentation that showed the code organized for explanations, while leaving the original in a form suitable for the machine and/or for experienced programmers.

Lacking this tool, if we use techniques like AOP to reorganize the program for explanation, we might be making things more difficult for the experienced programmer, who does not find all of the methods for a square in one place in the Square class’ definition.

pjscott · on Feb 10, 2012

One approach is to structure your program in reasonably-sized modules, as usual, but to write parts of your program in a literate programming style. If a particular module is algorithmically tricky, for example, I've found that it often helps to write down very narrative-style comments, and try to organize both English narrative and code for easy reading. This helps make the code better, and for some reason seems to make difficult algorithms easier to keep track of while I'm writing them. I've written some of my best code this way.

Literate programming doesn't need to be all-or-nothing!

hesitz · on Feb 10, 2012

[edit: Not sure why the downvote, since I was just trying to clarify concepts. "Literate programming" was invented and clearly defined by Don Knuth, and it muddies the water to suggest that merely writing good comments is a "literate programming style". Good commenting should be part of every programming "style", and literate programming does not primarily focus on "good commenting"; its focus is on two concepts described in the Wikipedia article below: (1) "tangling" of a primary source into machine-compilable form and (2) "weaving" of a primary source into print-formatted form suitable for human understanding.]

Merely having "English narrative" does not get you to literate programming. You can call it a "literate programming style", which is fine, but it's important to understand what true "literate programming" actually requires:

'Literate programming tools are used to obtain two representations from a literate source file: one suitable for further compilation or execution by a computer, the "tangled" code, and another for viewing as formatted documentation, which is said to be "woven" from the literate source.' http://en.wikipedia.org/wiki/Literate_programming

Tools like Javadoc are related to the second part of literate programming above, and allow for creating some formatted documentation from source code. ( http://en.wikipedia.org/wiki/Javadoc ) They don't get you all of the second part of a literate "woven" program, though, which includes text with all of the source of a program in a format suitable to be read like a book or literary essay.

Tools like Javadoc also have absolutely nothing to do with the first idea in literate programming (above), which is to have an ultimate source document where code is organized and presented in the manner best suited to human understanding, not in a manner tailored for machine compilation (e.g., for machine compilation the code may need to be separated into different units or files, whereas in the literate source that would not be done unless it were an aid to understanding).

All that is not to say that trying to be a little more "literate" with comments in regular source code is not a good thing. But it's important to understand that "literate programming" is a clearly defined practice that requires much more than good-quality commenting.
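As a minimal sketch of the two representations, assuming a literate source that is simply a list of titled sections (the `Section` type, `tangle`, `weave`, and the greeting example are hypothetical and much simpler than WEB or CWEB): the sections are written in the order that explains the program, `weave` keeps that order and the prose, and `tangle` drops the prose and emits the code in the order the machine needs.

```python
from typing import NamedTuple


class Section(NamedTuple):
    title: str  # chunk name
    prose: str  # explanation for the human reader
    code: str   # code the explanation is about


# Sections appear in the order that explains the program best ...
SOURCE = [
    Section(
        "greet",
        "The program's whole purpose is to greet someone.",
        "print(greeting('world'))",
    ),
    Section(
        "greeting",
        "A greeting is just a formatted string.",
        "def greeting(name):\n    return f'hello, {name}'",
    ),
]
# ... while the interpreter needs the definition before its use.
MACHINE_ORDER = ["greeting", "greet"]


def tangle(source, order):
    """Emit only the code, arranged in machine order."""
    by_title = {section.title: section for section in source}
    return "\n\n".join(by_title[title].code for title in order) + "\n"


def weave(source):
    """Emit prose and indented code together, in narrative order."""
    parts = []
    for section in source:
        indented = "\n".join("    " + line for line in section.code.splitlines())
        parts.append(f"{section.title}\n\n{section.prose}\n\n{indented}")
    return "\n\n".join(parts) + "\n"


tangled = tangle(SOURCE, MACHINE_ORDER)
woven = weave(SOURCE)
```

Good comments alone give you neither function: the file stays in machine order, and there is nothing to weave from.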

raganwald · on Feb 10, 2012

I tried to achieve this, but lacking a true LP tool, I used meta-programming to ‘retangle’ the code I’d teased apart.

hesitz · on Feb 10, 2012

I think the better systems of literate programming allow you to change code in either place. I.e., you can edit the "literate code", which is "tangled" to create a directly compilable codebase. Or you can edit the "tangled" codebase, and have changes there be "untangled" to the literate form of the project.

I'm not sure whether this is an issue merely for the new versus the experienced programmer. It is an issue I see as even more important to debugging, where all the debugging tools a programmer uses are geared towards working with the compilable codebase, not the literate one. To make things work smoothly you need to be able to edit the compilable codebase and have changes be reflected in its untangled (i.e., literate) form.

daly · on Feb 10, 2012

Literate programming can be done in any language. Here is an example in HTML: http://axiom-developer.org/axiom-website/litprog.html

In addition to the benefits mentioned there are at least 3 others I can cite.

First, since you spend time explaining why you are writing the code and your thoughts on the design and implementation you naturally discover edge cases, missing cases, and bugs. As a result, the quality of the code is higher.

Second, if you program in a team that does code reviews, the team can see your approach to the solution and the reasons. They can critique your work at a more profound level. If the code review happens before accepting the change commit, the quality of the code is higher.

Third, code lives. Sourceforge is a gravesite of hundreds of thousands of programs that have died because the authors are no longer maintaining the code. New users are confronted with a source tree of tiny files which they are unable to understand and therefore unable to modify and maintain.

I have the "Hawaii test" criterion. If you can hire a new developer, give them your program, send them on a two-week, all-expenses-paid vacation to Hawaii, and when they return they can modify and maintain the program as well as the original authors, then you have a fully literate program.

As for the question of debugging, I find that it is no different. Generally the sources of bugs are the same (e.g. using copy/paste and failing to properly fix the copy).

Literate development, for me, takes on two different styles depending on the language.

In a language that allows a read-eval-print loop, like Lisp, it is trivial to work in Emacs with a command line in one buffer and the literate sources in another buffer. You just point at the changed code and evaluate it immediately in the other buffer. It is very productive.

In a language like Java that is a pure compile environment I create a makefile that extracts the code from the literate document into the proper com.foo.baz.... source tree, compiles the code, and runs the test regression suite. This generally takes less than a minute or two for most reasonably sized Java programs. So I make a small set of changes, save the buffer, run 'make', and see if the tests pass. For TDD programming I find this works very well.
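The extraction step of such a build could look roughly like this sketch. It assumes chunks are marked as `\begin{chunk}{path}` ... `\end{chunk}` in the LaTeX file and that the chunk name is the target file path; both conventions, the `extract` function, and the `src/demo/Greeter.java` example are assumptions for illustration. Compiling and running the tests are left to the makefile.

```python
import re
import tempfile
from pathlib import Path

# Assumed chunk syntax: \begin{chunk}{relative/path} ... \end{chunk}
CHUNK = re.compile(r"\\begin\{chunk\}\{(.+?)\}\n(.*?)\\end\{chunk\}", re.DOTALL)


def extract(literate_text, out_dir):
    """Write each chunk to out_dir/<chunk name>, concatenating chunks that share a name."""
    files = {}
    for path, body in CHUNK.findall(literate_text):
        files[path] = files.get(path, "") + body
    written = []
    for path, body in files.items():
        target = Path(out_dir) / path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(body)
        written.append(target)
    return written


# Hypothetical literate document with one chunk.
DOCUMENT = r"""
\section{Greeting}
We keep the greeting in its own class so tests can call it directly.
\begin{chunk}{src/demo/Greeter.java}
package demo;
public class Greeter {
    public static String greet(String name) { return "hello, " + name; }
}
\end{chunk}
"""

with tempfile.TemporaryDirectory() as out_dir:
    written = extract(DOCUMENT, out_dir)
    relative = [p.relative_to(out_dir).as_posix() for p in written]
    content = written[0].read_text()
```

Because the source tree is regenerated on every run, the literate document stays the single place where edits are made.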

My last large program was 60000 lines of Lisp in a LaTeX literate file. TeX'ing the file with all of the literate documentation generated 6000 pages. The technology scales quite well.

Learning literate programming is like learning Lisp. You keep wondering why anyone would program this way until you suddenly "get it". Once that happens you wonder how anyone could program any other way.

Tim Daly

aiscott · on Feb 10, 2012

This is one of the better examples of "literate" programming I have seen. I do have a couple of criticisms though.

The first criticism is perhaps more of literate programming, the concept, than this example. I personally find it difficult to read when each line of code is disjointed by comments. I guess I prefer the chunk size to be larger; a coarser granularity.

Some of these comments really seem unnecessary. For example:

  # Export `Cell`
  _.defaults exports, {Cell}

I think it is obvious enough that the code is "exporting" `Cell`. I couldn't tell you why, though.

My second criticism is that the comments are all too often of the "what" variety, simply translating the code to English. That's not really very helpful. Once someone has some grasp of the programming language being used, the "what" is right there in the programming language. No need to restate it in another language.

What is helpful is the "why" of a chunk of code. I can read plain as day what it is doing. But why is the code doing that? Why was it written? Why is it necessary to do this particular thing? To me, at least, that seems much more helpful.

I feel my commenting has gotten much better since I started paying attention to when I was writing a "what" comment, caught myself, and wrote a "why" comment instead.

blktiger · on Feb 10, 2012

I've noticed a lot of people do the exact same thing with PowerPoint presentations. Each slide tells the audience exactly what you are going to say, so there is really no point in your being there. Good presenters put things on their slides that add to what they say, so the slides tend not to make sense without the presenter.

cynwoody · on Feb 11, 2012

But what if you are trying to reach a larger audience that is not going to see your presentation (either because it's not online or they don't have the time to watch a video) but might flip through your deck? It needs to be possible to get a worthwhile takeaway just from viewing the slides.

bo1024 · on Feb 11, 2012

I was wondering recently if it is possible to create an annotated version of a PDF presentation. Two purposes -- the presenter uses it as notes while they talk, and when the slides get put up, people who read them get to read the annotations too. But in the presentation itself, you only see the slides, not the annotations.

raganwald · on Feb 10, 2012

Thank you: https://github.com/raganwald/cafeaulife/issues/20

psykotic · on Feb 10, 2012

> I personally find it difficult to read when each line of code is disjointed by comments

I agree, but if you read any of Knuth's literate programs, they are not at all line-by-line commentaries.