
That's because no one even knows how to start doing "science" in this direction.

When you read code, there are so many different things influencing your understanding that it's hard, if not impossible, to even list them all. And if you take a look at some of the things that might influence your understanding, you'll notice that most of them are just as hard to measure.

Off the top of my head, things which may have an impact:

    * your level of skill in a given language
    * your level of familiarity with the style of a particular programmer who wrote the code
    * the tools you have at your disposal (go to definition, see docs functions of IDEs)
    * your familiarity with a particular framework used
    * your preference and expectations regarding the identifiers
    * your knowledge of what the system as a whole (or its part) is supposed to do
    * your familiarity with the project structure
And so on, and that's even before we start talking about concrete examples of readable code and trying to get some metrics on it!

Writing code is no more susceptible to scientific analysis than writing prose when it comes to other people reading the code (and not machines executing it). To write good code you need to first assume something about your readers (their level of skill, prior experiences, etc.) and then optimize the form of the code so that it doesn't confuse them (too short) or bore them (too long).

Seriously, writing prose and writing code (the latter only when meant for human consumption) are very similar: in both kinds of writing you need structure, things following one another, sentences of appropriate length and "density", and so on. Programmers could learn a lot from writers, but they most often refuse to do so. Literate Programming should be the default by now, yet it is still used very rarely...




> That's because no one even knows how to start doing "science" in this direction

I am sorry to disagree, but this is just not true.

http://www.ptidej.net/courses/inf6306/fall10/slides/course8/...

http://www.cs.kent.edu/~jmaletic/papers/EMSE12.pdf

https://link.springer.com/journal/10664

https://scholar.google.com/citations?view_op=view_citation&h...

I have this to say:

https://brains-on-code.github.io/shorter-identifier-names.pd...

My supervisor has this to say:

http://pi.informatik.uni-siegen.de/stt/34_2/01_Fachgruppenbe...

These are just the ones off the top of my head that I can google quickly; if you want, I would be happy to share my Zotero database or a large BibTeX file.

> When you read code, there are so many different things influencing your understanding that it's hard, if not impossible, to even list them all.

You are completely right: Psychological research on programming shows that it is a very complex cognitive task, best done by experts, and poorly understood.

You describe knowledge and experience, and they do in fact matter a lot. The above studies, for example, show (often just as a side effect) that experts are impacted less severely by badly written code (however that was operationalized).

> Literate Programming should be the default by now

I agree.


There's nothing to be sorry about! Actually, I'm really happy to learn that such research is being done. I tried searching for similar studies quite a few years back and came back empty-handed, so I assumed it was either not done at all or very niche.

As you seem to be knowledgeable about the field, how relevant/applicable do you think the studies you linked to are in the general case? The study about identifier length, for example, seems very specific, and I'm not convinced at all that the results would be the same in a different language, with different people, or even with slightly different identifiers (abbreviating start to str vs. beginning to beg, for example).

EDIT: another thought on the study: does it control for the presence or absence of widely known conventions? For example, in Haskell, OCaml and others it's customary to write `x :: xs` - would writing `element :: list` instead improve the time needed to comprehend the code? On the other hand, in Smalltalk you frequently write `add: aNumber to: aList` - the identifiers are longer, but they provide additional (type) information which is otherwise not present. So how long the identifiers need to be may depend heavily on the language (the study used C#, I think) - is that accounted for in the paper?
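(For concreteness, here is the kind of contrast I mean, transposed into Python's structural pattern matching since I can't reproduce the study's exact setup - the names are purely illustrative:)

    def total(items):
        # Conventional, terse names, as in the `x :: xs` idiom:
        match items:
            case []:
                return 0
            case [x, *xs]:
                return x + total(xs)

    def total_verbose(items):
        # Spelled-out names - would these actually comprehend faster?
        match items:
            case []:
                return 0
            case [element, *rest_of_list]:
                return element + total_verbose(rest_of_list)

    print(total([1, 2, 3]), total_verbose([1, 2, 3]))  # both print 6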

Still, all the papers you mentioned look interesting and I will read them once I have some time. Thanks for posting! :)


klibertp,

> another thought on the study: does it control for the presence or absence of widely known conventions?

I am very happy to encounter other critical thinkers - your question is a really good one :) You are right, the study is not capable of explaining this effect (that is, how commonplace/conventional some abbreviations are), but it was considered in the design. I am sure it plays a role, but I wouldn't dare to give a definitive answer based on the data from my study.

For example, config or cfg are arguably so common that they don't hurt comprehension. The same goes for some single-letter variables: point.x and point.y are easily identifiable as coordinates, and the variable name i in a for loop may not be problematic, as it has become almost meta-syntactic (much like foo and bar). However, i, j, k, l index names may really hurt comprehension when you have a complicated looping structure with many lines in between, as they are likely to strain your working memory (see the sketch below). As for the point.x example: I would explain this as a priming effect. The name x is fine because point already preactivates the right interpretation. X in isolation might be worse, and if you encounter new MessageBrokerInstance().X() you might as well read your code in base64... Thus, based on my experiment, I can talk about variables in isolation, but usually code is mixed, and here other effects might be relevant.
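To illustrate the looping point with a contrived Python sketch (the data and names are mine, just for illustration):

    grid = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # hypothetical 2x2x2 data

    # Single-letter indices are fine in isolation, but across nested
    # levels with many lines in between they strain working memory:
    total = 0
    for i in range(len(grid)):
        for j in range(len(grid[i])):
            for k in range(len(grid[i][j])):
                total += grid[i][j][k]  # 30 lines later: was j the row or the column?

    # Descriptive index names keep the mapping explicit:
    total = 0
    for row in range(len(grid)):
        for col in range(len(grid[row])):
            for channel in range(len(grid[row][col])):
                total += grid[row][col][channel]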

In the longer versions of my experiment, I considered the effect of common abbreviations as well. Psychology lists several word frequency effects. Common words can be immediately accessed (from the so-called mental lexicon, a mind-dictionary if you will), but uncommon words have to be synthesized on the fly through their phonetics (see, for example, the dual route cascaded model, Coltheart 2001, http://www.cogsci.mq.edu.au/~ssaunder/files/DRC-PsychReview2...). Thus, high-frequency words (= often occurring, common words or strings) are quickly read and their meaning is understood, whereas uncommon words or strings do not have a representation in the mental lexicon and you have to synthesize their meaning first, thus slowing down comprehension.
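If a programming analogy helps (mine, and very loose - not the actual cognitive model): the mental lexicon behaves like a cache, and the phonetic route like a slow fallback:

    # A loose analogy only: common identifiers hit the "mental lexicon"
    # instantly, uncommon ones must be decoded piece by piece.
    MENTAL_LEXICON = {"cfg": "configuration", "i": "loop index"}

    def comprehend(identifier):
        if identifier in MENTAL_LEXICON:   # lexical route: fast lookup
            return MENTAL_LEXICON[identifier]
        # sublexical route: slow, piecewise synthesis
        return " ".join(identifier.split("_"))

    print(comprehend("cfg"))           # fast: "configuration"
    print(comprehend("msg_brk_cnt"))   # slow: "msg brk cnt" - and still unclear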

My argument is simple: it is always possible to understand code, no matter how mangled or obfuscated it is (after all, reverse engineers do amazingly hard work). The question is how easily the code can be comprehended. Abbreviations that are common to some (e.g. experts) may not be common to others (e.g. novices in their first job). Of course, the newbies will get there eventually, but abbreviations have a steeper learning curve, so new people will be unproductive for a longer time.

Think about yourself, you surely know this effect:

1. Write code.
2. Problem solved.
3. Don't touch it for 4 months.
4. Changes needed: fix a bug, add a feature.
5. How does this work?
6. WTF, what was I thinking?

For the sake of all newbies, your company, or even your future self, I encourage the use of identifiers that can be read because you can READ and know LANGUAGE, not because of arbitrary conventions. There are many conventions (e.g. x:xs, for i=0;i<10;i++, point.x) that can surely be considered domain language and don't impede comprehension for practiced readers, but they might still hinder it for novices, or for yourself in 4 months.

> how relevant/applicable do you think the studies you linked to are in the general case?

This is really hard to say. Many processes take place when programming, and many programmers have theories about why it is hard and how to make it easier (like the linked article citing cognitive load, which is a good metaphor, imho). I know of many scientists who are trying to isolate the different effects. For example, I focus on identifier names, as I find them to be impactful. Their meaning can't be analyzed automatically (even with sound NLP techniques, which are relatively limited), and the programmer is totally free to name their variables whatever the hell they want. I am sure that identifier names play a big role in program comprehension, but when I encounter "clever code", with weird recursions, counterintuitive measures, or plain magic (https://en.wikipedia.org/wiki/Fast_inverse_square_root), the value of identifiers is limited; in other words, there are other things going on that impact my comprehension BESIDES identifiers. How they interact, I cannot say for sure, but if complex code has no clear identifiers, it becomes complicated.
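To make the "plain magic" example concrete, here is a rough Python transcription of the linked trick (the original is C; the magic constant and the single Newton step are as the article describes) - note that no amount of renaming would make it much clearer:

    import struct

    def fast_inverse_sqrt(number):
        """Approximate 1/sqrt(number) via the Quake III bit trick."""
        # Reinterpret the float's bits as a 32-bit integer.
        i = struct.unpack("<I", struct.pack("<f", number))[0]
        # The famous "magic" constant - what was anyone thinking?
        i = 0x5F3759DF - (i >> 1)
        # Reinterpret the bits back as a float.
        y = struct.unpack("<f", struct.pack("<I", i))[0]
        # One Newton step to refine the guess.
        return y * (1.5 - 0.5 * number * y * y)

    print(fast_inverse_sqrt(4.0))  # roughly 0.5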

I believe that each of the effects in isolation is relevant, but I am not sure which one is the most dominant, or, for that matter, whether there is ONE thing that will solve all problems.


"That's because no one even knows how to start doing "science" in this direction."

Not true; there is an entire field of usability science. It is still limited to experimental findings as opposed to theoretical, deterministic knowledge.

Currently, usability testing is only applied to end users. There is no reason task analysis and Jakob Nielsen's 10 heuristic rules of basic usability cannot be applied to software itself. I apply the science of usability to written software.


Yes, but how many "usability studies" are done for writing software? I've seen only a few, and people write articles like this, without any proof, daily.


Science is hard, especially if done properly. The least they could do is split teams into two groups based on whether they use a technique or not (e.g. unit testing, using OOP) and see if it makes any difference. It's far from perfect, but better than nothing.
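A minimal sketch of such a comparison (made-up numbers, purely to illustrate the design; a real study would also need a proper significance test, e.g. scipy.stats.ttest_ind):

    from statistics import mean, stdev

    # Hypothetical defect counts per team - illustrative data only.
    with_unit_tests = [3, 5, 2, 4, 3]
    without_unit_tests = [7, 6, 9, 5, 8]

    print(mean(with_unit_tests), stdev(with_unit_tests))        # 3.4 ...
    print(mean(without_unit_tests), stdev(without_unit_tests))  # 7.0 ...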

> That's because no one even knows how to start doing "science" in this direction.

I suspect the reason almost no one does software science is that nobody really cares. Most people just repeat dogmas they like, e.g. they say "you broke the Liskov substitution principle, that's bad" without any proof.

I guess people just like flamewars (who doesn't? :)


As my other comments outline, there is some science, although you are right in that empirical work on software engineering is a very small niche. Most people just report studies like:

"Hey, look what I cooked up in my basement using haskell".

- which, to me, is engineering, not science.



