The Ball-of-Mud Transition, or how software gets complex (akvo.org)
89 points by mtwestra on Oct 14, 2013 | 34 comments



Picking up on adrianN's comment[0]: when you have a collection of nodes and start connecting them at random, initially they are all disconnected (obviously), and any two that you pick are unlikely to share an edge. Thus in the early stages, your graph is isolated nodes and isolated edges.

After a while, by chance, you happen to join an existing edge to a node. That component now has three vertices, and is 50% more likely to be chosen at random than the isolated edges.

There comes a point where you join two non-trivial components, and before long you reach a tipping point. Suddenly nearly every node you choose already belongs to a component, and that component starts vacuuming up everything.

Thus we have the emergence of "The Giant Component". This transition is sharp and well-studied. Whether you think of it as "obvious" depends on how much you study these things. I seem to recall that there is a major result that says that all first-order predicates have these threshold emergence properties, but it's been too long (30 years) since I studied this, and I could be wrong. I may be able to find some references if people really want me to.

[0] https://news.ycombinator.com/item?id=6546978
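
For the curious, here's a minimal sketch of exactly this process (my own illustration, not from the article - the names are made up): tie threads between random pairs of n buttons and track the largest cluster with union-find.

    import random

    def watch_largest_cluster(n, max_edges):
        """Add random edges to n isolated nodes; report the largest component."""
        parent = list(range(n))
        size = [1] * n
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x
        biggest = 1
        for m in range(1, max_edges + 1):
            a, b = find(random.randrange(n)), find(random.randrange(n))
            if a != b:                          # this thread joins two clusters
                if size[a] < size[b]:
                    a, b = b, a
                parent[b] = a
                size[a] += size[b]
                biggest = max(biggest, size[a])
            if m % (n // 4) == 0:
                print(f"{m / n:.2f} threads/button: largest cluster {biggest / n:.0%}")

    watch_largest_cluster(10_000, 10_000)  # the jump happens near 0.5 threads/button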


This is very close to the "percolation problem." [0]

According to Robert Sedgewick, this particular problem has no known mathematical solution, and the threshold (for a given N) is obtained only through, e.g., Monte Carlo simulation, where you randomly open sites until the grid percolates (akin to the adding of threads). The whole thing is a good application of the union-find algorithm.

The threshold for N > 2 is about 60%. Not sure how that applies to software complexity, but it's interesting to think about.

Thanks, Coursera!

[0] http://en.wikipedia.org/wiki/Percolation_threshold
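
A rough sketch of that Monte Carlo approach (my own toy version, not Sedgewick's code): open random sites in an n-by-n grid, connect open neighbours with union-find plus two virtual nodes, and stop when top and bottom connect.

    import random

    class UnionFind:
        def __init__(self, n):
            self.parent = list(range(n))
        def find(self, x):
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]  # path halving
                x = self.parent[x]
            return x
        def union(self, a, b):
            self.parent[self.find(a)] = self.find(b)

    def percolation_fraction(n):
        """Open random sites in an n-by-n grid until it percolates;
        return the fraction of sites open at that moment."""
        uf = UnionFind(n * n + 2)
        top, bottom = n * n, n * n + 1           # virtual source and sink
        open_sites = set()
        sites = [(r, c) for r in range(n) for c in range(n)]
        random.shuffle(sites)
        for opened, (r, c) in enumerate(sites, 1):
            open_sites.add((r, c))
            i = r * n + c
            if r == 0:
                uf.union(i, top)
            if r == n - 1:
                uf.union(i, bottom)
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                if (r + dr, c + dc) in open_sites:
                    uf.union(i, (r + dr) * n + (c + dc))
            if uf.find(top) == uf.find(bottom):
                return opened / (n * n)

    # averaging many runs approaches ~0.593 for site percolation on a square grid
    print(sum(percolation_fraction(50) for _ in range(20)) / 20)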


You mean one particular percolation problem, I assume.

Just to clarify, because one might interpret your statement to mean that percolation problems in general don't have exact formulas for their solutions - but the "exact formula" section in your link would say otherwise.


For general graphs, the situation is indeed poor. For lattices, much more is understood, cf. http://en.wikipedia.org/wiki/Stanislav_Smirnov


Transitive closure and connected components are not expressible in first-order logic, but maybe you are thinking of Fagin's 0-1 law for finite relational models[1]:

For a given first-order sentence s, as n -> infinity, the fraction of models of cardinality n that satisfy s approaches either 0 or 1.

[1] http://researcher.watson.ibm.com/researcher/files/us-fagin/t...
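
A toy illustration of the flavour of that law (my sketch, not from Fagin's paper), using the first-order sentence "there exists a triangle": the fraction of uniformly random graphs satisfying it climbs towards 1 as n grows.

    import random
    from itertools import combinations

    def has_triangle(n):
        """Sample a uniformly random graph on n labelled vertices
        (each edge present with probability 1/2) and test the sentence."""
        edges = {e for e in combinations(range(n), 2) if random.random() < 0.5}
        return any((a, b) in edges and (b, c) in edges and (a, c) in edges
                   for a, b, c in combinations(range(n), 3))

    for n in (4, 8, 16):
        trials = 300
        print(n, sum(has_triangle(n) for _ in range(trials)) / trials)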


This is a neat thought experiment, but it seems to me it's looking at a different part of the curve than software complexity. The "phase transition" the author discusses occurs between 0 and 1 threads/button. In a software project, you wouldn't bring in a new component unless it served some use to the existing pieces, so software projects start at 1 "thread/button," and unless you've got orphaned code, the "cluster size" is always 100%.

Software complexity strikes me as a graph-coverage problem: given a graph of N vertices and M edges (i.e. software components and dependencies), how many vertices and edges do we need to traverse (i.e. understand) in order to make a change to component X? How does that parameter scale with different graph shapes -- linear, n-ary tree, DAG, cyclic (yikes!)?

Or is there a homomorphism between the two problems?
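
One way to make the question concrete (a sketch - the name blast_radius and the toy dependency maps are made up): count everything reachable from X in the dependency graph.

    from collections import deque

    def blast_radius(deps, start):
        """deps maps a component to the components it depends on;
        return everything you must traverse to reason about `start`."""
        seen, queue = {start}, deque([start])
        while queue:
            for nxt in deps.get(queue.popleft(), []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    # a linear chain stays cheap; a cycle drags in everything
    print(blast_radius({'a': ['b'], 'b': ['c'], 'c': []}, 'b'))     # {'b', 'c'}
    print(blast_radius({'a': ['b'], 'b': ['c'], 'c': ['a']}, 'b'))  # {'a', 'b', 'c'}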


What he doesn't emphasize is the directionality of this phase transition. It's really easy to add one more string, but staring at the resulting button-thread agglomeration it's very difficult to know what string to cut. This is why there is a never ending stream of new projects started to solve the same problems over and over. Their authors covet the opportunity to make progress during the honeymoon period, before the Sisyphean battle against software entropy sets in.


I agree completely - once you're on the wrong side of the transition, it is hard to go back.


More importantly, it's not worth going back. At a certain point, it actually becomes less work to start over than to try to cut that Gordian Knot.

The problem then is when business logic is encapsulated in the mud.


What he studied experimentally with the buttons and threads is what graph theory calls the "giant component" threshold, and it is known exactly.

https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_...
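
Concretely (a sketch of the standard result, not something from the article): the giant component appears once the mean degree c = 2 x threads/button passes 1, and its limiting size fraction S solves S = 1 - exp(-c*S), which fixed-point iteration finds easily.

    import math

    def giant_fraction(c, iters=500):
        """Iterate S = 1 - exp(-c*S) from S = 1; converges to the giant
        component's limiting size fraction (0 for c <= 1, though convergence
        is slow exactly at the critical point c = 1)."""
        s = 1.0
        for _ in range(iters):
            s = 1 - math.exp(-c * s)
        return s

    for c in (0.5, 1.0, 1.5, 2.0):   # c = mean degree = 2 * threads/button
        print(c, round(giant_fraction(c), 3))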


That's true, but I'm unaware of much work being done on the error bounds for small N (for some concept of small). I started on this during my PhD, but rapidly moved on to other problems that seemed more tractable, and never really returned to it. The results of Bollobás, Erdős, Rényi, and others are mostly asymptotic. They do, however, seem remarkably good even on graphs of small size (under 10^6 vertices).


Those names make me wonder if mathematicians are arranged in series - Erdos, Erdós, Erdős, ...


That's very interesting adrianN, thanks!


The following is a tangent.

When designing something, there are often many choices. If they interact, it quickly becomes intractable. It's tempting to try to keep them in mind, and work out the answer, but with exponentially increasing complexity, your limits are quickly reached (no matter how smart you are). Enhancing your intelligence, e.g. by offloading information onto paper, also has limits.

One solution is the scientific experiment: hold all variables constant, and see the effect of changing just one design choice. Holding them constant means you have made a design choice for that aspect that is almost certainly not optimal.

Ideally, you can do what is suggested in the article - create modules that are largely independent, and experiment within one module in isolation. Because there are fewer variables per module, they are less complex, and it takes fewer experiments to understand how each works.

The deep problem with this is that you may not know what those modules would be - i.e. you don't know which aspects are independent, because that's the very thing you're trying to find out! Of course, you can probably have a guess, and certainly use your initial experiments to check those guesses, and maybe, with the information gained, improve your guesses.

EDIT: a specification is a module, in that it separates out some design choices.


This isn't about what he is saying, but how he is saying it:

This bit bothered me: "Wikipedia does a great job of explaining it:" followed by a quote from an actual source that happens to be block-quoted on the Wikipedia page. If the part you quote was said directly by Brian Foote and Joseph Yoder, attribute it to them.


You're right, it had escaped my attention. I have quoted them directly in the blog now. Thanks!


The attribution (via Wikipedia) describes the term as being coined in 1997, but the original authors of that paper expanded their ideas in 1999. I reread this paper every couple of years just to make sure I'm not "guilty".

http://laputan.org/mud/


There is a long but good read that goes into this topic here: http://blogs.msdn.com/b/karchworld_identity/archive/2011/04/...

It's about the full lifecycle of a typical software product. Particularly "Section 2.3.1: Loss of Architectural Integrity"

> code decay can be thought of as any implementation of functionality that makes the overall comprehension of the system more difficult to attain and increases the efforts required to implement future changes.

.. and makes further decay more likely


This sounds like "Broken Windows Syndrome".

Once a system gets to the point where more than some threshold (somewhere between 30% and 50%) of it is connected together, developers stop caring about keeping it modular, because it's clear to them that their efforts are a waste of energy.

From that point onwards, the project becomes a cesspool of hackery.


In my experience, code is either 'done properly' or 'shoehorned in'. But really, 'done properly' means 'you have time to organize everything satisfactorily at a high level', which typically happens only when you are writing a new module from scratch.

Everything else is trying to shoehorn something new into an existing framework, and you don't have enough time to get it 'done properly' because your product manager has a heart attack when you tell them how long it'll take to do a proper refactoring job. This is where you will quite happily cut corners, and the chance that you'll inadvertently break existing functionality in the process increases exponentially. This is the mechanism that, in my experience, causes balls-of-mud.

And of course, no matter how good you are at planning every required use-case of your code over its lifetime at the 'done properly' stage, you can never think of it all, so at some point or another you are forced to shoehorn stuff in everywhere anyway.


I do my best to keep the high-level stuff in check: asking developers to try to delete a library if they add one, to "buy back" the lines of code added in their changesets (i.e. try to refactor away as much as they added), to take the cardboard down to the trash, etc.

I think I have a knack for seeing and caring about that. I see complexity arising (even if I can't always prevent it - I mean, we have to ship too).


I'm not sure that randomly connecting nodes is a good model for how software complexity arises, since connections between software components are not made at random. When was the last time you threw some dice to decide whether some piece of your user interface would be connected to business logic or directly to your database server? There's usually a (non-random) reason for why we add a piece of code, and the reason why complexity gets out of hand is that we don't, for various reasons, refactor our architectures to eliminate accumulated complexity before it's too late (and then nobody can understand it anymore, and you end up with the ball of mud).

It's also interesting to note that Foote and Yoder's "Big Ball of Mud" paper[1] portrayed the mechanisms of mud-ball formation as a set of anti-patterns. It's an interesting read. They give some pretty thoughtful explanations for how software transmogrifies into a mud-ball, none of which involve random processes.

[1] http://laputan.org/mud (1999)


It's worth noting that Lisp programmers value that language's quality of being a "big ball of mud", but for other reasons.

https://en.wikipedia.org/wiki/Big_ball_of_mud#In_programming...


The author is measuring largest cluster-size vs threads/button.

In any software, everything is going to be connected; otherwise there's unreachable code. So the largest cluster is always 100%, and I don't get why his argument about the sudden phase transition is relevant to software.


I think the answer to your question is in the missing directions on the threads. The goal of modular software is to have the arrowheads point in the right directions.

Uncle Bob was talking about this in his keynote about how Rails is not your application.

http://confreaks.com/videos/759-rubymidwest2011-keynote-arch...

http://blog.8thlight.com/uncle-bob/2012/08/13/the-clean-arch...

https://vimeo.com/21145583


I was wondering about this too. I don't quite get how this random assignment of connections between components correlates to software complexity. Software isn't randomly connected (I know, sometimes you see stuff that flies in the face of this statement), and the directions of the connections are very important. You can create something that has a single component at the top connected to 100 other components: threads/buttons = 100/101 and the biggest cluster = 100%. I'd wager it would probably be a simple program to reason about. I guess I'm confused about the leap from cluster size to difficulty of reasoning.


So the story is software gets ugly when it gets less modular? This is a truism, no?


In real life I've seen many projects go wrong because people modularized them in the wrong way -- often connected with a naïve faith in "encapsulation". (Complex bugs, performance problems, and crackers don't respect encapsulation.)

For instance, SOA has had a new lease on life lately, for good reasons. I picked up a system that had four layers involved with doing a request; each of these layers had different serialization/deserialization logic (two submodules) and at least one submodule that would actually do the work.

Debugging a problem in the system often involved a wild goose chase across 12 submodules and often changing something simple (like adding a new data field) would require all 12 modules to be changed.

Even if you have a good batting average and manage to make each change right 90% of the time, the chance of getting all 12 right is only 0.9^12, about 28%, so it's close to certain that making a change to that system would create a new bug.

The underlying social problem isn't that "people want to do things quickly", it's that people don't see simplicity as a virtue and don't see complexity as a problem. If they valued speed, they'd pursue simplicity because you can make changes much more quickly in a simple system.


> you can make changes much more quickly in a simple system.

IME: Doing N features hackishly takes O(N^2) time; doing them properly takes O(2N). The problem is when management sees a project as a series of several N=1 tasks, instead of one N=N project.


Thank you for the detailed commentary. Fighting for simplicity is rough because the virtue doesn't always have a champion, while every new complexity-increasing feature has one. At my last firm, every feature had management committee sponsorship. Simplicity? A dirty word.

So I see your point. Thanks for elucidating.


Like Dexen says, the surprise is in the non-linearity. The sudden transition from a sparsely connected to a widely connected system comes somewhat unexpectedly (at least to me :-) ).


Almost; the core insight is the non-linearity of this truism.


Adding to that, there's also somth in the article about "modularity" not being the default when it comes to "nature".

> In nature, complexity is where the good things, such as life, happen.


    > somth
What?





