Hacker News
Programmer Gore (jgc.org)
51 points by jgrahamc on July 15, 2010 | hide | past | favorite | 15 comments



Sometimes, usually when I'm in a hurry, I skip the 'getting to the root cause' step, and this has bitten me badly on more than one occasion.

So now, if I can afford it, I really want to know where things went wrong. That usually means a longer time-to-fix, but what's fixed that way usually stays fixed. The 'band-aid' type fixes tend to lead to subtler problems that are harder to fix later on.

If it breaks, please let it break now, and in as spectacular a fashion as possible, without any band-aids; that way we stay away from the kind of bugs that only happen during a full moon and an Eastern wind.

Which reminds me, I really should have a look at the guts of some software that currently runs in a wrapper script because it crashes once every 3 months or so without any apparent cause.


There's some good advice in chapter 8 of Code Complete on defensive programming. One example given there is the C function realloc, which resizes a block of memory and can sometimes move the whole block to a new, larger one. Since intermittent bugs are indeed the worst kind, Steve suggests making the debug build's memory allocator always move the block, so that the 'block moved' code path is exercised everywhere during testing.

Edit: Wrong Steve and wrong book -- it's a running example in "Writing Solid Code" by Steve Maguire.
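A minimal sketch of that trick, assuming allocations are routed through your own wrapper (the names here are made up, not from either book; unlike realloc, this sketch makes the caller pass the old size, where a real wrapper would record it in a block header):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical debug-build replacement for realloc: it *always* moves
 * the block, so any code that keeps using the old pointer fails on
 * every test run, not just on the rare runs where realloc happens to
 * move the block. */
void *debug_realloc(void *old, size_t old_size, size_t new_size)
{
    void *fresh = malloc(new_size);
    if (fresh == NULL)
        return NULL;
    if (old != NULL) {
        size_t n = old_size < new_size ? old_size : new_size;
        memcpy(fresh, old, n);
        /* Scribble over the old block before freeing it, so stale
         * reads return obvious garbage instead of the old contents. */
        memset(old, 0xDD, old_size);
        free(old);
    }
    return fresh;
}
```

In a release build the wrapper would just forward to realloc; the whole point is that the debug build turns a sometimes-bug into an every-time bug.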


That's a good trick, regardless of which book it came from.

In the software I wrote about above I suspect a very subtle resource leak.

A nice example of such a leak is forgetting to close an open file descriptor when some other, rare error condition occurs elsewhere in the code. (Not that that's necessarily it, but that's how you get to the point where something runs for months on end without crashing and then suddenly does.)

File descriptor leaks can be traced relatively easily using lsof, by the way (one of the step-child utilities that really should be in every coder's toolbox, right next to gprof and make).
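A minimal sketch of that kind of check, assuming a hypothetical server running as pid 1234 (the `awk 'NR > 1'` just drops lsof's header line):

```shell
# Count open descriptors; a count that climbs steadily between samples
# is the classic signature of a descriptor leak.
lsof -p 1234 | awk 'NR > 1' | wc -l

# Group by descriptor type and name to see *what* is leaking:
lsof -p 1234 | awk 'NR > 1 { print $5, $9 }' | sort | uniq -c | sort -rn
```

Run the first command a few times, minutes apart; if the number only ever goes up, the second command usually points straight at the culprit.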


Good tip; I believe Guard Malloc does this for you.


The ITIL framework separates the duties of finding an immediate workaround and diagnosing the root cause of one or more incidents. I can see some value in not letting those two tasks flow into each other without thinking about it.


Finding the root cause of failure is essential, especially when you are working with large codebases.

For my biggest running project, I customize about 1GB of source code not written by me. Every bug needs to be chased until you actually understand why it happened; otherwise it's too risky to just apply a patch that "seems" to fix it.

Plus, in the process you usually learn a new and interesting thing about a previously unknown part of the codebase.

Of course, very few customers actually understand the importance of this and have the budget to allow you this "luxury".

Fixing the bug on most of the servers and leaving a small fraction running the old code to investigate the bug further sounds interesting, but I couldn't do that.


A gigabyte of source code!? That's the entire depot, not just one tag, right?


Actually, it's just about 800MB of source code with 150MB of external dependencies. I never counted the lines of code until today but apparently just for the main language there are over 9 MLOC.

Of course, this doesn't actually matter as I don't touch most of the code and the "recently active" part seems to measure only about 275 KLOC. But when a bug arrives it might take me anywhere; even so, I don't think I've visited more than 35% of the codebase.


The Linux kernel currently is around 450MB of source code, and I've worked on individual projects of 1-2 MLOC, which I suspect is about 100MB of code each. A couple of those together don't make these figures seem all that implausible.


Hmm

    yulia:/usr/src# bzcat linux-source-2.6.32.tar.bz2 |wc -c
    382382080

382M. Though is it really one "thing"? A lot of that is device drivers, loadable modules, etc. that no single install will ever have.


I guess my 450MB are from 2.6.34 and are the disk usage of the actual untarred sources that I'm working from, so my figure includes cluster off-cuts (the bytes needed to round each file up to the nearest 4096).

It's certainly one thing in that it all uses the same build script; and any of the pieces are pretty worthless on their own.
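The gap between the two figures can be seen directly with GNU du, assuming a Linux system and a hypothetical untarred linux-2.6.34/ tree:

```shell
# Sum of the files' byte lengths -- roughly what `bzcat ... | wc -c`
# measures on the tar stream (minus tar's own per-file headers):
du -sh --apparent-size linux-2.6.34/

# Blocks actually allocated on disk; each file is rounded up to a whole
# filesystem block (typically 4096 bytes), so this comes out larger.
du -sh linux-2.6.34/
```

With hundreds of thousands of small files, those per-file round-ups alone can plausibly account for tens of megabytes of the difference.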


Is there a bash.org-style site for programmer horror stories? They pop up once in a while on HN, but it would be nice to have a single-service site for these kinds of stories.



Great article, but for some reason when I saw the headline I thought it was going to be about Al Gore.


I thought this was going to be about how Al Gore programs his own climate change models.



