I hope everyone winds up with a few of these stories after long enough. They're great to have.
One night I came in from "lunch" (2nd shift...) and saw three guys standing around a laptop trying to make something work for a customer. They had an Apache install which was supposed to log something to a given file, but they'd poke the machine with their browser and would get nothing in the log file.
I walked by on my way to my desk and one of them grabbed me. I couldn't type anything (because one of them was already in the way), but I could see what was already on the screen. It was a chunk of the config file and it looked a little like this:
<VirtualHost 72.3.x.x:80>
# (directive to log to that file was here)
...
</VirtualHost>
I asked them to run 'ifconfig'. They did. The machine had 192.168.x.x IPs because it was sitting behind a firewall doing NAT -- a fairly common config for our customers. Apache is pretty strict about matching things when you use an explicit IP+port, so it wasn't using that virtual host to service the hit, and thus the log directive wasn't used, either.
I just said, okay, change the IP on that line from 72.3.whatever to 192.168.whatever and reload the config and try it again. They did, and it worked, so I continued back to my desk.
From what I heard, they had spent over an hour trying to figure it out. I didn't type a single character and never even sat down. I chalked it up to luck and went on with life.
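The mismatch in that story can be checked mechanically. Here is a minimal Python sketch, assuming you already have the address from the `<VirtualHost>` line and the addresses `ifconfig` reports as plain strings. The function name is made up for illustration, and the addresses are stand-ins from documentation ranges, since the real ones are elided above.

```python
import ipaddress


def vhost_can_match(vhost_addr, local_addrs):
    """Return True if the address on a <VirtualHost addr:port> line is one
    the machine actually has configured on an interface.

    Behind NAT, requests arrive addressed to the private IP, so a vhost
    bound to the public IP is never selected.
    """
    if vhost_addr in ("*", "_default_"):
        return True  # wildcard vhosts accept any local address
    wanted = ipaddress.ip_address(vhost_addr)
    return any(ipaddress.ip_address(a) == wanted for a in local_addrs)


# Stand-in addresses only; the real ones are elided in the story.
local = ["127.0.0.1", "192.168.1.10"]
print(vhost_can_match("203.0.113.10", local))   # public IP, not on this box
print(vhost_can_match("192.168.1.10", local))   # private IP, present locally
```

This only compares addresses; it does not parse an Apache config or reproduce Apache's full virtual host selection rules.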
I think a more interesting perspective on this article would be "there's no such thing as a quick buck, or if there is, this is not a story of it."
Sure, he made 2,500 guilders in 5 minutes, but the author says of his early QnX days: "After a couple of visits there to talk over strategies on how to solve certain problems using QnX [...]". So to me this anecdote is about how a guy spent some amount of time (several days? months?) learning QnX, got involved in the (1-man) community around him, and became enough of an expert in QnX that when the time arose, he was the guy they called.
The way the story is told is akin to patio11 saying "an easy way to make a quick buck is consult with a BigCo!" True... but that ignores the time, effort, and experience Patrick has put into building his skills.
TL;DR: "Invest time in learning something valuable" is a better takeaway from this article, IMO.
After reading the article, even I found the title tongue in cheek. That said, GP has explicitly called out (quite well) what he thought was the underlying lesson and I am thankful for that.
It's a bit banal though, of course you should learn something valuable. Isn't that what your mother's been telling you all these years?
I think the better lesson which is directly applicable to all coders is "question your assumptions". To me this was a huge benefit of learning ruby as it happens to be one of the easiest languages to pull your assumptions right out from under you.
You're right. And I hope my comment didn't come across as harsh. I would have said the same thing to your face, but my tone and body language would have softened it quite a bit.
When you’re looking for a bug it’s terribly hard to not get
bogged down in details at the expense of seeing the bigger
picture. An outsider doesn’t have those details so that
helps tremendously. The lesson I took away from that day is
that when I’m stuck I ask an outsider.
This pretty much sums it up and isn't limited to software engineering at all. Having chased some hardware bugs on my own I came to the same conclusion: one conversation with an uninvolved colleague, ending in the question "Well, have you checked (the obvious) X?", was more productive than hours or days of bug hunting.
On the flip side, when playing the part of the outsider, never be afraid to ask if it is plugged in (etc.). If you can shape your local culture, help ensure those sorts of questions are understood as basic due diligence and not an insult. I always half apologize, but about 1/3rd of the time, yeah, that's it. (Oops.) We all do it.
I always phrase it to the client as 'building a mental picture of the problem'. Then I can ask questions that some people might otherwise take offense to.
I've heard a teddybear works pretty well in a pinch.
For me, the epiphany usually hits when I finally break down and decide to ask for help. I realize my cognitive blind spot while composing a message for a forum post or email.
It's always good to step back and acquire a fresh perspective.
> I've heard a teddybear works pretty well in a pinch.
Ah! The ancient feud between the School of the Duck and the Monastery of the Bear! I think modern times have calmed down the waters and all's right in the world of Engineer Fu.
I just spent all night working out bugs on a project. I pushed my work two days ago and thought it was perfect. Err, no. There were bugs to fix. So I looked at it and couldn't figure it out. Nothing looked right. So I got the person I work for to go through the issues with me. One by one I was able to figure it out. The software shipped. And it ended up being so much better than before. Plus it worked like it was meant to.
I've seen the same type of scenario played out from a business perspective. "Wait, why are you spending $2,000/month on SEO? Why don't you just partner with Company X." My favorite, though, was when I showed a ticket broker friend of mine how to plug into existing affiliate markets. One quick switch and his revenue jumped 1,000% at no cost to him.
You could probably make an argument that it's for similar reasons that scientific advancements often (though I'm not sure exactly how often) come from people outside the field.
I'd expect such an argument has been made by people - perhaps someone here might know of some examples.
I think that notion may be a case of "story bias". The stories of scientific advancement coming from those outside the field are so much more interesting that they get told in vast disproportion, making it seem they are the norm.
- Woman: "It's You! Picasso. Could you please paint me?"
- Picasso: "Here you go, Madame!"
- Woman: "Wow! Great picture in only a few seconds. How much do I owe you?"
- Picasso: "$5,000"
- Woman: "That's outrageous. It only took you a few seconds"
- Picasso: "Madame, it took me my entire life!"
I flew over to see a customer (only a 2h domestic flight) to solve an 'unsolvable' problem. They'd been going through it with support for weeks with no luck. I got to the customer, took over his chair, edited a single file, added a missing ; character and it worked. Done. I got the next day and a half 'off' because my hotel/flights couldn't be rearranged for an earlier return.
The root cause was a piece of paper on his desk with a diagram of the systems and their IP addresses.
Direct shell access to the machine wasn't possible. To get a shell on the machine that ran the processes he had to go via a separate gateway host, and the connection from the gateway host to the destination machine was done by hostname (as the gateway machine had access to the internal DNS).
To get a copy of any config files he was able to FTP to the server directly from his laptop, but he wasn't able to use the same DNS servers as the gateway host so he relied upon the piece of paper on his desk which had the names and IP address of the machines. Unfortunately the IP address for the system containing this config file was the one for the corresponding dev system, not the production machine.
Every time he ran the process he was using the config file that was missing a ; character and so it spat out an error.
Every time he got a copy of the config file (via FTP) he was actually getting it from the dev system which wasn't missing the ; character.
Much back and forth from support on "are you sure that's the right config file?" and he always responded with "yes".
When I got there I looked at the config file via the shell (on the right machine obviously) and spotted the problem straight away.
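One way to settle an "are you sure that's the right config file?" exchange without relying on anyone's word is to compare a checksum taken on the machine with a checksum of the copy that was fetched. This is a sketch of that idea only, not something the story says was tried. The function names and file contents below are invented.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 of some file contents."""
    return hashlib.sha256(data).hexdigest()


def is_same_file(fetched_copy: bytes, digest_seen_on_host: str) -> bool:
    """True if the fetched copy matches the digest computed on the host
    where the process actually runs."""
    return sha256_hex(fetched_copy) == digest_seen_on_host


# Made-up contents standing in for the two config files.
production = b"timeout = 30\n"    # missing the trailing ';'
development = b"timeout = 30;\n"

# What a checksum tool run in the shell on the production box would report.
digest_on_prod = sha256_hex(production)

print(is_same_file(development, digest_on_prod))  # False
```

A mismatch says nothing about which file is wrong, only that the two parties are not looking at the same bytes, which is the question that went unanswered for weeks.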
I didn't make any money out of it, but I did get a day and a half to look around a new city on expenses.
[EDIT] I didn't quite walk out straight after fixing the problem. I stuck around to help out with any other queries they had with our software but they ran out of questions by lunch time.
[EDIT2] Customer wasn't being billed at all for my visit. They pay annual maintenance for the software though. If it's a serious (or weird) enough problem (or they shout loud enough) then someone is sent on site.
I was once brought in to interview as a technical lead by a turnaround artist (the company was in dire straits). During the course of the interview, they asked me about a problem they had been banging their heads against for _months_, where Tomcat was taking a very long time to restart (on the order of 30-60m). I thought about it for a second, and once I overcame my surprise that they were using Tomcat, I asked if they had large session objects.
Years ago, I'd run into a somewhat similar issue, as Tomcat serializes its internal session representation on shutdown, so that when it comes back up, state can be restored. If one has truly gigantic session objects (which is a no-no anyway), it takes forever to serialize them, and just as long to muster them back into memory.
They tweaked the config, the problem went away, their "chief architect" was fired, and I got the gig. Turns out this problem had them weeks away from going out of business, they'd been working on solving it for months.
The gig turned out to be a major disaster anyway, their problems were far deeper than technology.
The funny bit is that he probably said it in jest but out loud in front of four very wealthy customers so rather than cheapening out he paid up and ended up cementing our relationship in trust. Jan is an absolutely awesome guy, I learned more about telecommunications from him than from the CCITT books. His knowledge is literally encyclopedic, even today he's current on the state of the art.
Jan single-handedly designed and implemented a clustered system that would scale horizontally to insane volumes of messages (faxes and telexes) and he did that with early 80's technology.
Even today, with our current tech you'd be hard pressed to create a system that is as elegant as what he put together, he's both an awesome teacher and a good friend.
Jan's desktop was pretty powerful for the time, and there were quite a few servers there as well (> 30).
The machines were mostly powered by 486/33, on a micronics motherboard, adaptec controllers (1542) to hook up drives.
Using the 286 version of QnX, all the joys of 'mixed model' programming. QnX took forever to release their 32 bit version (this was the main reason I wrote a clone).
Funny that I still remember those details after all these years.
Value-based pricing in software development is a high-risk gamble though, as it can very easily go either way. The risk of course decreases with experience, but even experienced teams and individuals can burn themselves on value-based, i.e. fixed-price. If the seller overshoots their estimates, you have a regular tug-of-war, where the seller is trying their best to do as little work as possible, because every extra hour puts them further in the hole.
I find I like working on existing projects for this very reason. You can scope things into weekly blocks and emphasize the value you're adding to their business. It's easier for the customer to "See" it since it's not vaporware. It's also much easier to get a 1-week estimate right than a 3-month estimate. And even if you really really screw up, all you've lost is a Saturday. When you screw up a 3-month estimate it's more likely to cost you six weeks' billables. Even if you charge hourly, do you think the client is going to give a stellar reference for the contractor that billed $20K+ over his estimate?
Short scope, well-defined, less risk, more fun, easier sell for both parties.
So I'm with you. I would not want to jump into a 3-month fixed-price contract blind.
This is a great, related story from a long while back:
In the early years of the 20th century, Charles P. Steinmetz, who stands among the electric industry's greats, was brought to GE's facilities in Schenectady, New York. GE had encountered a performance problem with one of their huge electrical generators and had been absolutely unable to correct it. Steinmetz, a genius in his understanding of electromagnetic phenomena, was brought in as a consultant -- not a very common occurrence in those days, as it would be now.
Steinmetz also found the problem difficult to diagnose, but for some days he closeted himself with the generator, its engineering drawings, paper and pencil. At the end of this period, he emerged, confident that he knew how to correct the problem.
After he departed, GE's engineers found a large "X" marked with chalk on the side of the generator casing. There also was a note instructing them to cut the casing open at that location and remove so many turns of wire from the stator. The generator would then function properly.
And indeed it did.
Steinmetz was asked what his fee would be. Having no idea in the world what was appropriate, he replied with the absolutely unheard of answer that his fee was $1000.
Stunned, the GE bureaucracy then required him to submit a formally itemized invoice.
They soon received it. It included two items:
1. Marking chalk "X" on side of generator: $1.
2. Knowing where to mark chalk "X": $999.
What I find interesting is how much benefit you get from being one of the guys who get asked for help with this kind of thing ("I can't figure it out"-problems).
It was the main way I studied in university. People would come with some "difficult" problem thinking that I'd know the answer. They'd proceed to explain the problem, teaching me a ton of stuff while doing so, and there would inevitably be some small obvious thing they'd missed, I'd point it out and they would be super happy. I'm sure I learned way more from those interactions though.
It's the same in work as well, if you're "a troubleshooter" you end up learning an amazing amount of things from people coming over to explain something so that you can help them find the (usually) obvious thing they missed, or if it's not obvious you get to do some interesting research with them which teaches you some new stuff.
Seconded.. and it's one of the things I miss about college.. and I guess what I was looking for when I went back to do a master's. I didn't get it though.. due to a bad choice of college.. but you live and learn.. :-)
This story underscores something that I look for when I'm interviewing for certain kinds of support roles. Basically, it's really hard to teach someone proper troubleshooting. It seems that people either "get it", or they do not. You can give them a flowchart, and lots of training, but it's not the same.
Bad troubleshooters have a hard time assessing just where to start: do we look at the immediate problem as reported, or do we track back through dependencies to a non-apparent possible root? Also, in a stressful situation like this it seems that it takes a certain amount of resolve to change 1 thing at a time and have a way to (quickly!) test your results so that you can actually solve the problem, instead of just pushing out the next occurrence a few weeks.
Another common bug: bad troubleshooters don't verify their hypotheses. They get attached to their first idea of what might be going wrong, and start tweaking things to alter the behavior of whatever they think is at fault --- and after each such tweak fails, it's on to the next, without a pause to consider that the problem might be elsewhere. Time wasted like this can range from hours to weeks.
Where this gets really annoying is when you present them with direct evidence that the problem is elsewhere, and you get back an emotionally driven argument for ignoring it...
Yes, when I debug I first verify all the obvious things with quick sanity checks, and then do a binary search down to the more detailed possibilities. Just yesterday I helped my friend find an inscrutable bug. He checked the inputs (he expected "myString" and indeed got "myString") so he delved deep into the system fruitlessly for hours. When I helped, I quickly uncovered that "myString" != "myString": his input had a carriage return prepended to it!
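A quick illustration of why that bug hides so well, as a small Python sketch. The variable names are made up, and the string is the one from the story with a carriage return in front.

```python
expected = "myString"
received = "\rmyString"  # carriage return prepended, as in the story

# Printed plainly, the carriage return is usually invisible on a terminal.
print(received)

# Comparing and inspecting the raw representation exposes it.
print(expected == received)          # False
print(repr(received))                # '\rmyString'
print(len(expected), len(received))  # 8 9
```

Printing `repr()` (or the length) of a suspicious value is a cheap sanity check to run before digging deeper into the system.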
You want to make a quick buck? Hammer out some freelance jobs on odesk.com, freelancer.com, etc.
You can spend a week's worth of evenings plus the weekend and net a good few hundred dollars.
The advice I just gave, while accurate, isn't very valuable because it doesn't scale. I've done it a few times, bought myself a nice holiday last year with the proceeds. But it's not something that grows, you have to keep grinding away to create a recurring income.
I preferred it when my websites regularly produced £60/month in adsense revenues without me having to lift a finger. (That not-lifting-finger part only lasted about a year though. Now I barely make £6/month because I've neglected my sites).
I had a fun experience in college working in a research lab. My first task was to interface a Mac with a high frequency generator using a data acquisition connection and a very specialized environment called LabView (sort of like Sketch but for research equipment). I had the HFG's manual as a reference and it was pretty straightforward, after I wrapped my mind around the fact that voltage of 0 was a logical 1. The "code" (little boxes representing various devices and virtual wires connecting them) was complete and actually pretty straightforward after a refactor or two. Except it did not work. No errors were raised but the HFG did not produce a frequency. I spent Thursday and Friday debugging it and left the lab very frustrated. This was my first project and I wanted to be taken seriously. Looking back I was probably putting too much pressure on myself about this really minor thing.
So far as I know, nobody was in the lab over the weekend and certainly nobody would have touched my workstation. When I came back Monday the code worked. It worked the next day and the day after. The only thing that changed over the weekend: daylight savings ended. I have no idea why that would have any effect on the components involved and it is a mystery to me to this day. (Yes I restarted everything numerous times prior to that which would have explained the sudden fix much more simply). The worst thing is that I don't know whether things stopped working 6 months after when daylight savings started back up.
In my kickstart scripts, I explicitly do two things:
1. Create a 1 GB file of "nothing" (e.g. dd if=/dev/zero of=/path/meh.bin bs=1000 count=1000000) on each filesystem (exception: /boot, which is typically 256 or 512 MB).
2. Configure each filesystem to reserve 5% of space for the root user. This is typically the default, but I do it explicitly just to make sure.
When the first "low disk space" warning comes in, that 5% can be adjusted down to 1% to free up some space (on today's large filesystems, this 4% usually amounts to quite a bit of space). This buys you enough time (hopefully) to expand logical volumes, etc.
If you still manage to nearly run out of space after freeing up that 4%, you have one more chance: delete that 1 GB file.
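The ballast-file half of this can be sketched in Python as well. The version below is a rough equivalent of the `dd` command above plus a helper that deletes the file once free space drops below a threshold. The function names, the chunk size, and the 1% threshold are assumptions for illustration. Adjusting the root reserve is a filesystem tool's job and is not covered here.

```python
import os
import shutil


def create_ballast(path, size_bytes, chunk=1024 * 1024):
    """Write size_bytes of zeros to path, like the dd command above.

    Real zeros are written (not a sparse file) so the space is actually
    allocated on ordinary filesystems.
    """
    zeros = bytes(chunk)
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(zeros[:n])
            remaining -= n


def release_ballast(path, mount_point, min_free_fraction=0.01):
    """Delete the ballast file if the free fraction on mount_point has
    dropped below min_free_fraction. Returns True if it was removed."""
    usage = shutil.disk_usage(mount_point)
    if usage.free / usage.total < min_free_fraction and os.path.exists(path):
        os.remove(path)
        return True
    return False
```

For example, `create_ballast("/path/meh.bin", 1_000_000_000)` would match the 1 GB file from step 1. On filesystems with transparent compression a file of zeros may reserve far less space than its size suggests, so treat this as a sketch for conventional filesystems.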
Another favorite of mine is: looking at the right file, but on the wrong host (for example when connected via ssh over a fast network, you simply forget that you're in a remote shell).
I was working in the web interface of one of our appliances at work just before lunch. One of my coworkers (with whom I share admin duties of this appliance) set his computer to shut down then left to pick up some food. Suddenly I noticed the appliance was unresponsive! I was getting site not found errors, and I couldn't even SSH into the machine. I looked at my coworker's computer and realized it was still running, but the last command he ran was a shutdown in a remote shell...
That's where molly-guard[1] comes in; stops (or at least confirms) shutdown on protected hosts. Named after the Big Red Button Cover of hacker-lore[2].
I've heard of people doing things like having a terminal background colour escape sequence in their login scripts as well, so it's immediately obvious if you're on a production host.
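A rough Python sketch of that idea, something a login script could call. It assumes production hosts can be recognised by a hostname prefix (`prod` here is invented), and it only colours a banner line using standard SGR codes. It does not change the whole terminal background as described above.

```python
import socket

RED_BG = "\033[41m"  # SGR 41: red background
RESET = "\033[0m"    # SGR 0: reset all attributes


def banner_for(hostname, production_prefixes=("prod",)):
    """Return a red-background warning banner for production hosts,
    or the plain hostname otherwise. The prefix rule is an assumption."""
    if hostname.startswith(tuple(production_prefixes)):
        return f"{RED_BG} PRODUCTION: {hostname} {RESET}"
    return hostname


print(banner_for(socket.gethostname()))
```

As the next comment points out, this still relies on someone looking at the screen, so it complements a hard interlock and does not replace one.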
Given the well-established treachery of my finger muscle-memory, I'm much more comfortable with a hard interlock than something that relies on me checking the prompt. Definitely better than nothing though.
At my last job, we had a MySQL instance that kept falling over. It turns out that someone was dividing a table up with tons of partitions, and MySQL creates a new data file for each partition. The MySQL instance created several hundred file descriptors while trying to open each one of these files, and "mysteriously" crashed.
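If the crash came from hitting the per-process open-file limit (which the comment implies but does not state), a back-of-the-envelope check looks like this Python sketch. The partition count and the reserve are hypothetical numbers, and the limit read here belongs to the process running the script, not to the MySQL server itself. The `resource` module is Unix-only.

```python
import resource


def headroom(soft_limit, files_needed, reserve=64):
    """Descriptors left over after opening files_needed files, keeping
    `reserve` for sockets, logs, and so on. Negative means it won't fit."""
    return soft_limit - reserve - files_needed


def current_soft_limit():
    """Soft RLIMIT_NOFILE for this process (Unix only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


# Hypothetical numbers: a table with 800 partitions, one data file each.
limit = current_soft_limit()
if limit == resource.RLIM_INFINITY:
    print("no open-file limit set")
else:
    print("headroom:", headroom(limit, files_needed=800))
```

With a soft limit of 1024, for instance, `headroom(1024, 800)` leaves 160 descriptors, and a second table partitioned the same way would push it negative.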
Reminded me of the situation in which I was given 'a very peculiar issue which was very hard to simulate' (according to the description). Considering that it was related to the message queue generation module of a core banking system, and that it was a production issue, the scene was intense. Almost everyone was convinced that the issue was due to a recent patch to the MQ module but couldn't figure out the reason.
On a hunch, after seeing the pattern of errors in the debug logs, I dug up the old linked list handler routines in our codebase, and lo and behold, there was an OBOE [1], and a very latent one at that, dating back to the early 90s. The fix was just the addition of a '=', which got things back on track. That incident made me the official schroedinbug [2] mangler of the unit, causing me to spend countless night-outs thereafter.
Amusingly, this happens a lot with SSL certs. The BIOS battery will be dead, the internet connection will be spotty, or the time server host names will be old or whatever, so the system will run but not have the right time. IT gets called in because nothing HTTPS will work, particularly GMail. None of the certs will verify, because they appear to have been signed in the future, and other rubbish.
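The failure mode is easy to see once the check is written down. A minimal Python sketch of the validity-window comparison follows, with invented dates and a made-up function name. Real TLS libraries do this as one part of certificate verification.

```python
from datetime import datetime, timezone


def cert_time_status(now, not_before, not_after):
    """Classify a certificate's validity window against the local clock."""
    if now < not_before:
        return "not yet valid"
    if now > not_after:
        return "expired"
    return "ok"


# Invented validity window for illustration.
not_before = datetime(2012, 1, 1, tzinfo=timezone.utc)
not_after = datetime(2013, 1, 1, tzinfo=timezone.utc)

# A clock that has been reset to a date years in the past.
reset_clock = datetime(2000, 1, 1, tzinfo=timezone.utc)
print(cert_time_status(reset_clock, not_before, not_after))  # not yet valid

# The same certificate with a correct clock.
good_clock = datetime(2012, 6, 1, tzinfo=timezone.utc)
print(cert_time_status(good_clock, not_before, not_after))   # ok
```

From the machine's point of view every certificate is dated in the future, so the first thing to check when all HTTPS breaks at once is the system time.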
That reminds me of the story Isaac Asimov told. He'd been given a TRS-80 desktop by Radio Shack to use. It worked fine until one day. He couldn't figure out why the hell it wouldn't turn on. He spent a lot of time trying to figure it out and was getting in a panic. He called Radio Shack and they sent over a tech who found the problem instantly. The computer wasn't plugged in!
Several years ago, I called Comcast technical support because my cable Internet service stopped working. The support guy told me to unplug the Ethernet cable from both my computer and the cable modem, turn it around, and plug it back in.
Clearly he just wanted to make sure that it actually was connected properly... but I like to think he was making sure the Internet was flowing into my computer. With those unidirectional Ethernet cables you have to check.
That's a venerable trick for call center tech support. You can't ask the customer "Is it plugged in?". They will ALWAYS say yes and get mad, "do you think I'm an idiot?" But tell them to do something else with the connection in question, like "turn it around" or "blow dust out of it", and usually they'll actually do it and figure out if it wasn't plugged in.
"Please reboot your computer" serves a similar purpose. Rebooting hasn't actually fixed any network connectivity troubles since approximately Windows 95, but the action serves as proof that 1) the customer actually has a computer, 2) they are at it, and 3) it's turned on. They really will have those basic facts wrong and lie about them if directly asked. ("Their computer" could be at their nephew's house, or be a TV.)
I actually called Comcast for the same reason. We got it down to the fact that the cable modem and my wireless router weren't seeing each other, based on what she saw from her end plus what I told her I could see on my end. We decided to try a different Ethernet cable. In the process of doing that, I realized that I had been prettying up the cables earlier (a fact I had already forgotten about) and had plugged the cable back into the wrong port. Stupid mistakes. If she had me try "reversing the cable" or "blowing the dust out" it would have had the same end result. Boy, I felt dumb!
In my first job I had a coworker who was always teasing me. Once I got really angry at him. I went to his desk and loosened (not simply disconnected) his keyboard plug, while talking to him. The rest of the people noticed what I was doing, but he didn't.
He could type for a while after I was back at my desk, before the keyboard went dead and he started to curse about a quarter of an hour of lost work. Everybody was laughing out loud, and louder still because he could see that I had something to do with it but couldn't figure out what.
I find the parallels between this and the recent HN outage to be interesting: Both times, the people responsible assumed that a recent software upgrade was the root cause, when the actual cause was something mundane and basically unrelated.
This sounds like a (perhaps apocryphal) story I've heard of Feynman when he was a "fresh" engineer in Los Alamos. There was an engineering defect in <some thing> and after a quick look at the engineering diagrams he had a solution. I think this was from "The Pleasure of Finding Things Out" but I couldn't find the story in my copy.
The reason I think it may be apocryphal is that it bears a striking resemblance to the "knowing where to drill the hole" story told elsewhere in this thread.
One more instance of The Law Of The Too Solid Goof: In any collection of data, the figures that are obviously correct beyond all need of checking contain the errors. Corollary 1: No one you ask for help will see the error either. Corollary 2: Any nagging intruder, who stops by with unsought advice, will spot it immediately.
"The lesson I took away from that day is that when I’m stuck I ask an outsider."
Interesting, there's a very similar lesson in this post from James Bach's blog yesterday about programmer pairing with a tester:
http://www.satisfice.com/blog/archives/852
I think an outsider's viewpoint is useful. Especially if they don't say, "hey, have you put the semi-colon in" alllll the time. ... though it's surprising that 1 in 20 times that is actually useful.
Especially if you're a Ruby dev doing a bit of PHP haha.
It also helps when the bug is ultimately not your problem, so you don't have any personal stress about getting it fixed. It's just an interesting challenge. Treating your own problems in this way can be tough, though!