If it's some weird race condition crash, restarting (hopefully?) puts you in a known good state and you're unlikely to hit it again.
If it quickly repeats, you've isolated the failure to a narrow scope.
This part isn't really Erlang magic; Apache in prefork mode has a lot of the same properties. There may be some magic in supervision strategies, but I think the real magic is the amount of code you get to leave out by accepting the possibility of crashes and having concise ways to bail out on error cases.
For example, to do a Mnesia write, continuing if it succeeds and crashing if it doesn't, you can write
ok = mnesia:write(Record)
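For anyone trying this at home: mnesia:write/1 has to run inside a transaction (or another Mnesia activity), so in practice the assertion tends to wrap the transaction result too. A minimal sketch, assuming Mnesia is started and a table for the record already exists; the module name assert_write and the function store/1 are made up for illustration.

```erlang
-module(assert_write).
-export([store/1]).

%% Hypothetical helper: write Record or crash the calling process.
%% mnesia:transaction/1 returns {atomic, Result} or {aborted, Reason};
%% anything other than {atomic, ok} fails the match with a badmatch error.
store(Record) ->
    {atomic, ok} = mnesia:transaction(fun() ->
        %% Inside the transaction the same one-line assertion applies.
        ok = mnesia:write(Record)
    end),
    ok.
```

There is no error-handling branch at all: if the write does not go through, the process dies and the supervisor decides what happens next.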
Similarly, when you're writing a case statement (like a switch/case in C), if you expect only certain cases, you can leave out a default case, and just crash if you get weird input.
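A small sketch of what leaving out the default case looks like. The module shape_area and the shapes are invented for the example; the point is only that unexpected input raises a case_clause error instead of being handled.

```erlang
-module(shape_area).
-export([area/1]).

%% Only the shapes we expect are listed. Any other term raises
%% {case_clause, Term} and crashes the calling process.
area(Shape) ->
    case Shape of
        {square, Side} -> Side * Side;
        {circle, Radius} -> math:pi() * Radius * Radius
    end.
```

Calling shape_area:area(triangle) crashes with {case_clause, triangle}, which tells you exactly which input nobody planned for.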
I also find the catch Expression way of dealing with possible exceptions is often nicer than try/catch. It returns the exception so you can do something like
case catch Expression of
    something_good -> ok;
    %% error-class exceptions arrive as {'EXIT', {Reason, Stacktrace}}
    {'EXIT', {badarg, _Stacktrace}} -> not_so_great
end
and handle the errors you care about in the same place as where you handle the successes.
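A runnable version of that pattern, as a sketch. The module parse_port and its return values are made up; list_to_integer/1 is the standard BIF and raises an error-class badarg on bad input, which a bare catch turns into {'EXIT', {badarg, Stacktrace}}.

```erlang
-module(parse_port).
-export([parse/1]).

%% Successes and failures are matched in the same case expression.
parse(Str) ->
    case catch list_to_integer(Str) of
        Port when is_integer(Port), Port > 0, Port < 65536 ->
            {ok, Port};
        {'EXIT', {badarg, _Stacktrace}} ->
            %% list_to_integer/1 raised error:badarg
            {error, not_a_number};
        _Other ->
            %% an integer outside the valid port range
            {error, out_of_range}
    end.
```

One thing to watch: a bare catch also returns thrown values as-is, so a catch-all clause like _Other will swallow them too.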
Edited to add, re: failwhale, your HTTP entrypoints can usually be something like
As long as the failure in real_work_and_output is quick enough, you'll get your failwhale. Of course, if the problem is that processing is too slow, you might want to set a global failwhale flag somewhere, but your ops team can hot-load a patch if they need to fix the performance of the failwhale ;)
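One way such an entrypoint could look, as a rough sketch only. Everything here except the name real_work_and_output is an assumption: the module entrypoint, handle/2, failwhale/0, the {ok, Body} return shape, and the {Status, Body} response tuples are all hypothetical, and the work function is passed in as a fun so the example stands alone.

```erlang
-module(entrypoint).
-export([handle/2]).

%% Work stands in for real_work_and_output. Any crash, exit, or
%% unexpected return value falls through to the failwhale clause.
handle(Req, Work) ->
    case catch Work(Req) of
        {ok, Body} -> {200, Body};
        _Failure -> {500, failwhale()}
    end.

%% Hypothetical static error page.
failwhale() ->
    <<"Something went wrong. Please try again shortly.">>.
```

For example, entrypoint:handle(Req, fun(_) -> erlang:error(boom) end) returns the 500 tuple rather than taking the request handler down.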
"It returns the exception so you can do something like
case catch Expression of"
Something to be aware of is the cost of a bare catch when an exception of type 'error' is thrown:
"[W]hen the exception type is 'error', the catch will build a result containing the symbolic stack trace, and this will then in the first case [1] be immediately discarded, or in the second case matched on and then possibly discarded later. Whereas if you use try/catch, you can ensure that no stack trace is constructed at all to begin with." [0]
Stack trace construction isn't free, so it makes sense to avoid it if you're not going to use it. I know that in either Erlang 17 or Erlang 18, parts of Mnesia were slightly refactored to move from bare catch to try/catch for this very reason.
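For comparison, a sketch of the try/catch form, which still keeps the success and failure cases side by side. The module parse_port_try and its return values are invented for the example. Because the handler matches only on class and reason and never binds the stack trace, nothing needs to build one.

```erlang
-module(parse_port_try).
-export([parse/1]).

%% The 'of' section handles successful results; the 'catch' section
%% handles only the exception we care about. Anything else propagates.
parse(Str) ->
    try list_to_integer(Str) of
        Port when Port > 0, Port < 65536 -> {ok, Port};
        _OutOfRange -> {error, out_of_range}
    catch
        error:badarg -> {error, not_a_number}
    end.
```

Unlike a bare catch, this does not intercept throws or exits you didn't ask for.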