Spot instances are pretty painful for training. It's annoying to have the machin...

viksit · on Oct 11, 2016

^ That. For all that people say about spot instances, there's no infrastructure I know of to manage jobs and have them migrate to higher-priced instances without losing state.

RBerenguel · on Oct 11, 2016

You can always snapshot and keep track of state as you go (a little bit tricky with Spark, though). We use spot instances for training we know is not vital (as in, it has to be done, but we would rather risk running it twice and save money than pay more to be sure it finishes the first time). Also, once you know what availability specific instance types have, you can choose better (e.g. maybe c3.xlarge is slightly more expensive as spot than large, and you could make do with large, but xlarge has almost no shutdowns).
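
To make the "snapshot as you go" idea concrete, here is a minimal sketch in plain Python, not anything Spark- or AWS-specific. The function names, the `stop_at` parameter (which only simulates a shutdown) and the toy "training" update are all made up for illustration. It also assumes the checkpoint path lives somewhere that survives the instance, such as a network volume or a file you copy to object storage afterwards.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state atomically so a shutdown mid-write cannot corrupt the file."""
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace swaps the new file into place in a single step.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh one if no checkpoint exists."""
    if not os.path.exists(path):
        return {"step": 0, "total": 0.0}
    with open(path, "rb") as f:
        return pickle.load(f)


def train(path, num_steps, checkpoint_every=10, stop_at=None):
    """Toy training loop that resumes from the last checkpoint.

    stop_at is a hypothetical knob that simulates the instance being
    reclaimed: the loop returns without saving, as a real shutdown would.
    """
    state = load_checkpoint(path)
    while state["step"] < num_steps:
        if stop_at is not None and state["step"] >= stop_at:
            return state  # simulated interruption, progress since last save is lost
        state["total"] += state["step"]  # stand-in for a real parameter update
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

Calling `train(path, 50, stop_at=25)` and then `train(path, 50)` redoes only the steps since the last checkpoint and ends in the same state as an uninterrupted run. The trade-off is the usual one: checkpoint more often and you lose less work per shutdown, but spend more time writing state.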