That is fine if you are working sequentially, but often tasks involve going back to the original data and doing some wrangling.
data -> model(data) -> output(model)
So if you go back to mess around with the data, your model and output may be recomputed (in a fully reactive system they will be), which you do need eventually, but not while making iterative tweaks.
Another commenter suggested adding checkboxes, which is a good idea, although then you are managing a bunch of checkbox states.
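To make that chain concrete, here is a toy Python sketch of a reactive dependency graph (the graph and the names are made up, nothing tool-specific): editing the data cell marks everything downstream of it stale, which is exactly what bites you while you are still iterating on the wrangling step.

    # Toy dependency graph for the pipeline above: data -> model -> output.
    deps = {"model": {"data"}, "output": {"model"}}

    def downstream(changed, deps):
        # Everything that transitively depends on `changed` becomes stale.
        stale = set()
        frontier = {changed}
        while frontier:
            frontier = {cell for cell, parents in deps.items()
                        if parents & (frontier | stale)} - stale
            stale |= frontier
        return stale

    # Touching the data cell schedules both the model fit and the output
    # for recomputation, even if you only wanted to tweak the wrangling.
    print(downstream("data", deps))  # {'model', 'output'}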
> So if you go back to mess around with the data, your model and output may be recomputed (in a fully reactive system they will be), which you do need eventually, but not while making iterative tweaks.
On the other hand, not everyone remembers to re-run dependent cells. I’ve had many R notebooks handed in to me where the author didn’t check that they run top to bottom with a fresh workspace.
I think the ideal user-friendly system would switch between automatic and manual recomputation depending on the expected recomputation time and the expected time until the user makes another change (and clearly indicate which cells need recomputation to reflect the latest state of the system). If you’re editing a file path, for example, you don’t want the system to read or, worse, write that file after every keystroke. Similarly, if you change one cell and within a second start editing a second one, you don’t want recomputation to start yet.
So, if the system thinks a cell takes T seconds to compute, it would only start recomputation after f(T) seconds without user input.
Finding a good function f is left as an exercise for the reader; that’s where good systems will add value. A good system would likely need a more complex f, one that also accounts for how much file and network I/O the steps involve and whether they can easily be cancelled.
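To make that concrete, here is a rough Python sketch of the kind of debounce heuristic I have in mind. The linear f and the threading-based timer are placeholders, not how any real notebook system does it:

    import threading

    def idle_delay(estimated_seconds):
        # Placeholder f(T): cheap cells re-run almost immediately,
        # expensive cells wait longer for the user to stop typing.
        return min(0.2 + 0.5 * estimated_seconds, 10.0)

    class DebouncedRecompute:
        def __init__(self, recompute, estimated_seconds):
            self._recompute = recompute  # callback that actually re-runs the cell
            self._delay = idle_delay(estimated_seconds)
            self._timer = None
            self._lock = threading.Lock()

        def on_user_edit(self):
            # Each edit cancels the pending run and restarts the countdown,
            # so recomputation only fires after f(T) seconds of no input.
            with self._lock:
                if self._timer is not None:
                    self._timer.cancel()
                self._timer = threading.Timer(self._delay, self._recompute)
                self._timer.start()

    # Hypothetical usage: a cell estimated at 3 s waits ~1.7 s of idle time.
    cell = DebouncedRecompute(lambda: print("recomputing cell..."), 3.0)
    cell.on_user_edit()

A real system would also want to cancel a recomputation that is already running when a new edit arrives, which is where the "can this step be cancelled safely" question comes in.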
For the general case, I am pretty sure what you describe is the halting problem [1]. This does not mean that I believe some approximation is impossible (your “write to file” comment is particularly true). Just feeling the need to highlight that a clean, general solution is most likely not something that gets done in an afternoon.
Yeah, that “left as an exercise” was tongue-in-cheek. Even past executions do not tell you much. Change “n=10” to “n=12”, and who knows what will happen to execution times? The code might be O(n⁴), or contain an “if n=12” clause.
Looking at the world’s best reactive system, I think it never automatically fetches external data, only recalculates things it knows it can cancel, and has a decent idea of how much time each step will take.
Working in a nonlinear manner is the whole point of Pluto. You can modify some intermediate processing in the script and none of the upstream cells, like loading the data, will run again. I also don’t need to fish through the whole damn notebook to run all the cells my change impacts. If you really, really don’t want downstream stuff to run, you can either use some of the button tricks the other comments mentioned or copy (a subset of) the data. Usually, though, I find I want to see the results of my change on everything downstream.