My question for Haskellers is how to do updates of values on a large scale, let's say in a simulation.
In imperative languages, the program will have a list of entities, and there will be an update() function for each entity that updates its state (position, etc) inline, i.e. new values are overwriten onto old values in memory, invoked at each simulation step.
In Haskell, how is that handled? do I have to recreate the list of entities with their changes at every simulation step? does Haskell have a special construct that allows for values to be overwritten, just like in imperative languages?
Please don't respond with 'use the IO monad' or 'better use another language because Haskell is not up for the task'. I want an actual answer. I've asked this question in the past in this and some other forums and never got a straight answer.
If you reply with 'use the IO monad' or something similar, can you please say if whatever you propose allows for in place update of values? It's important to know, for performance reasons. I wouldn't want to start simulations in a language that requires me to reconstruct every object at every simulation step.
I am asking for this because the answer to 'why Haskell' has always been for me 'why not Haskell: because I write simulations and performance is of concern to me'.
I'm not sure why you say not to respond with 'use the IO monad' because that's exactly how you'd do it! As an example, here's some code that updates elements of a vector.
import Data.Vector.Unboxed.Mutable
import Data.Foldable (for_)
import Prelude hiding (foldr, read, replicate)
-- ghci> main
-- [0,0,0,0,0,0,0,0,0,0]
-- [0,5,10,15,20,25,30,35,40,45]
main = do
v <- replicate 10 0
printVector v
for_ [1 .. 5] $ \_ -> do
for_ [0 .. 9] $ \i -> do
v_i <- read v i
write v i (v_i + i)
printVector v
printVector :: (Show a, Unbox a) => MVector RealWorld a -> IO ()
printVector v = do
list <- foldr (:) [] v
print list
It does roughly the same as this Python:
# python /tmp/test28.py
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# [0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
def main():
v = [0] * 10
print(v)
for _ in range(5):
for i in range(10):
v_i = v[i]
v[i] = v_i + i
print(v)
if __name__ == '__main__': main()
I have a rather niche theory that many Hindley-Milner type inference tutorials written by Haskellers insist on teaching the error-prone, slow, details of algorithm W because otherwise the authors would need to commit to a way to do destructive unification (as implied by algorithm J) that doesn't attract pedantic criticism from other Haskellers.
For me, I stopped trying to learn Haskell because I couldn't quite make the jump from writing trivial (but neat) little self-contained programs to writing larger, more involved, programs. You seem to need to buy into a contorted way of mentally modelling the problem domain that doesn't quite pay off in the ways advertised to you by Haskell's proponents (as arguments against contrary approaches tend to be hyperbolic). I'm all for persistent data structures, avoiding global state, monadic style, etc. but I find that OCaml is a simpler, pragmatic, vehicle for these ideas without being forced to bend over backwards at every hurdle for limited benefit.
> In imperative languages, the program will have a list of entities, and there will be an update() function for each entity that updates its state (position, etc) inline, i.e. new values are overwriten onto old values in memory, invoked at each simulation step.
> In Haskell, how is that handled? do I have to recreate the list of entities with their changes at every simulation step? does Haskell have a special construct that allows for values to be overwritten, just like in imperative languages?
You don't _have to_ recreate the list each time, but that's probably where I'd suggest starting. GHC is optimized for these kinds of patterns, and in many cases it'll compile your code to something that does in-place updates for you, while letting you write pure functions that return a new list. Even when it can't, the runtime is designed for these kinds of small allocations and updates, and the performance is much better than what you'd get with that kind of code in another language.
If you decided that you really did need in-place updates, then there are a few options. Instead of storing a vector of values (if you are thinking about performance you probably want vectors instead of lists), you can store a vector of references that can be updated. IO is one way to do that (with IORefs) but you can also get "internal mutability" using STRefs. ST is great because it lets you write a function that uses mutable memory but still looks like a pure function to the callers because it guarantees that the impure stuff is only visible inside of the pure function. If you need concurrency, you might use STM and store them as MVars. Ultimately all of these options are different variations on "Store a list of pointers, rather than a list of values".
There are various other optimizations you could do too. For example, you can use unboxed mutable vectors to avoid having to do a bunch of pointer chasing. You can use GHC primitives to eek out even better performance. In the best case scenario I've seen programs like this written in Haskell be competitive with Java (after the warmup period), and you can keep the memory utilization pretty low. You probably won't get something that's competitive with C unless you are writing extremely optimized code, and at that point most of the time I'd suggest just writing the critical bits in C and using the FFI to link that into your program.
You... don't. You have to rely on compiler optimizations to get good performance.
Monads are more-or-less syntax sugar. They give you a structure that allows these optimizations more easily, and also make the code more readable sometimes.
But in your example, update returns a new copy of the state, and you map it over a list for each step. The compiler tries to optimize that into in-place mutation.
IMO, having to rely so much on optimization is one of the weak points of the language.
You do, and you'll have to use do destructive updates within either ST or IO monad using their respective single variable or array types. It looks roundabouty, but does do the thing you want and it is fast.
ST and IO are "libraries" though, in the sense that they not special parts of the language, but appear like any other types.
I mean, Haskell has mutable vectors[1]. You can mutate them in place either in the IO monad or in the ST monad. They fundamentally work the same way as mutable data structures in any other garbage collected language.
When I worked on a relatively simple simulation in Haskell, that's exactly what I did: the individual entities were immutable, but the state of the system was stored in a mutable vector and updated in place. The actual "loop" of the simulation was a stream[2] of events, which is what managed the actual IO effect.
My favorite aspect of designing the system in Haskell was that I could separate out the core logic of the simulation which could mutate the state on each event from observers which could only read the state on events. This separation between logic and pure metrics made the code much easier to maintain, especially since most of the business needs and complexity ended up being in the metrics rather than the core simulation dynamics. (Not to say that this would always be the case, that's just what happened for this specific supply chain domain.)
Looking back, if I were going to write a more complex performance-sensitive simulation, I'd probably end up with state stored in a bunch of different mutable arrays, which sounds a lot like an ECS. Doing that with base Haskell would be really awkward, but luckily Haskell is expressive enough that you can build a legitimately nice interface on top of the low-level mutable code. I haven't used it but I imagine that's exactly what apces[3] does and that's where I'd start if I were writing a similar sort of simulation today, but, who knows, sometimes it's straight-up faster to write your own abstractions instead...
apecs is really nice! it's not without its issues, but it really is a sweet library. and some of its issues are arguably just issues with ECS than apecs itself.
World simulation(Stream<Event> events, World world) =>
events.IsComplete
? world
: simulation(applyEventToWorld(events.Head, world), events.Tail);
World applyEventToWorld(Event event, World world) =>
// .. create a new World using the immutable inputs
That takes the first event that arrives, transforms the World, then recursively calls itself with the remaining events and the transformed World. This is the most pure way of doing what you ask. Recursion is the best way to 'mutate', without using mutable structures.
However, there are real mutation constructs, like IORef [1] It will do actual in-place (atomic) mutation if you really want in-place updates. It requires the IO monad.
> does Haskell have a special construct that allows for values to be overwritten
Yes and no.
No, the language doesn't have a special construct.
Yes, there are all kinds of mutable values for different usage patterns and restrictions.
Most likely you end up with mutable containers with some space reserved for entity state.
You can start with putting `IORef EntityState` as a field and let the `update` write there. Or multiple fields for state sub-parts that mutate at different rates.
The next step is putting all entity state into big blobs of data and let entities keep an index to their stuff inside that big blob.
If your entities are a mishmash of data, then there's `apecs`, ECS library that will do it in AoS way. It even can do concurrent updates in STM if you need that.
Going further, there's `massiv` library with integrated task supervisor and `repa`/`accelerate` that can produce even faster kernels.
Finally, you can have your happy Haskell glue code and offload all the difficult work to GPU with `vulkan` compute.
> My question for Haskellers is how to do updates of values on a large scale, let's say in a simulation.
The same way games do it. The whole world, one frame at a time. If you are simulating objects affected by gravity, you do not recalculate the position of each item in-place before moving onto the next item. You figure out all the new accelerations, velocities and positions, and then apply them all.
I don't understand why you hate the IO monad so much. I mean I've seen very large codebases doing web apps and almost everything is inside the IO monad. It's not as "clean" and not following best practices, but still gets the job done and is convenient. Having pervasive access to IO is just the norm in all other languages so it's not even a drawback.
But let's put that aside. You can instead use the ST monad (not to be confused with the State monad) and get the same performance benefit of in-place update of values.
Well what kind of values and how many updates? You might have to call an external library to get decent performance, like you would use NumPy in Python. This might be of interest: https://www.acceleratehs.org/
"Pretty fast".. relatively speaking, considering that it's in an immutable, garbage collected language. Still woefully slow compared to anything else out there(say, bevy? which incidentally works similarly to apecs) and mostly practically unusable if the goal is to actually create a real product.
You can create a real product with apecs lol. It is not going to block an indie game written in Haskell with it, for instance. And you could totally use it to write simulations for stuff too.
Also from the apecs README:
> Fast - Performance is competitive with Rust ECS libraries (see benchmark results below)
Sounds like that "woefully slow" judgment of yours wasn't based in any real experience but rather just your opinion?
In imperative languages, the program will have a list of entities, and there will be an update() function for each entity that updates its state (position, etc) inline, i.e. new values are overwriten onto old values in memory, invoked at each simulation step.
In Haskell, how is that handled? do I have to recreate the list of entities with their changes at every simulation step? does Haskell have a special construct that allows for values to be overwritten, just like in imperative languages?
Please don't respond with 'use the IO monad' or 'better use another language because Haskell is not up for the task'. I want an actual answer. I've asked this question in the past in this and some other forums and never got a straight answer.
If you reply with 'use the IO monad' or something similar, can you please say if whatever you propose allows for in place update of values? It's important to know, for performance reasons. I wouldn't want to start simulations in a language that requires me to reconstruct every object at every simulation step.
I am asking for this because the answer to 'why Haskell' has always been for me 'why not Haskell: because I write simulations and performance is of concern to me'.