In my experience, Polars streaming runs out of memory at much smaller data sizes than both DuckDB and DataFusion, and when it doesn't outright segfault, it tends to use much more memory for the same workload.
Polars is faster than those two for data under a few GB, but beyond that you're better off with DuckDB or DataFusion.
I would love for this to improve in Polars, and I'm sure it will!
Do you mean segfault or OOM? I am not aware of Polars segfaulting under high memory pressure.
If it does segfault, would you mind opening an issue?
Some context: Polars is building a new streaming engine that will eventually be able to run the whole Polars API (including the hard parts) in a streaming fashion. We expect the initial release at the end of this year or early next year.
Our in-memory engine isn't designed for out-of-core processing, so if you benchmark it with restricted RAM, it will perform poorly as data gets swapped to disk, or it will go OOM. If you have a machine with enough RAM, Polars is very competitive in performance. In our experience, it is also tough to beat on time-series and window functions.
Segmentation faults can result from many different underlying problems, one of which can be running out of memory.
We (the Ibis team) have opened related issues and the usual response is to not use streaming until it's ready, or to fix the problem if it can be fixed.
Not sure what else there is to do; it seems like things are working as expected/intended for the moment!
We'll definitely be the first to try out any improvements to the streaming engine.