
Polars can use lazy processing: it collects all of the operations into a graph of what needs to happen and only executes it when you ask for the result, while pandas executes each operation eagerly as soon as it's called.
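Roughly, the difference looks like this (a minimal sketch; the file and column names are made up):

    import polars as pl
    import pandas as pd

    # Polars, lazy: scan_csv only records a plan; nothing is read or computed yet.
    lazy = (
        pl.scan_csv("events.csv")                 # hypothetical file
          .filter(pl.col("status") == "ok")
          .group_by("user_id")
          .agg(pl.col("amount").sum())
    )
    result = lazy.collect()   # the whole graph is optimized and executed here

    # pandas, eager: each line runs immediately and materializes a full
    # intermediate DataFrame.
    df = pd.read_csv("events.csv")
    df = df[df["status"] == "ok"]
    totals = df.groupby("user_id")["amount"].sum()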

Spark has done this for a long time, and it makes complete sense for distributed setups, but apparently the lazy approach still pays off when running locally.
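For comparison, the same split exists in (Py)Spark: transformations only build the DAG, and nothing runs until an action forces it (again just a sketch, assuming a local SparkSession and the same made-up file):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.master("local[*]").getOrCreate()

    # filter/groupBy/agg are transformations: they only extend the DAG.
    df = (
        spark.read.csv("events.csv", header=True, inferSchema=True)
             .filter(F.col("status") == "ok")
             .groupBy("user_id")
             .agg(F.sum("amount").alias("total"))
    )

    df.show()   # an action: only now does Spark plan and execute the job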




Laziness in this context has huge advantages in reducing memory allocation. Many operations can be fused together, so there's less of a need to allocate huge intermediate data structures at every step.
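You can see that fusion by asking Polars for the optimized plan (sketch with made-up names; explain() prints the plan after optimization):

    import polars as pl

    q = (
        pl.scan_csv("events.csv")
          .filter(pl.col("amount") > 100)
          .select("user_id", "amount")
    )

    # The filter and the column selection get pushed down into the scan,
    # so only those two columns and the matching rows are ever materialized,
    # rather than a full intermediate DataFrame at every step.
    print(q.explain())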


Yeah, totally, I can see that. I think Polars is the first library to do this locally, which is surprising given how many advantages it has.


It's been around in R-land for a while with dplyr and its variety of backends (including Arrow, the same as Polars). Pandas is just an incredibly mediocre library in nearly all respects.


> It's been around in R-land for a while with dplyr and its variety of backends

Only for SQL databases, so not really. Source: have been running dplyr since 2011.


The Arrow backend does allow for lazy eval.

https://arrow.apache.org/cookbook/r/manipulating-data---tabl...



