Too bad polars like everything else written in rust has horrible ergonomics. I d...

wenc · on April 3, 2023

Although Polars is written in Rust, most people will use Polars from Python since that's what most data wranglers use. And Polars’ Python interface is excellent.

detrites · on April 3, 2023

Do you happen to know if this can be used as more or less a drop-in replacement for pandas in Python? (Presuming, I suppose, some data conversion beforehand.)

wenc · on April 3, 2023

It depends on what you do with Pandas. The semantics are different so you’ll have to rewrite stuff — that said, for me it was worth it for the most part because I work with massive columnar data in Parquet.

So no, not a drop-in replacement. But not a difficult transition either.

This page explains how Polars differs from Pandas.

https://pola-rs.github.io/polars-book/user-guide/coming_from...

valarauko · on April 3, 2023

As someone who loathes the Pandas syntax and lusts for the relatively cleaner tidyverse code of my colleagues, the Polars syntax just feels ... off (from the link):

    df.with_columns(
        pl.when(pl.col("c") == 2)
        .then(pl.col("b"))
        .otherwise(pl.col("a"))
        .alias("a")
    )

Seeing the multiple nested pl calls within a single expression just feels odd to me. It's definitely reminiscent of Dplyr but in a much less elegant way.

wenc · on April 3, 2023

You can use DuckDB (SQL syntax) then just convert to Polars (instantaneous).

The method chaining syntax is unwieldy in any language.

Magrittr + dplyr (tidyverse) pipeline syntax is beautiful, but there’s a lot of magic with NSE (nonstandard evaluation) that makes it really tricky when you need to pass a variable column name.

I’ve sort of converged on SQL as the best compromise.

valarauko · on April 3, 2023

I like the idea of Polars, and it's on my list of things to try. Unfortunately, my current codebase is heavily tied to Pandas (bioinformatics tools that use Pandas as a dependency). I also deal mostly in sparse matrices, which I'm unsure if duckdb handles.

Looking at the Polars documentation also makes me nervous, due to how much of my current Pandas-fu relies on indexing to work. I appreciate that indexes can be NSE but it's how a lot of the current tools in my field work (in Python and in R), with important data in the index, e.g. genes or cellular barcodes, and relying on the index to merge datasets.

Another caveat for me is that at multiple points in my workflow I drop into R for plots, and rpy2 can convert pandas dataframes into equivalent R atomics. With Polars it would just be an additional step of converting to a pandas df, but it's something I need to consider. That said, I've disliked the Pandas syntax for so long that the mental overhead might be well worth it.

dunefox · on April 3, 2023

Yeah, I tried to get into it but it seems very verbose.

detrites · on April 3, 2023

Thank you, that is definitely the page for me.

aguspdana · on April 3, 2023

I agree that the Rust API is still rough. But..

Polars has a Python API. It's much nicer than the Rust API. Plus the documentation is more complete.

Laiho · on April 3, 2023

Couldn't agree more about the Rust API. Python API seems OK. Found it very complicated to use and just clunky in general.