How are people managing the existence of data frame APIs like pandas/polars with SQL engines like BigQuery, Snowflake, and DuckDB?
I checked most solutions and I am sticking with plain old SQL in triple-quotes in Jupyter, e.g. [1]. I don't need auto-completion in SQL and I don't really need syntax highlighting. It's also very nice to have the combination of f-string/variable substitution and SQL. That said, my SQL needs are very basic.
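Roughly, the pattern looks like this (a minimal sketch only; the DuckDB connection, table, and column names here are made-up placeholders, not from the thread):

    import duckdb

    min_date = "2024-01-01"   # plain Python variable, substituted into the query
    table = "events"          # hypothetical table name

    con = duckdb.connect("analytics.duckdb")

    # Triple-quoted f-string: readable SQL plus Python variable substitution.
    # (For untrusted inputs you'd want parameterized queries instead.)
    query = f"""
        SELECT user_id, COUNT(*) AS n_events
        FROM {table}
        WHERE event_date >= '{min_date}'
        GROUP BY user_id
        ORDER BY n_events DESC
    """

    df = con.execute(query).df()   # results land in a pandas DataFrame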
Ponder was created by the folks who made Modin (a scale-out version of pandas with API compatibility as a goal; it can use Dask or Ray as a backend). Ponder is the enterprise version that runs on Snowflake and BigQuery. Again, same goal: API compatibility with pandas. You can scale out your pandas workflow by changing the import and leaving the pandas code as-is.
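For illustration, the "change the import" idea looks roughly like this (the file and column names are placeholders):

    # import pandas as pd         # original, single-core pandas
    import modin.pandas as pd     # drop-in replacement; same API, distributed on Ray/Dask

    # The pandas code itself stays unchanged.
    df = pd.read_csv("large_file.csv")
    summary = df.groupby("category").agg({"amount": "sum"})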
As a fellow evangelist of the SQL + Pandas hybrid workflow, I’m a happy camper with Pandas’ built-in read_sql_query and to_sql.
The only big pain points are having to ship around boilerplate to construct SQLAlchemy create_engine URIs, and the performance limitations of SQLAlchemy's inserts (if you're moving anything larger than a few gigs, it typically pays to ditch to_sql and write a db-specific bulk insert process instead).
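Something like this, sketched out (the connection string, table, and column names are placeholders, not anything from the thread):

    import pandas as pd
    from sqlalchemy import create_engine

    # The boilerplate mentioned above: building the engine URI by hand.
    engine = create_engine("postgresql+psycopg2://user:password@host:5432/dbname")

    # SQL does the heavy lifting...
    df = pd.read_sql_query("SELECT id, amount FROM orders WHERE amount > 100", engine)

    # ...pandas handles the last-mile transformations.
    df["amount_with_tax"] = df["amount"] * 1.1

    # Fine for small/medium results; for multi-gigabyte loads a db-specific
    # bulk path (e.g. COPY) is usually much faster than to_sql.
    df.to_sql("orders_enriched", engine, if_exists="replace", index=False)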
Most of my notebooks are a mix of SQL and Python: SQL for most of the processing, dump the results into a pandas DataFrame (via https://github.com/ploomber/jupysql), and then use Python for operations that are difficult to express in SQL (or that I don't know how to do in SQL), so I end up with 80% SQL, 20% Python (rough sketch of the flow below).
Unsure if this is the best workflow, but it's the most efficient one I've come up with.
Disclaimer: my team develops JupySQL.
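For anyone curious, the flow looks roughly like this (a sketch assuming a DuckDB connection; the table and column names are invented, and each %%sql block would be its own notebook cell):

    %load_ext sql
    %sql duckdb://

    %%sql result <<
    SELECT user_id, SUM(amount) AS total
    FROM orders
    GROUP BY user_id

    # Back in Python: hand the result to pandas for the remaining 20%.
    df = result.DataFrame()
    df["total_zscore"] = (df["total"] - df["total"].mean()) / df["total"].std()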