Person 1 adds code that uses this with Pandas. Person 2 sees the csvbase:// URIs...

koliber · on April 10, 2024

Action at a distance, and across time.

I just read this article and have very mixed feelings about it. It is clever, and the nice kind of clever that does not require one to be a mega-brain. On the other hand, it creates invisible and uncontrollable dependencies, such as the one you describe.

Another drawback: something is broken and I want to set a breakpoint in the code that fetches the CSV data. Unless you know about fsspec, it will be hard to follow the breadcrumbs and work out how this library injects itself into your code.
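
One way to pick up the trail, as a rough sketch: fsspec documents the `fsspec.specs` entry-point group as the way third-party packages register protocols, so listing that group shows which installed package claims which scheme. This assumes the client registers itself that way rather than calling `fsspec.register_implementation` at import time, and the helper name is made up for illustration.

```python
from importlib.metadata import entry_points


def fsspec_protocol_plugins():
    """Map protocol name -> "module:Class" for installed fsspec plugins.

    Hypothetical helper: it only reads packaging metadata and imports nothing.
    """
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        group = eps.select(group="fsspec.specs")
    else:  # Python 3.8/3.9 return a plain dict keyed by group name
        group = eps.get("fsspec.specs", [])
    return {ep.name: ep.value for ep in group}


if __name__ == "__main__":
    for protocol, target in sorted(fsspec_protocol_plugins().items()):
        print(f"{protocol}:// -> {target}")
```

The class named on the right-hand side is where a breakpoint would go.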

mekoka · on April 10, 2024

The problem can be even more immediate. Person 2 (a junior or intern) is given the normally mundane task of going through the code base and replacing pandas with Foozle, a faster but less mature alternative. They try to replace the pd.read_csv() calls with Foozle's equivalent, say fz.read_csv(). So pd.read_csv("csvbase://") becomes fz.read_csv("csvbase://"). Only, Foozle hasn't implemented fsspec support, and person 2 doesn't know anything about fsspec either. Fun times.

pantsforbirds · on April 10, 2024

How is this different from the risk that a dataframe library will stop supporting S3:// URIs? Why wouldn't Foozle's release notes mention a major protocol change? How is this scenario different from any other library making a major change without documenting it?

reubenmorais · on April 10, 2024

Because fsspec handles the extensions directly, Foozle does not have to opt in to get csvbase:// support; all that's required is for the csvbase Python client to be installed. So users of Foozle might start depending on these extensions without Foozle ever finding out about it.
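
A toy model of that mechanism, as a sketch only: none of this is fsspec's or Foozle's real code, and every name in it (`REGISTRY`, `register`, `open_uri`, `foozle_read_csv`, `fake_csvbase_opener`) is hypothetical. It shows how a reader that delegates to a shared registry gains a scheme the moment some other package registers it.

```python
import csv
import io
from urllib.parse import urlsplit

# Shared, process-wide registry: scheme -> function returning a text stream.
REGISTRY = {}


def register(scheme, opener):
    """Called by a plugin package; the reading library is never involved."""
    REGISTRY[scheme] = opener


def open_uri(uri):
    """Dispatch on the URI scheme, as a filesystem abstraction layer would."""
    scheme = urlsplit(uri).scheme
    if scheme not in REGISTRY:
        raise ValueError(f"no handler registered for {scheme}://")
    return REGISTRY[scheme](uri)


def foozle_read_csv(uri):
    """Stand-in for a dataframe library that only knows about open_uri()."""
    with open_uri(uri) as stream:
        return list(csv.DictReader(stream))


def fake_csvbase_opener(uri):
    """Stand-in for a plugin; returns canned data instead of using the network."""
    return io.StringIO("id,name\n1,example\n")


# Before the plugin is installed, the scheme is unknown to the reader.
try:
    foozle_read_csv("csvbase://someuser/sometable")
except ValueError as exc:
    print(exc)

# "Installing" the plugin changes the reader's behaviour without touching it.
register("csvbase", fake_csvbase_opener)
print(foozle_read_csv("csvbase://someuser/sometable"))
```

Nothing in `foozle_read_csv` changed between the two calls, which is why the library's maintainers would have no way of noticing the new dependency.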

But I guess that's the reality of making libraries: even bugs will be relied on. We really don't have any methods for evolving software ecosystems reliably and compatibly.

krisoft · on April 10, 2024

Not sure what my takeaway should be here. It feels like one could tell the same story about any third-party dependency. Are you saying one should not use dependencies at all?

lolc · on April 10, 2024

In this hypothetical case, there would be some floundering to discover the cause, followed by a two-line fix, rather than having the two extra lines from the start. Seems like an acceptable risk to me.
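
For what it's worth, those extra lines might look something like the sketch below. It assumes the table is also served as CSV over plain HTTPS at `https://csvbase.com/<user>/<table>.csv`, which is an assumption about the URL layout rather than something stated in this thread, and the helper name is made up.

```python
from urllib.parse import urlsplit


def csvbase_uri_to_https(uri, host="csvbase.com"):
    """Turn csvbase://user/table into an explicit HTTPS URL (assumed layout)."""
    parts = urlsplit(uri)
    if parts.scheme != "csvbase":
        raise ValueError(f"not a csvbase URI: {uri!r}")
    # In csvbase://user/table, "user" lands in netloc and "/table" in path.
    user, table = parts.netloc, parts.path.strip("/")
    if not user or not table:
        raise ValueError(f"expected csvbase://user/table, got {uri!r}")
    return f"https://{host}/{user}/{table}.csv"


# Any CSV reader that accepts a URL can then be handed an ordinary HTTPS URL.
url = csvbase_uri_to_https("csvbase://someuser/sometable")
print(url)
```

The point is only that the fetch becomes visible at the call site, so a replacement reader either handles an ordinary URL or fails in an obvious place.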