I looked at the example for some time now, and I don't understand what exactly t...

capableweb · on Sept 28, 2022

I'm by no means a Python expert, but as far as I can tell, those two snippets do wildly different things. `.read(100)` would read 100 bytes from the file, `.read_text()` takes the entire file and returns a decoded string. Running the former on a 100GB file will work OK, running the latter on the same file will lead to OOM (unless you actually have +100GB free memory).

Basically, they are not equivalent.

BerislavLopac · on Sept 28, 2022

As I understand, the snippets in the suggestions are not supposed to be applied verbatim, but as generic proposals that need to be tweaked for the specific example (or ignored completely).

Note that it doesn't say to replace `.read(100)` with `.read_text()`; it uses the default version (without arguments) for both. (That being said, `read_text` doesn't have a limit argument, but that is a completely different issue.)

So you're right - if you're applying the suggestion verbatim without thinking, you will get incorrect results; but the suggestions were not intended for that.

kevincox · on Sept 28, 2022

I find that linters that have a non-trivial false-positive rate get very annoying very quick.

Chris_Newton · on Sept 29, 2022

The example shown on the Refurb GitHub page is a little unfortunate. As you say, there is also a potential scalability problem with the original design if the input file is too large to hold in memory in its entirety.

In the example, that issue could be avoided with minimal cost by reading the file lazily:

    with open(filename) as f:
        for line in f:
            do_same_things_with(line)

The specific change Refurb suggests instead doesn’t have this benefit.

However, in general switching to a lazy design won’t always be as safe and easy as it is here, so I’m not sure what I’d ideally want a tool to say in this situation. You’d need much more sophisticated analysis of the original code’s design if you wanted to do something like warning about unnecessarily reading an entire file and suggesting the lazy alternative. It doesn’t seem like a good idea to warn about all uses of the functions that will read the whole file at once, because often with Python scripts that’s a simple and effective way get the required job done and the added complexity from a more scalable design isn’t needed or justified.

Writing good style checkers is hard. :-)

uranusjr · on Sept 28, 2022

The open() variant reads the first 100 text characters, not 100 bytes, because open() works in “text mode” by default, so the two are not _that_ different. Your generally point is still very correct though, the memory issue is the main problem with the “suggestion” here.

iib · on Sept 28, 2022

Yes, I completely missed the `100` parameter to `read`. And as a sibling comment said, I also understand them to be suggestions, but in this case, I think `read()` and `read(param)` should be treated as two separate cases, or at least this caveat made clear.