Context: Ruff is a Rust-based linter for Python. It’s much faster than Python-based linters, largely because Rust is much faster than Python.
There are a couple of drawbacks to Ruff that mean it may not be right for everyone: since it’s a compiled tool, it’s harder to add custom rules — you need to fork the project to add your own, and also you need to know some Rust.
When checking the Pylint codebase it makes sense to use Ruff as well as Pylint — it’s a good sanity check that Pylint’s own tooling is not letting issues through that another tool is catching.
Yeah Ruff and Pylint can be used in a complementary way. Pylint implements rules that Ruff does not, and vice versa. I think Ruff might actually support more _total_ rules than Pylint at this point (not a great metric), but Pylint does more cross-file analysis and type inference. [1]
Some teams use Ruff alongside Pylint, some have replaced Pylint with Ruff entirely (Dagster is a good example [2]), especially those already using a type checker alongside their linter.
When you are already using flake8/ruff and mypy, there is little room left for pylint in my opinion. It might have some extra checks but they aren't enough to justify the performance hit.
I use a script that identifies modified files in my working copies, runs pyflakes on them, and after that pylint -E.
By the time pylint finishes, I'll have already fixed most of the low-hanging fruit that pyflakes found, and then I look through the pylint messages to see if there's anything additional.
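For anyone who wants to replicate that setup, here is a minimal sketch of such a script (the exact git invocation and the definition of "modified files" are my own assumptions, not the commenter's actual script):

```python
#!/usr/bin/env python3
"""Lint only the Python files modified in the working copy:
a fast pass with pyflakes first, then pylint in errors-only mode."""
import subprocess
import sys

def modified_python_files():
    # Files changed relative to HEAD; adjust to taste (staged-only, etc.).
    out = subprocess.run(
        ["git", "diff", "--name-only", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [f for f in out.splitlines() if f.endswith(".py")]

def main():
    files = modified_python_files()
    if not files:
        return 0
    # Fast pass: pyflakes catches most of the low-hanging fruit quickly.
    pyflakes = subprocess.run(["pyflakes", *files])
    # Slow pass: pylint restricted to errors only (-E).
    pylint = subprocess.run(["pylint", "-E", *files])
    return pyflakes.returncode or pylint.returncode

if __name__ == "__main__":
    sys.exit(main())
```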
How many thousands of lines of code do you need to lint before Rust's performance advantages actually matter? After all, performance optimizing a tool where N is too small to matter is the root of all evil. Keep in mind that you don't have to lint an already linted file if you cache its hash.
I honestly thought the same thing. I used flake8 and it finished in about a second on my codebase. But when you have that on a git hook, you'd be surprised how even a small amount of latency affects your productivity. I have been using ruff for about two months now, and I'm still thinking "Wow, that was fast" every time I create a commit.
Linting is something that people might do on every commit these days. Commits that take more than 150ms to complete are annoying. And also on every CI run. Oh well every CI run doesn’t sound bad. Except you’re also running 30 other things in order to check 30 other “should always work/be the case” invariants.
EDIT: Missed your point about caching already-linted files. In that case maybe performance doesn’t matter.
> After all, performance optimizing a tool where N is too small to matter is the root of all evil.
Picking a third-party tool that does what it does and happens to be relatively fast. Where’s the evil in that?
> Keep in mind that you don't have to lint an already linted file if you cache its hash.
Yeah, if only that were the case. Even if the file hasn't changed, if its dependencies have changed you need to lint it again. And doing dependency analysis on most of the weirdly coupled codebases out there is neither trivial nor accurate, tbh.
And not running a good linter and letting a bug through is the most shit feeling you will have in your life. Trust me on that.
For linters that perform type inference, fanout means having to relint files that might not have changed. But many linters don’t need the full graph — for example, file formatting problems aren’t subject to fanout.
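To make the fanout point concrete, here is a toy sketch of the two cache-key strategies. `dependencies_of` is a hypothetical helper; computing it accurately is exactly the hard part on coupled codebases:

```python
import hashlib
from pathlib import Path

# Naive cache key: fine for purely local checks (formatting, unused
# imports), wrong for anything doing cross-file analysis.
def local_cache_key(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

# Fanout-aware cache key: must also hash every transitive dependency,
# so a change anywhere in the import graph invalidates this file.
# dependencies_of() is hypothetical; building it is neither trivial
# nor, on weirdly coupled codebases, accurate.
def fanout_cache_key(path: Path, dependencies_of) -> str:
    h = hashlib.sha256(path.read_bytes())
    for dep in sorted(dependencies_of(path)):
        h.update(Path(dep).read_bytes())
    return h.hexdigest()
```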
I also think that these rewrites are throwing the baby out with the bathwater. Especially in projects related to code quality and analysis, things like widespread adoption and issues going back almost a decade are much more important than raw performance.
For every aspect of functionality, there is heavy baggage: quirks, bugs, edge cases, etc. I am not convinced that a rewrite of a tool that is still in development and still limited to a subset of functionality can capture nearly a decade of community work.
The author of Ruff mentioned that many people told him exactly what you're saying. He went ahead and built it anyway. I'm puzzled though. Those naysayers were naysaying about an attempt at rewriting. You appear to be naysaying a successfully built project.
It's true that the folks who find it most helpful are the ones with large Python projects. But these are sometimes very important, widely used projects. See the testimonials from users (https://beta.ruff.rs/docs/#testimonials)
> widespread adoption
This makes little sense in the context of a linter. As long as my linter implements all the popular rules and my code is kept tidy, what's the issue? What does it matter if there's widespread adoption or not?
> still limited to a subset of functionality
Do you have a link to some functionality you're missing that's necessary for your workflow? Or is this more of a theoretical concern?
I am happy if this tool works for you but I think writing off my comment as naysaying is not constructive.
There is no support for all syntax features of the current language specification and reference implementation, and there is no legacy support either. Again, especially in projects concerned with code quality and analysis, building is not more important than long-term maintenance: both the language specification and the reference implementation are constantly evolving. The current approach relies on an upstream interpreter and thereby inherits its idiosyncratic divergences and limitations.
There is no extensibility. The majority of projects are neither public nor frameworks. Often projects employ a number of frameworks each with associated dialects, with both frameworks and dialects also constantly evolving. The current approach to extensibility is approximately one person manually rewriting rules individually from heterogeneous projects that span several contributors over several years. There is already outstanding maintenance work on manually introduced rules, only a subset of which gets flagged by a fraction of adopters. There is even outstanding work on feature parity.
>There are a couple of drawbacks to Ruff that mean it may not be right for everyone: since it’s a compiled tool, it’s harder to add custom rules — you need to fork the project to add your own, and also you need to know some Rust.
That might not be insurmountable, given that most people don't write their own rules (they just combine, configure, and enable/disable existing ones), and that it could be possible to add to Ruff a way to write rules in a DSL (without knowing Rust itself, and without needing to recompile).
>But if you start going down that route, isn’t it likely that you’d end up making Ruff into a full-blown interpreter?
Not any more than with any other DSL. You can always stop at a subset expressive enough to handle linting rules, or at least many linting rules. It doesn't need to be able to express every possible rule, which is the only reason you'd make it a full-blown interpreter. Pareto and all.
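As a purely hypothetical illustration of how small such a subset can be (this is not anything Ruff ships, just a toy hosted in Python for readability): rules become data (a node type, a predicate, a message), and the engine is a dumb tree walk, nowhere near a full interpreter.

```python
import ast

# Toy declarative rules: (node type, predicate, message).
# Expressive enough for many real lint rules; no Turing completeness.
RULES = [
    (ast.Compare,
     lambda n: any(isinstance(op, ast.Is) and isinstance(c, ast.Constant)
                   and c.value is not None
                   for op, c in zip(n.ops, n.comparators)),
     "use == to compare against a literal, not 'is'"),
    (ast.FunctionDef,
     lambda n: len(n.args.args) > 7,
     "too many arguments"),
]

def check(source, filename="<string>"):
    for node in ast.walk(ast.parse(source, filename)):
        for node_type, predicate, message in RULES:
            if isinstance(node, node_type) and predicate(node):
                yield (filename, node.lineno, message)

for issue in check("x = 1\nif x is 2:\n    pass\n"):
    print(issue)  # flags the 'is 2' comparison on line 2
```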
Even if that's not possible (and you need Turing completeness to handle custom rules), then:
>At either of which point, the motivation to use Ruff instead of e.g. Pylint might begin to disappear?
Well, Ruff would still have crazily faster parsing and built-in rules. It's not like you'd be forced to use custom interpreted rules.
And even in that case, a purpose-built interpreter, even if general, could be way faster than embedding CPython (because it's optimized for a specific task and doesn't need to carry baggage like the GIL and other CPython design decisions), and could offer better task-focused primitives.
You can add dynamic rules, but they either need to be compiled alongside the tool or written in a format that the tool can parse. That interchange language can be anything from plaintext to WASM bytecode.
I’ve used the former system (compiling things together in a custom build) in a Rust-based static analysis tool I’ve created. SWC uses the latter system, requiring plugins to be authored in WASM.
It's a design choice by its creator. Funnily enough, I was listening to an episode of 'Talk Python To Me' with the creator yesterday, and he talked about exactly that.
The main reason given was that each plugin has to reimplement a bunch of the same stuff slightly differently (e.g. finding public methods), and keeping it all internal avoids this and keeps things speedy.
I had started writing a 1-for-1 Pylint port in Rust (almost nowhere right now), but Ruff has gotten there way faster. Props!
Pylint is rough to make faster, partly because of its existing structure, but honestly at some point you are just paying a huge cost for traversing wide ASTs and doing all this work, and then paying extra for the Python object model.
In my profiling of Pylint, I've found some fixes, and I am curious what something like __slots__ or mypyc could do for it... but I'm also very partial to just having a systems-language implementation.
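For context, the __slots__ experiment is cheap to sketch. A slotted class drops the per-instance __dict__, which matters when a linter allocates millions of AST-node-like objects (toy classes below, not astroid's actual ones):

```python
import sys

class Node:
    def __init__(self, lineno, col):
        self.lineno = lineno
        self.col = col

class SlottedNode:
    # __slots__ removes the per-instance __dict__: smaller objects and
    # slightly faster attribute access, multiplied millions of times.
    __slots__ = ("lineno", "col")

    def __init__(self, lineno, col):
        self.lineno = lineno
        self.col = col

a, b = Node(1, 0), SlottedNode(1, 0)
print(sys.getsizeof(a) + sys.getsizeof(a.__dict__))  # dict adds overhead
print(sys.getsizeof(b))                              # noticeably smaller
```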
People talk about ease of contributing as a big advantage of writing these kinds of tools in the language they operate on. Personally, I've found that the people who like messing with tools also like the idea of learning a new language in order to contribute.
My one real reservation about Ruff is that the code base is designed as "run each checker as its own function". My suspicion is that this leads to a performance ceiling, as each checker traverses the AST looking for its flavor of thing. Visitor patterns (theoretically!) allow all the checkers to run over a single traversal, and parallelize nicely. Granted, the Ruff pattern allows for parallel work as well (and you can share all of the AST data between threads anyway), so maybe it's "right". It definitely looks easier to maintain!
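To spell out the contrast (a toy sketch, not Ruff's actual architecture):

```python
import ast

# Pattern A: each checker walks the whole tree itself.
# Cost: one full traversal per checker.
def run_separately(tree, checkers):
    for checker in checkers:
        for node in ast.walk(tree):
            method = getattr(checker, "visit_" + type(node).__name__, None)
            if method:
                method(node)

# Pattern B (visitor-style fusion): one traversal, with each node
# dispatched to every interested checker. Cost: one traversal total.
def run_fused(tree, checkers):
    for node in ast.walk(tree):
        name = "visit_" + type(node).__name__
        for checker in checkers:
            method = getattr(checker, name, None)
            if method:
                method(node)
```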
It is already happening, and with the availability of tools like PyO3 [1] (which lets Python developers write performance-critical code in Rust and expose it as a Python module), the Rustification of Python will keep progressing, even as some controversies appear [2]. Adding a new language adds complexity to a Python code base, but it is always good to at least have the choice, and to decide case by case whether to use Rust in a Python app.
too bad it doesn't appear to support flake8 plugins and only has a small subset hardcoded into it. if it could support arbitrary flake8 plugins I would move SQLAlchemy to this tool immediately.
the main reason people want to move off flake8 is not so much a speed issue but that its author refuses to allow it to use pyproject.toml files for configuration, meaning every project's configuration has to remain half-broken using obsolete setup.cfg files. A very silly situation.
Yeah we don't support arbitrary Flake8 plugins. We do plan to support plugins eventually but the exact architecture and API are TBD.
If there are plugins that would be particularly impactful for you, I'd be happy to take a look at adding them. (We support some of the plugins you use in SQLAlchemy, like flake8-docstrings and flake8-builtins, but not flake8-rst-docstrings which I'm guessing is a blocker.)
thanks for the reply! we use a bunch of goofy ones that nobody else does. one of them is like a 3 line plugin that has only a handful of stars on github.
That might be equivalent to using Ruff's isort implementation with `force_single_line` but maybe that comes with other implications that you don't want for SQLAlchemy.
I really tried to use isort, but it doesn't do the import-order style that we use, and there were some other things that didn't work for us. flake8-import-order has selectable styles, which I match in my own "zimports" tool that we use to actually write our imports. flake8-import-single then adds the extra idea that you shouldn't have "from tool import x, y, z", since that defeats the purpose of import sorting, which is to avoid merge artifacts.
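For anyone unfamiliar with that last rule, the difference looks like this:

```python
# Multi-name imports invite merge artifacts: two branches that each
# add a name both edit this one line and conflict.
from tool import x, y, z

# One import per line (what flake8-import-single enforces): additions
# land on separate lines and merge cleanly.
from tool import x
from tool import y
from tool import z
```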
Are smaller plugins like this potentially an easy place for first time contributors to work in?
My Rust skills are certainly very lacking but if there is some big list of arbitrary flake8 plugins that there is a desire to have ported, that sounds like it could be a relatively easy place to get one’s feet wet.
I maintain a few open source libraries and we have recently switched to Ruff from Flake8 exclusively because of the lack of pyproject.toml support.
These are, relatively speaking, small codebases, and the speed improvements really don’t make a difference for us, but in trying to get our own projects in line with current standards, flake8’s refusal to support it was the final nail in the coffin.
I think the point was reasonable when pyproject.toml first came about and parsing it was a bit more of a Wild West; I can certainly understand the unwillingness to try to support it then. But now, with Python itself having TOML support in the standard library, every other major tool standardizing on it, and TOML being the de facto config format for Rust projects, it just seems like a bizarre hill to die on.
It’s not as if the work hasn’t been done; there are forks of flake8 with pyproject.toml support, but as far as I’m aware, multiple PRs adding support for it have been declined.
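To underline how small the ask has become: reading a tool's table out of pyproject.toml is now a few lines of stdlib. (The [tool.flake8] table below is hypothetical, since flake8 never adopted it; [tool.ruff] is what Ruff actually reads.)

```python
import tomllib  # in the standard library since Python 3.11

with open("pyproject.toml", "rb") as f:  # tomllib requires binary mode
    config = tomllib.load(f)

# Tools conventionally live under [tool.<name>].
flake8_config = config.get("tool", {}).get("flake8", {})  # hypothetical
print(flake8_config)
```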
it's utterly ridiculous and has all the indications of a maintainer who was grumpy about making a change (a place I know *very well*, I am there multiple times a week myself), and now as it's gone on for years, they no longer can really back down from their original "complaints" even though they are absurd at this point.
The good news is some maintainers can have a change of heart. I seem to remember there being a certain someone who thought asyncio python was pointless, but now SQLAlchemy has asyncio support and that's super cool!
I've never understood why HN has never included even an 80-character description. I think it would really help, even if it was only shown on the comments tab.
> You can't. This is to prevent people from submitting a link with their comments in a privileged position at the top of the page. If you want to submit a link with comments, just submit it, then add a regular comment.
If you want to add more context, add it in a comment. If it’s good people will upvote it.
One reason I can think of is that it encourages informative titles - I enjoy being able to click directly from the home page to the article, without having to look at the comments. A lower percentage of posters would explain Ruff in the title if there were a description. (Clearly it's currently not 100% either; that's why I'm here.)
Though this is something that keeps coming up on Reddit as well, I assume there's a reason why HN went with this design.
There's a very clear graph right at the top of Ruff's GitHub page, easy to find and research yourself: https://github.com/charliermarsh/ruff. You can't miss it if you take the time to check out the project.
Your comment seems skeptical. Practically, it’s a night-and-day difference if you’ve integrated a linter with your editor. Plus, Ruff is able to fix as well as lint.
Didn't mean it that way, something being 100+ times slower is a LOT. And I care a lot about tool speed. I just didn't really think I needed to be shamed for asking a question they had a ready answer for.
I was excited to start using Ruff; however, it still has quite a few ruff edges, so I can't use it professionally yet. Back to pylint for now, but I'm looking forward to adopting it once it's ready. The difference in speed was night and day.
Another feature I hope to see in Ruff is the ability to "score" the code. With pylint, I can have a CI rule that makes the lint check pass if the code scores 8/10 or above, or even a lower score so that co-workers who resent using linters can start using them.
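For what it's worth, the pylint side of that gate is just a flag; a minimal CI wrapper might look like this (exact exit-code behavior varies a bit across pylint versions, so treat this as a sketch):

```python
import subprocess
import sys

# pylint exits non-zero when the score falls below --fail-under.
# Lower the threshold to ease reluctant co-workers into linting.
result = subprocess.run(["pylint", "--fail-under=8", "src/"])
sys.exit(result.returncode)
```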
I'm glad to see Python and Rust combined organically. My own experience was rewriting a Python project in Rust [1], which made it about 40 times faster. The original Python project and the rewritten Rust code were then combined through PyO3, which works very well. We achieved not only performance but also ease of use and scalability. Just like Pylint and Ruff.
I clicked through, did a search for Ruff, then looked up the Github project. Okay, but… so what? Is this another case of "I rebuilt X in Rust and hit the front page of HN"?
It’s a big deal because linters have normally been written in pure Python, and they are prohibitively slow for large code bases. This tool provides an important speedup that will save my team hours of waiting on CI.
I am curious how big your codebase is that it saves hours. I never found the existing Python based tooling to be slow enough that I felt the need to consider alternatives. Though, I will gladly take the speedup.
Any program that took longer than several minutes to "lint" for me seemed to blow up anyway, usually by running out of memory or hitting some infinite recursion.
Our codebase is somewhere between 200k and 300k LOC. Running the flake8 suite takes minutes. We haven't integrated ruff, but during a test run it ran in less than a second, IIRC.
So, with, say, 20-30 commits per day, that's about an hour, every day. Sure, it's not like we're usually sitting there twiddling our thumbs during builds, but sometimes we actually do have to wait for a build to get ready. Just the waiting time during those rare occasions adds up.
[edit]: flake8, remarkably, dropped roughly an order of magnitude in performance between version 2 and 3. So that didn't help.
> [edit]: flake8, remarkably, dropped roughly an order of magnitude in performance between version 2 and 3. So that didn't help.
Sorry for the drive-by, but did you try bisecting the flake8 change that slowed it down? Unless there were thousands of commits between releases, it would only take <10 runs to bisect; an automated bisect should be under an hour with a test case that takes a few minutes. Could be an interesting thing to try?
Doubt it would be helpful. It appears that the model changed dramatically between those two versions. For instance, the whole API (which we relied on) was dropped[0] and some legacy API was glued in place.
In either case, the information wouldn't be all that useful. We run our stuff off of Debian packages (yes, we're old school), and we're not particularly interested in maintaining our own fork.
I get that bit, what I don't get (or don't see a good reason for) is why Ruff isn't the link and this is. Link to the actual project if that's the subject. If someone wants to compare performance, write a blog post or something instead of wasting everyone's time pointing to references in a different project.
Is Ruff a good replacement for pylint yet? It replaces flake8 well, but since I use black and pylint, I don't need flake8. However, pylint does a lot, so if Ruff could replace it, even without plugins, that would be amazing.
This honestly depends on whether you are familiar enough with the Ruff codebase and Rust syntax, no different than just "doing it by hand".
At this point (IMHO), ChatGPT does not make a meaningful difference in the production of software, aside from individual workflow preferences. The reason being that ChatGPT will produce a lot of code that is very likely to be subtly incorrect in difficult-to-spot places.