In my experience a huge portion of the userbase who loves CTEs are analysts or devs running queries on decently large datasets, and they mostly like them because of readability and don't understand performance possibilities.
I appreciate that tools like dbt allow materialization options as both CTEs and views/tables/etc because being able to pivot between them is super nice.
Note that there are basically no special performance considerations when using CTEs in recent versions of postgres… unless your CTE is recursive or does some weird side-effect (which is unlikely).
This is incorrect. If you use a CTE more than once in a query, Postgres will materialize that data into a temp table that has no indexes. If that CTE gets joined a lot, it will incur many O(N) searched through unindexed data.
I've seen cases where a few GBs of product data gets pulled into a CTE this way. Queries take 10s to complete from processing GBs of unindexed data
Wow, this is amazing. Just the other day I was staring at a query with multiple CTEs that took 20+ seconds to run. Ended up adding an additional WHERE clause on an indexed column to the slowest CTE, which took the execution time down to 100s of ms.
Just tried the original query, but with explicitly not materialized CTEs and the runtime was basically as fast as the manually pre-filtered solution.
No, but this would make an incredible Postgres extension.
(I’d also love one that lets Postgres use a covering index for primary storage. This can reduce data size and load less data off disk due to locality. MySQL has this by default and SQLite has it via WITHOUT ROWID tables.)
Strange take you have there in assuming they don't understand performance possibilities. Many of the problems that analysts and such work on aren't performance bound because focusing on the performance would be a net waste of time. "Don't optimize early" and all that. For example, I had to share SQL recently where the absolute most important thing, beyond all else was to make it readable. CTEs fit the role.
Postgres has materialization options for CTEs. You can ask it to rewrite the CTE as a join via "WITH foobar AS NOT MATERIALIZED (...)". Alternatively you can force materialization by writing "WITH foobar AS MATERIALIZED (...)"
Based on other comments, this has been done for psql, or at least to some extent. So that's the other thing... pre-emptively writing code in a non-intuitive way for performance reasons often leads to practices sticking around long past the rationale for them even being true.
I appreciate that tools like dbt allow materialization options as both CTEs and views/tables/etc because being able to pivot between them is super nice.