I see this pattern all the time in mssql code, but hardly ever elsewhere. I've always wondered if there is a reason for that. Are you coming from that background? I'm curious if anyone knows why it is so favored there?
I'm not sure I agree re performance btw. I've fixed a lot of slow mssql queries by changing them in the other direction (EDIT: or just adding a join). I do see how temp tables might shorten the time you hold locks---but then you might be subverting your transaction isolation level.
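For concreteness, here's the shape of rewrite I mean, with made-up table names:

    -- the two-step temp-table version I often find
    SELECT CustomerId, SUM(Amount) AS Total
    INTO #totals
    FROM dbo.Orders
    GROUP BY CustomerId;

    SELECT c.Name, t.Total
    FROM dbo.Customers c
    JOIN #totals t ON t.CustomerId = c.CustomerId;

    -- the single-statement version I usually rewrite it to
    SELECT c.Name, SUM(o.Amount) AS Total
    FROM dbo.Customers c
    JOIN dbo.Orders o ON o.CustomerId = c.CustomerId
    GROUP BY c.Name;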
I have also seen a performance boost from using separate temp tables rather than CTEs (in my case, in Redshift).
My hypothesis is that while memory use may be the same either way, multiple statements don't carry the same transactional/locking overhead and consistency guarantees that a single big set of CTEs does. And for analytics you rarely need the transactional isolation that a single big CTE statement gives you.
That's because Redshift doesn't support MATERIALIZED CTEs. In mainline Postgres, a MATERIALIZED CTE acts more like a temporary table with the performance characteristics you're looking for.
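To illustrate (Postgres 12+; table and column names are made up):

    -- MATERIALIZED forces the CTE body to be computed once,
    -- much like filling a temp table, instead of being inlined
    WITH totals AS MATERIALIZED (
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
    )
    SELECT *
    FROM totals
    WHERE total > 100;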
Are you putting indexes on your temporary tables? Are you including the time it takes to create them?
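For what it's worth, in T-SQL it's easy to include all of that in the measurement, something like (table names made up):

    -- include both the temp-table fill and the index build in the timings
    SET STATISTICS TIME ON;

    SELECT CustomerId, SUM(Amount) AS Total
    INTO #totals
    FROM dbo.Orders
    GROUP BY CustomerId;

    CREATE CLUSTERED INDEX cx_totals ON #totals (CustomerId);

    SET STATISTICS TIME OFF;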
I can't speak for others, but most of my CTEs tend to be small enough that an index wouldn't help much. Of course, toggling MATERIALIZED is an easy way to see whether the equivalent of a temp table works best or whether the planner can successfully fold the body of the CTE into the overall optimization. It's a lot harder syntactically to swap a subquery out for a temp table, which is essentially what the MATERIALIZED keyword does under the hood (at least in Postgres).
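Comparing the two plans is one EXPLAIN away (made-up query):

    -- let the planner inline the CTE into the outer query
    EXPLAIN ANALYZE
    WITH totals AS NOT MATERIALIZED (
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
    )
    SELECT * FROM totals WHERE total > 100;

    -- then swap in MATERIALIZED and compare the timings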
Hell, on small tables Postgres often won't do an index scan even when one exists, because it's faster to do a linear scan than an index read + heap fetches, or at least the planner thinks so.
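You can check what it's rejecting with the standard planner knobs (table name invented):

    EXPLAIN ANALYZE SELECT * FROM small_lookup WHERE code = 'X';

    -- discourage the sequential scan to see the index plan it passed over
    SET enable_seqscan = off;
    EXPLAIN ANALYZE SELECT * FROM small_lookup WHERE code = 'X';
    RESET enable_seqscan;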
It's not necessarily about locking, it's a way to control what order joins get done in, which, at least in MS SQL, is up to the query engine to decide. By writing to a temp table you can override the query engine's plan, often with huge performance gains.
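A sketch of what that looks like in T-SQL (names invented):

    -- step 1: materialize the reduced set yourself
    SELECT o.CustomerId, o.OrderId
    INTO #recent
    FROM dbo.Orders o
    WHERE o.OrderDate >= DATEADD(DAY, -7, GETDATE());

    -- step 2: the optimizer can now only join against the
    -- already-filtered rows, whatever plan it had in mind
    SELECT c.Name, r.OrderId
    FROM #recent r
    JOIN dbo.Customers c ON c.CustomerId = r.CustomerId;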
Less well known is that you can get the same effect using a TOP (N) clause on the subquery, where N is a number larger than the rows you might get back. This presumably avoids the additional I/O penalty of creating the temp table, but I've never validated that. The trick is likely portable across DB engines, too.
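Roughly like this (same caveat: I haven't benchmarked it, and the names are invented):

    -- TOP with an N you can never hit reportedly acts as a soft
    -- fence around the derived table, so it gets evaluated first
    SELECT c.Name, r.OrderId
    FROM (
        SELECT TOP (2147483647) o.CustomerId, o.OrderId
        FROM dbo.Orders o
        WHERE o.OrderDate >= DATEADD(DAY, -7, GETDATE())
    ) AS r
    JOIN dbo.Customers c ON c.CustomerId = r.CustomerId;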
It depends on the use of the database. Where there are intense transactional writes (e.g. a case management system), the CTE approach can easily get locked up, especially if the write queries do a lot of their own lookups. I come from an MSSQL background, so maybe that is an artifact either of the DB or of the way it was used in my own history :) Good question though, and I'd like to hear from someone with experience in other DBs.
WRT subverting the transaction isolation level: using temp tables is a tacit decision to do exactly that, and in my own experience it's best reserved for non-vital read queries. That is, I used it for lookups against tables that did not _need_ to be completely up to date, or where I knew the data _would_ be up to date.