I see this pattern all the time in mssql code, but hardly ever elsewhere. I've always wondered if there is a reason for that. Are you coming from that background? I'm curious if anyone knows why it is so favored there?
I'm not sure I agree re performance btw. I've fixed a lot of slow mssql queries by changing them in the other direction (EDIT: or just adding a join). I do see how temp tables might shorten the time you hold locks---but then you might be subverting your transaction isolation level.
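For concreteness, here's the shape of rewrite I mean, with made-up table names:

    -- the two-step temp-table version I often find
    SELECT CustomerId, SUM(Amount) AS Total
    INTO #totals
    FROM dbo.Orders
    GROUP BY CustomerId;

    SELECT c.Name, t.Total
    FROM dbo.Customers c
    JOIN #totals t ON t.CustomerId = c.CustomerId;

    -- the single-statement version I usually rewrite it to
    SELECT c.Name, SUM(o.Amount) AS Total
    FROM dbo.Customers c
    JOIN dbo.Orders o ON o.CustomerId = c.CustomerId
    GROUP BY c.Name;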
I have also seen a performance boost from using separate temp tables rather than CTEs (in my case, in Redshift).
My hypothesis is that while memory use may be the same either way, multiple statements don't carry the same transactional/locking overhead and consistency guarantees that a single big set of CTEs does. And for analytics you rarely need the transactional isolation that a single big CTE statement gives you.
That's because Redshift doesn't support MATERIALIZED CTEs. In mainline Postgres, a MATERIALIZED CTE acts more like a temporary table with the performance characteristics you're looking for.
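To illustrate (Postgres 12+; table and column names are made up):

    -- MATERIALIZED forces the CTE body to be computed once,
    -- much like filling a temp table, instead of being inlined
    WITH totals AS MATERIALIZED (
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
    )
    SELECT *
    FROM totals
    WHERE total > 100;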
Are you putting indexes on your temporary tables? Are you including the time it takes to create them?
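For what it's worth, in T-SQL it's easy to include all of that in the measurement, something like (table names made up):

    -- include both the temp-table fill and the index build in the timings
    SET STATISTICS TIME ON;

    SELECT CustomerId, SUM(Amount) AS Total
    INTO #totals
    FROM dbo.Orders
    GROUP BY CustomerId;

    CREATE CLUSTERED INDEX cx_totals ON #totals (CustomerId);

    SET STATISTICS TIME OFF;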
I can't speak for others, but most of my CTEs tend to be small enough that an index wouldn't help much. Of course, toggling MATERIALIZED is an easy way to see whether the equivalent of a temp table works best or whether the planner can successfully fold the body of the CTE into the overall optimization. It's a lot harder syntactically to swap a subquery out for a temp table, which is essentially what the MATERIALIZED keyword does under the hood (at least in Postgres).
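Comparing the two plans is one EXPLAIN away (made-up query):

    -- let the planner inline the CTE into the outer query
    EXPLAIN ANALYZE
    WITH totals AS NOT MATERIALIZED (
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
    )
    SELECT * FROM totals WHERE total > 100;

    -- then swap in MATERIALIZED and compare the timings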
Hell, on small tables Postgres often won't do an index scan even when one exists, because it's faster to do a linear scan than an index read + heap fetches, or at least the planner thinks so.
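You can check what it's rejecting with the standard planner knobs (table name invented):

    EXPLAIN ANALYZE SELECT * FROM small_lookup WHERE code = 'X';

    -- discourage the sequential scan to see the index plan it passed over
    SET enable_seqscan = off;
    EXPLAIN ANALYZE SELECT * FROM small_lookup WHERE code = 'X';
    RESET enable_seqscan;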
It's not necessarily about locking, it's a way to control what order joins get done in, which, at least in MS SQL, is up to the query engine to decide. By writing to a temp table you can override the query engine's plan, often with huge performance gains.
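A sketch of what that looks like in T-SQL (names invented):

    -- step 1: materialize the reduced set yourself
    SELECT o.CustomerId, o.OrderId
    INTO #recent
    FROM dbo.Orders o
    WHERE o.OrderDate >= DATEADD(DAY, -7, GETDATE());

    -- step 2: the optimizer can now only join against the
    -- already-filtered rows, whatever plan it had in mind
    SELECT c.Name, r.OrderId
    FROM #recent r
    JOIN dbo.Customers c ON c.CustomerId = r.CustomerId;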
Less well known is that you can get the same effect using a TOP (N) clause on the subquery, where N is a number larger than the rows you might get back. This presumably avoids the additional I/O penalty of creating the temp table, but I've never validated that. The trick is likely portable across DB engines, too.
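Roughly like this (same caveat: I haven't benchmarked it, and the names are invented):

    -- TOP with an N you can never hit reportedly acts as a soft
    -- fence around the derived table, so it gets evaluated first
    SELECT c.Name, r.OrderId
    FROM (
        SELECT TOP (2147483647) o.CustomerId, o.OrderId
        FROM dbo.Orders o
        WHERE o.OrderDate >= DATEADD(DAY, -7, GETDATE())
    ) AS r
    JOIN dbo.Customers c ON c.CustomerId = r.CustomerId;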
It depends on the use of the database. Where there are intense transactional writes (e.g. a case management system), the CTE approach can easily get locked up, especially if the write queries do a lot of their own lookups. I come from an MSSQL background, so maybe that is an artifact either of the DB or of the way it was used in my own history :) Good question though, and I'd like to hear from someone with experience in other DBs.
WRT subverting the transaction isolation level: using temp tables is a tacit decision to do exactly that, and in my own experience it's best reserved for non-vital read queries. That is, I used it for lookups against tables that did not _need_ to be completely up to date, or where I knew the data _would_ be up to date.