Hacker News
How not to blow up the production database (battlefy.com)
94 points by pyrolistical on Oct 15, 2021 | 21 comments



Answer: Materialized Views.

On an unrelated note: still hoping for those automatically refreshed materialized views in PostgreSQL, à la what VoltDB has.


I've been looking at Materialize for a while (https://materialize.com/). It can handle automatically refreshed materialized views. Last time I checked, it didn't support some Postgres SQL constructs that I use often, but I'm really looking forward to it.


> Still hoping for those automatically refreshed materialized views in PostgreSQL, ala what VoltDB has.

Not exactly what you're hoping for and you probably already follow this pattern. pg_cron can help (and is now available in AWS RDS).

```sql
CREATE EXTENSION IF NOT EXISTS pg_cron;

CREATE MATERIALIZED VIEW IF NOT EXISTS activeschema.some_thing_cached AS ...;

-- refresh every 5 minutes
SELECT cron.schedule(
  'some_thing_cached',
  '*/5 * * * *',
  $CRON$ REFRESH MATERIALIZED VIEW activeschema.some_thing_cached; $CRON$
);
```


I think the problem is when you have a materialized view that takes hours to refresh. We are lucky that 99% of our traffic falls between 7:00 and 19:00 on weekdays, so we can just refresh at night, but that won't work for everyone.

I don't know much about how PostgreSQL works internally, so I probably just don't understand the constraints. Anyway, as I understand it, there are two ways to refresh: you refresh a view either concurrently or not.

If not, then Postgres rebuilds the view from its definition on the side and, at the end, switches some internal structures from the old query result to the new one. That seems reasonable, but for some reason, which I don't understand given my limited knowledge, an exclusive access lock is held for the entire duration of the refresh and all read queries are blocked, which doesn't work for us.

If you refresh concurrently, Postgres rebuilds the view from its definition and compares the old and new query results with a full outer join to compute a diff. The diff is then applied to the old data (like regular table INSERT/UPDATE/DELETE, I assume), so I think you get away with just an EXCLUSIVE lock and read access still works. There are two downsides: first, it requires a UNIQUE constraint for the join; second, the full outer join is a lot of additional work.
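For reference, a minimal sketch of the concurrent variant (table, view, and column names here are hypothetical). REFRESH MATERIALIZED VIEW CONCURRENTLY requires a unique index on the view:

```sql
CREATE MATERIALIZED VIEW stats_cached AS
SELECT player_id, count(*) AS matches_played
FROM matches
GROUP BY player_id;

-- CONCURRENTLY refuses to run without at least one unique index on the view
CREATE UNIQUE INDEX stats_cached_key ON stats_cached (player_id);

-- Readers are not blocked while this runs; Postgres diffs old vs. new rows
REFRESH MATERIALIZED VIEW CONCURRENTLY stats_cached;
```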

I never had the time to test Materialize, but it seems to do what I want with its continuous refresh.

I also thought about splitting the materialized view in two: one for rarely changing data and another for the smaller part of the data that changes daily. Then I would only have to refresh the smaller view and UNION ALL both materialized views in a regular view. I'm not sure how well that would work with the Postgres query planner.
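The split described above might be sketched like this, with a hypothetical cutoff column and made-up names:

```sql
-- Large, rarely-changing history: refresh occasionally (e.g. nightly)
CREATE MATERIALIZED VIEW stats_historic AS
SELECT * FROM stats WHERE updated_at < date '2021-01-01';

-- Small, frequently-changing tail: cheap to refresh often
CREATE MATERIALIZED VIEW stats_recent AS
SELECT * FROM stats WHERE updated_at >= date '2021-01-01';

-- A plain view stitches the two halves back together for readers
CREATE VIEW stats_combined AS
SELECT * FROM stats_historic
UNION ALL
SELECT * FROM stats_recent;
```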


Not sure how that would work with the PG query planner either, but a batch layer for rarely changing data plus a speed layer for rapidly changing data is basically the Lambda data architecture, so probably a good call!


If it's a one-shot data compilation, you could use something like Postgres' NOTIFY to trigger a listening external app.
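As a sketch of that pattern (trigger, function, channel, and table names are all made up), a statement-level trigger can pg_notify a worker that performs the refresh outside the write path:

```sql
-- Fire a notification whenever the source table changes
CREATE OR REPLACE FUNCTION notify_stats_changed() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('stats_changed', '');
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER stats_changed_trg
AFTER INSERT OR UPDATE OR DELETE ON matches
FOR EACH STATEMENT EXECUTE FUNCTION notify_stats_changed();

-- The external app runs `LISTEN stats_changed;` on its connection
-- and issues the refresh (or any recompilation) when it wakes up.
```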


There's one gotcha with this approach: if there's another DDL operation running simultaneously with REFRESH MATERIALIZED VIEW, you'd get an internal postgres error.

You cannot be sure that refresh won't coincide with a grant on all tables in the schema, for example.


Given how well they work on any non-specialised DBMS, I prefer Postgres to take their time and do it right (AKA, differently from everybody else).


TimescaleDB (a Postgres extension) has these in the form of continuous aggregates, though they are specific to time-series data.

https://docs.timescale.com/timescaledb/latest/how-to-guides/...


MS SQL Server has "indexed views", which are updated automatically and instantly... But they destroy your insert/update performance, and their requirements are so draconian as to be nearly impossible to actually use (no left joins, no subqueries, no self joins, etc...).
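For illustration, a minimal indexed view in T-SQL (table, view, and column names are hypothetical). It needs SCHEMABINDING, a COUNT_BIG(*) alongside any GROUP BY aggregate, and a unique clustered index to materialize it:

```sql
-- Indexed views require SCHEMABINDING and two-part table names
CREATE VIEW dbo.OrderTotals WITH SCHEMABINDING AS
SELECT CustomerId,
       SUM(Amount)  AS Total,
       COUNT_BIG(*) AS RowCnt  -- mandatory when the view uses GROUP BY
FROM dbo.Orders
GROUP BY CustomerId;
GO

-- This materializes the view; SQL Server then maintains it
-- synchronously on every write to dbo.Orders
CREATE UNIQUE CLUSTERED INDEX IX_OrderTotals
ON dbo.OrderTotals (CustomerId);
```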


Yes, views are nice, but there is also the fair concern of not needlessly bogging down a table. Sure, they were making up data, but a flat table with stats, profile data, and other easily externalized data is just bloat. Once you have an id, static fields can be retrieved from other services/data stores.


I'm not sure I am following. Aren't materialized views just formal, cached results of a query? That wouldn't bog down a table.


I think their point is more ‘don’t store all that junk in your primary database and then do all your work on it there too if you can just stuff it somewhere else’. Which has pros and cons and depends a lot on various scaling factors.


Materialized views are persistent tables that are typically updated when the underlying data is updated.

Typically.


I'm pretty sure most engines use the term "materialized views" for eventually consistent tables. The only DB I've seen with that kind of ACID materialized view is MS SQL, which calls them "indexed views".


Perhaps he means it will bog down on refresh.


Maybe? Not sure.


Another thing I'm waiting for in Postgres is lifting and decoupling from the connection limit...


It’s a good thing to be running on 2021 hardware. In 2001 you had to be more creative, since maintaining summary tables as materialized views would blow up the production database, lol.


> At Battlefy our primary database is MongoDB.

I somehow feel their optimization problem is related to not using a relational DB.


I am not convinced that the best data store for a game is a general purpose database.



