> I really would like to see bi-temporal table support eventually come to PG
> Is anybody aware of somebody working on that for PG?
Not that I am aware of.
Postgres is a fantastic database with zero licensing headaches, but when people ask why companies are still paying for commercial DBMS engines, these are some of the features those engines have that Postgres lacks and that I often miss:
1)"temporal" / versioned tables / time consistent views of data . Oracle has a lot of "flashback" related options at the DB and table levels which can be really useful.
2) Easy, built-in partitioning for large datasets. (This is being worked on and will probably start to land in Postgres 10.)
3) Better backup options for large databases, such as incremental/differential backups.
I recall that in Yandex's migration from Oracle to Postgres, incremental backups were one of the things they mentioned it would be great to have.
4) Along with backups, the built-in recovery options in Postgres are fairly basic compared to the single-table recovery, corruption repair, etc. that other DBs offer.
> why companies are still paying for commercial DBMS engines
In my experience, in 90% of cases application code migration to a new DB engine is the real roadblock.
I've seen many cases where even migrating to a new Oracle version is a huge problem because of the required app code review and full user acceptance tests. So companies pay licence fees for old, unsupported versions, which is even more absurd.
And frankly speaking, for many simple apps even SQLite has all the features necessary to replace Oracle.
> 1)"temporal" / versioned tables / time consistent views of data
Oracle doesn't have transactional DDL, which means that if you drop a column or create a table, it commits your transaction automatically.
This is huge, because with PostgreSQL you can import a full database in a single transaction, and roll it back if anything goes wrong. You can upgrade a full application and roll back if anything goes wrong. No more partial-state headaches.
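Roughly what that looks like in practice (the table and column names here are just made-up examples):

    BEGIN;

    -- DDL is just another statement inside the transaction.
    CREATE TABLE import_staging (id int PRIMARY KEY, payload text);
    ALTER TABLE import_staging ADD COLUMN loaded_at timestamptz;

    -- If anything in the import or upgrade goes wrong, roll the whole
    -- thing back; the table and the column change simply never happened.
    ROLLBACK;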
That said, I'm not sure what "versioned tables" means here. Have you tried PostgreSQL's temporal_tables extension, which seems to do what you mean?
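For reference, a rough sketch of how that third-party temporal_tables extension is usually wired up; the table names below are made up, and the exact trigger arguments should be checked against the extension's docs:

    -- Requires the temporal_tables extension to be installed on the server.
    CREATE EXTENSION IF NOT EXISTS temporal_tables;

    CREATE TABLE employees (
        name       text PRIMARY KEY,
        salary     numeric,
        sys_period tstzrange NOT NULL DEFAULT tstzrange(current_timestamp, null)
    );

    -- Old row versions are moved here on UPDATE/DELETE.
    CREATE TABLE employees_history (LIKE employees);

    CREATE TRIGGER employees_versioning
        BEFORE INSERT OR UPDATE OR DELETE ON employees
        FOR EACH ROW EXECUTE PROCEDURE versioning('sys_period', 'employees_history', true);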
However there might be a lot of WAL to apply, as data gets added then deleted or changed.
A differential backup (see Note) solves this somewhat for large databases by just backing up the blocks changed since the last full backup.
So you might do a full backup once a week, and differentials nightly in between and also save the WAL. This means you are not backing up many TBs every night.
To recover, you would restore the full backup, plus the last differential backup plus any WAL since the last differential. This reduces the volume of WAL to be applied in a recovery.
Note:
Differential backups would include all the changes since the last FULL backup.
Incremental backups would include changes since the last backup, whether a full or the last incremental.
I have used PG for 10 years and have never needed incremental backups. But if someone asked me about it, I would respond that PG has no incremental backups.
So WAL backup is not well known or well documented, in my opinion. An average DB admin needs a tool that will generate an incremental backup as a single file named something like dbname_20160926.sql
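For what it's worth, the low-level pieces are exposed at the SQL level on 9.x; a rough sketch of the classic (exclusive) base-backup flow, assuming WAL archiving (archive_mode/archive_command) is already configured:

    -- Mark the start of a base backup (the label is arbitrary).
    SELECT pg_start_backup('nightly_base_backup');

    -- ...copy the data directory with a file-level tool of your choice
    -- while the server keeps running...

    -- Mark the end; the archived WAL from this point on is what makes
    -- point-in-time recovery possible.
    SELECT pg_stop_backup();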
Barman[1] actually gives you something better than incremental backups: it sets up the backup server as a database slave via PG's built-in replication, and it gives you point-in-time recovery with the nicest CLI of any backup tool for any DB I've ever used.
If you think backup tools are a minus for PG, I guess you've never used barman. Easy as pie and has saved my bacon several times when CASCADE DELETE wrecked a few dbs for me due to coworkers futzing around.
> If you think backup tools are a minus for PG, I guess you've never used barman
You are right, I've never used barman. But I have streaming replication configured manually. Generally I am quite happy with PG backup tools.
However, this discussion made me think about pg_dump's deficiencies. WAL backup covers the whole DB cluster, but how do I incrementally update a single DB, e.g. bring one prod DB into my dev environment?
Indeed, that is something WAL files don't work for, and there aren't a lot of great answers. I generally restore a point-in-time snapshot somewhere and then dump out that entire DB to restore via SQL or whatnot.
The corollary to the backup situation is that if PostgreSQL ever ended up operating in a similar fashion to Oracle's RMAN, particularly on Oracle's encouraged ASM, PostgreSQL would very quickly start seeing "PostgreSQL requires a full-time DBA, just use MariaDB" type blog posts. I can see how the middle ground is a hard problem.
I agree, Postgres having its own storage subsystem would be wrong. Fortunately the PG philosophy is to leave what the OS does best to the OS, so it's unlikely that situation would arise.
However, a middle ground of being able to back up just the changes made in a multi-terabyte database since the last full backup wouldn't be controversial, and it is a common use case.
I've read of a few attempts on the PG dev mailing list over the last couple of years, so hopefully one of them will be completed sometime soon and merged into core.
3) ZFS snapshots with send/receive are what I use for local and offsite backups. They can be as fine-grained as you like time-wise, and are so, so much easier to manage and restore than traditional backups.
This is probably easier, but it might be an issue when you want to skip some transactions (for example, someone drops a table). Make sure you also back up WALs.
> Parallel execution of sequential scans, joins and aggregates
This confuses me: I thought a sequential scan was slow because it was limited by disk I/O; why is it faster to have multiple threads waiting around for the same disk?
(I guessed maybe that it was running different sequential scans in the same query in parallel, but no, pages 3 and 4 of the PDF show parallel workers being used for a single scan of a single table)
> I thought a sequential scan was slow because it was limited by disk I/O
Not necessarily. Suppose you have a fairly big table that fits in the shared buffer cache and you run a query with a predicate that returns only a handful of rows. Then you would be better off scanning parts of it in parallel and then combining the results.
Parallel sequential scan is a building block that enables other work to be done in parallel. Once you've got a way to split up a sequential scan by handing chunks to different workers, those workers can then do other CPU intensive work like filtering (involving potentially expensive expressions), partial aggregation (for example summing a subtotal that will be added to other workers' subtotals at the end) and joining (say against a hash table).
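A rough illustration of that on 9.6, with a made-up table; the exact plan shape and worker count will vary with settings and table size:

    CREATE TABLE measurements (id bigserial, sensor int, reading numeric);
    INSERT INTO measurements (sensor, reading)
    SELECT (random() * 100)::int, random()
    FROM generate_series(1, 5000000);
    ANALYZE measurements;

    -- Allow up to 4 workers per Gather node (new in 9.6).
    SET max_parallel_workers_per_gather = 4;

    EXPLAIN (COSTS OFF)
    SELECT sensor, sum(reading)
    FROM measurements
    WHERE reading > 0.99
    GROUP BY sensor;

    -- Typical plan: each worker scans its chunk, applies the filter and
    -- computes a partial sum; the leader combines the partial results.
    --
    --  Finalize HashAggregate
    --    Group Key: sensor
    --    ->  Gather
    --          Workers Planned: 4
    --          ->  Partial HashAggregate
    --                Group Key: sensor
    --                ->  Parallel Seq Scan on measurements
    --                      Filter: (reading > 0.99)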
Because "the disk" is the wrong abstraction here; think instead of stuffing I/O requests into a queue, letting the storage controller re-order them, then giving you a callback when the blocks are ready.
I was wondering the exact same thing -- how can CPU-parallelism improve on a task which (to me at least) seems to be disk-IO bound?
I'm guessing that my assumption of the task being disk-IO bound is incorrect!
It's only disk IO bound if you're accessing the disk. With modern servers packing multiple TB of memory, it is likely you won't be accessing the disk at all.
In some cases (like a single HDD and a small amount of RAM) that may be true, but in others (from "the whole table is already loaded into RAM" to "it's on an SSD" to "the table is partitioned across multiple drives"), it should give very good improvements for analytical queries.
I am really glad to see the "searching for phrases" feature for full-text search, as doing that with POSIX regexes was a really painful operation for long text fields.
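A quick sketch of what the new phrase operators look like (the sample strings are made up):

    -- phraseto_tsquery() builds a query that requires adjacent words;
    -- <-> in a tsquery means "immediately followed by".
    SELECT to_tsvector('english', 'PostgreSQL 9.6 adds parallel query execution')
           @@ phraseto_tsquery('english', 'parallel query');          -- true

    SELECT to_tsvector('english', 'PostgreSQL 9.6 adds parallel query execution')
           @@ to_tsquery('english', 'parallel <-> query');            -- true

    -- Words that merely co-occur no longer satisfy a phrase query:
    SELECT to_tsvector('english', 'a parallel universe where every query is slow')
           @@ phraseto_tsquery('english', 'parallel query');          -- false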
https://wiki.postgresql.org/images/6/64/Fosdem20150130Postgr...