Major features of PostgreSQL 9.6 [pdf] (momjian.us)
126 points by ioltas on Sept 26, 2016 | 35 comments



I really would like to see bi-temporal table support eventually come to PG

https://wiki.postgresql.org/images/6/64/Fosdem20150130Postgr...

Is anybody aware of somebody working on that for PG?


> I really would like to see bi-temporal table support eventually come to PG

> Is anybody aware somebody working on that for PG?

Not that I am aware of.

Postgres is a fantastic database with zero licensing headaches, but when people ask why companies are still paying for commercial DBMS engines, there are some features those engines have which Postgres does not, and which I often miss:

1)"temporal" / versioned tables / time consistent views of data . Oracle has a lot of "flashback" related options at the DB and table levels which can be really useful.

2) Easy, built-in partitioning for large datasets. (This is being worked on and will probably start to be released in Postgres 10).

3) Better backup options for large databases, such as incremental/differential backups.

In the Yandex migration from Oracle to Postgres, I recall incremental backups being one of the things they mentioned it would be great to have.

4) Along with backups, the built-in recovery options in Postgres are fairly basic compared to the single-table recovery, corruption repair, etc. that other DBs offer.


> why companies are still paying for commercial DBMS engines

In my experience, in 90% of cases application code migration to a new DB engine is the real roadblock.

I've seen many cases where even migration to a new Oracle version is a huge problem because of the required app code review and full user acceptance tests. So companies pay licence fees for old, unsupported versions, which is even more absurd.

And frankly speaking, for many simple apps even SQLite has all the features necessary to replace Oracle.


> 1)"temporal" / versioned tables / time consistent views of data

Oracle doesn't have transactional DDL. That means if you drop a column or create a table, Oracle commits your transaction automatically.

This is huge, because with PostgreSQL you can import a full database in a single transaction, and roll it back if anything goes wrong. You can upgrade a full application and roll back if anything goes wrong. No more partial-state headaches.
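
A minimal illustration of transactional DDL in Postgres (the schema here is made up):

    BEGIN;
    CREATE TABLE invoices (id serial PRIMARY KEY, total numeric);
    ALTER TABLE invoices ADD COLUMN paid boolean DEFAULT false;
    -- a later migration step fails...
    ROLLBACK;  -- the table and column changes vanish; nothing was committed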

Therefore I'm not sure what "versioned tables" means. Have you tried PostgreSQL's temporal tables extension, which seems to do what you mean?


> incremental/differential backups

What's wrong with WAL backup? Postgres can run your script and pass it the path to each newly created WAL segment, which you can ship off anywhere.

Or run `pg_start_backup`, rsync data directory, `pg_stop_backup`.
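
A minimal sketch of that setup (the host and paths are placeholders):

    # postgresql.conf: ship each completed WAL segment off-host
    wal_level = replica
    archive_mode = on
    archive_command = 'rsync %p backup-host:/mnt/wal-archive/%f'

    # base backup: bracket an rsync of the data directory
    psql -c "SELECT pg_start_backup('nightly');"
    rsync -a /var/lib/postgresql/9.6/main/ backup-host:/mnt/base-backup/
    psql -c "SELECT pg_stop_backup();"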


You would back up the WAL as well.

However there might be a lot of WAL to apply, as data gets added then deleted or changed.

A differential backup (see Note) solves this somewhat for large databases by just backing up the blocks changed since the last full backup.

So you might do a full backup once a week, and differentials nightly in between and also save the WAL. This means you are not backing up many TBs every night.

To recover, you would restore the full backup, plus the last differential backup plus any WAL since the last differential. This reduces the volume of WAL to be applied in a recovery.

Note:

Differential backups would include all the changes since the last FULL backup.

Incremental backups would include changes since the last backup, whether a full or the last incremental.
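
For the WAL-replay step of a restore, Postgres 9.x already handles point-in-time recovery via recovery.conf; a rough sketch (the archive path is a placeholder):

    # recovery.conf, placed in the restored data directory
    restore_command = 'cp /mnt/wal-archive/%f "%p"'
    recovery_target_time = '2016-09-26 03:00:00'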


> What's wrong with WAL backup?

I have used PG for 10 years and I never needed incremental backups. But if someone asked me about it, I would respond that PG has no incremental backups.

So WAL backup is not well known or documented, in my opinion. An average DB admin needs a tool which will generate an incremental backup as a single file, named something like dbname_20160926.sql.


Barman[1] gives you something better than incremental backups actually. It sets up the backup server as a database slave via PG's built-in replication. It gives you point in time recovery with the nicest CLI of any backup tool for any DB I've ever used.

If you think backup tools are a minus for PG, I guess you've never used barman. Easy as pie and has saved my bacon several times when CASCADE DELETE wrecked a few dbs for me due to coworkers futzing around.

[1] http://www.pgbarman.org/
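
For the curious, the day-to-day barman workflow looks roughly like this (the server name 'main' is a placeholder):

    barman backup main          # take a base backup of the 'main' server
    barman list-backup main     # show the available backups
    barman recover main latest /var/lib/postgresql/9.6/main   # restore the latest one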


> If you think backup tools are a minus for PG, I guess you've never used barman

You are right, I've never used barman. But I have streaming replication configured manually. Generally I am quite happy with PG backup tools.

However, this discussion made me think about pg_dump deficiencies. WAL backup is for the whole db cluster. But how do I incrementally update one db? E.g. one prod db to my dev environment.


Indeed, that is something that WAL files don't work for. There aren't a lot of great answers for that. I generally will restore a point in time snapshot somewhere and then dump out that entire db to restore via sql or whatnot.
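
A rough sketch of that dump-and-restore step, using the custom format (the host and database names are placeholders):

    pg_dump -Fc -h prod-host proddb > proddb.dump   # dump just the one database
    pg_restore --clean -d devdb proddb.dump         # recreate its contents in dev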


> So WAL backup is not well known or documented in my opinion

What? It is the standard backup method clearly explained in the docs.


The corollary to the backup solution is that if Postgresql ever ended up operating in a similar fashion to Oracle's rman, particularly on its encouraged ASM, Postgresql would very quickly start seeing "Postgresql requires a full time DBA, just use MariaDB" type blogs. I can see how the middle ground is a hard problem.


I agree, Postgres having its own storage subsystem would be wrong. Fortunately the PG philosophy is to leave what the OS does best to the OS, so it's unlikely that situation would arise.

However, a middle ground of being able to back up just the changes made in a multi-terabyte database since the last full backup wouldn't be controversial, and is a common use case.

I've read of a few attempts on the PG dev mailing list over the last couple of years, so hopefully one of them will be completed sometime soon and merged into core.

Some of the work I have come across:

https://github.com/ossc-db/pg_rman

https://github.com/2ndquadrant-it/barman/issues/21


> Postgresql would very quickly start seeing "Postgresql requires a full time DBA, just use MariaDB"

Yeah, welcome to 15 years ago, when that was the common wisdom.


3) ZFS with snapshots and send/receive are what I use for local and offsite backups. They can be as fine-grained as you like time-wise, and are so, so much easier to manage and restore vs traditional backups.
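
A minimal sketch, assuming the data directory lives on a dataset called tank/pgdata:

    zfs snapshot tank/pgdata@2016-09-26                        # near-instant, atomic snapshot
    zfs send -i tank/pgdata@2016-09-25 tank/pgdata@2016-09-26 \
        | ssh backup-host zfs receive backup/pgdata            # ship only the changed blocks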


This is probably easier, but it can be an issue when you want to skip some transactions (for example, someone drops a table). Make sure you also back up WALs.


You can very likely get any of those implemented for you for the price of a largish site license of Oracle.


There is https://github.com/arkhipov/temporal_tables which seems okay.
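
A rough usage sketch based on that extension's README (the table names are made up):

    CREATE EXTENSION temporal_tables;

    ALTER TABLE employees ADD COLUMN sys_period tstzrange
        NOT NULL DEFAULT tstzrange(current_timestamp, null);
    CREATE TABLE employees_history (LIKE employees);

    -- every UPDATE/DELETE copies the outgoing row into the history table
    CREATE TRIGGER versioning_trigger
        BEFORE INSERT OR UPDATE OR DELETE ON employees
        FOR EACH ROW EXECUTE PROCEDURE versioning('sys_period', 'employees_history', true);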

Of course, it would be nice if Postgres natively supported the full SQL:2011 standard...


> Parallel execution of sequential scans, joins and aggregates

This confuses me: I thought a sequential scan was slow because it was limited by disk I/O; why is it faster to have multiple threads waiting around for the same disk?

(I guessed maybe that it was running different sequential scans in the same query in parallel, but no, pages 3 and 4 of the PDF show parallel workers being used for a single scan of a single table)


> I thought a sequential scan was slow because it was limited by disk I/O

Not necessarily. Suppose you have a fairly big table that fits in the shared buffer cache and you run a query with a predicate that returns only a handful of rows. Then you would be better off scanning parts of it in parallel and then combining the results.
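
In 9.6 terms, that looks roughly like this (the table and predicate are made up; the plan is illustrative, not actual output):

    SET max_parallel_workers_per_gather = 4;
    EXPLAIN SELECT * FROM big_table WHERE expensive_predicate(payload);
    -- Gather
    --   Workers Planned: 4
    --   ->  Parallel Seq Scan on big_table
    --         Filter: expensive_predicate(payload)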


It is worth checking out the wiki on the details of the parallel query for how and when it works: https://wiki.postgresql.org/wiki/Parallel_Query


Or read this chapter (the wiki page's content made into the official documentation): https://www.postgresql.org/docs/devel/static/parallel-query....


Parallel sequential scan is a building block that enables other work to be done in parallel. Once you've got a way to split up a sequential scan by handing chunks to different workers, those workers can then do other CPU intensive work like filtering (involving potentially expensive expressions), partial aggregation (for example summing a subtotal that will be added to other workers' subtotals at the end) and joining (say against a hash table).
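
For the aggregation case, the 9.6 plan shape looks roughly like this (illustrative, table name made up):

    EXPLAIN SELECT sum(amount) FROM orders;
    -- Finalize Aggregate              <- combines the workers' subtotals
    --   ->  Gather
    --         Workers Planned: 2
    --         ->  Partial Aggregate   <- each worker sums its own chunk
    --               ->  Parallel Seq Scan on orders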


Because "the disk" is the wrong abstraction here; think instead of stuffing I/O requests into a queue, letting the storage controller re-order them, then giving you a callback when the blocks are ready.


I was wondering the exact same thing -- how can CPU-parallelism improve on a task which (to me at least) seems to be disk-IO bound? I'm guessing that my assumption of the task being disk-IO bound is incorrect!


It's only disk IO bound if you're accessing the disk. With modern servers packing multiple TB of memory, it is likely you won't be accessing the disk at all.


For some cases (like a single HDD and a small amount of RAM) that's true, but in others (the whole table already loaded into RAM, SSDs, or a table partitioned across multiple drives) it can give very good improvements for analytical queries.


> why is it faster to have multiple threads waiting around for the same disk?

RAID?


Wow, the Version 10 features are exciting. Really.


Where did you see those?


The second-to-last slide was 'possible 10 features'.


I am really glad to see the "searching for phrases" feature for Full-Text Search, as doing that with POSIX was a really painful operation for long text area fields.
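
For reference, the new 9.6 phrase search looks like this (the sample strings are made up):

    -- matches only when 'quick' is immediately followed by 'brown'
    SELECT to_tsvector('english', 'the quick brown fox')
           @@ phraseto_tsquery('english', 'quick brown');   -- true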


What does 'doing that with POSIX' mean? I have read this many times but I do not have a clear understanding.


I meant using something like t.long_text_field ~ '\mexact phrase\M' (the POSIX regex operator) in the actual SQL statement - https://www.postgresql.org/docs/9.5/static/functions-matchin...


I think what he means is "doing it using Unix command line utilities", like grep.



