Understanding Postgres Performance (craigkerstiens.com)
165 points by neilmiddleton on Oct 2, 2012 | 33 comments



I would be more cautious about indexes than this article is.

The trade-off with indexes is that they make reads faster and writes slower. But if you're running a high-performance application, it is not that hard to set up read-only slaves to offload read queries to. (I won't say that it is trivial, because there are issues, but it is doable. I've done it.) However, fixing poor write performance is much more complicated.

Therefore, create all of the indexes that you need. But no more. Any indexes that are not carrying their weight should be dropped. Furthermore, if you have intensive queries that you have to run (e.g. for reports), consider offloading those to another database.
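
Something along these lines (a rough sketch using the standard statistics views) will show which indexes are rarely scanned and how much space they take; just be careful not to drop indexes that exist to enforce primary key or unique constraints:

  SELECT
    schemaname,
    relname,
    indexrelname,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
  FROM
    pg_stat_user_indexes
  ORDER BY
    idx_scan ASC,
    pg_relation_size(indexrelid) DESC;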

Of course if you don't have significant data/volume/etc, then don't worry about this. Any semi-sane approach will work for you. (That applies to most of you.)


Ditto for tables where you need a significant part of the data, e.g. 25%. If all queries need that amount of data, an index is just a waste.

If the tables are only queried once in a blue moon but written to very often (e.g. events/history/logs) then indexes can be wasteful.


The definition of "significant part" is highly hardware dependent. Several years ago the cutoff point was generally somewhere in the 5-10% range, but I'm sure that has changed.

Also if you're creating indexes, the most powerful underutilized ability is to create concatenated indexes. Index 2 or 3 fields in one index. Those indexes can be used if you are querying the first field only, or the first 2, or the first 3. (Some databases will use them if you are querying the second, but that is much, much less efficient to do.) If you're looking things up by 2 fields sometimes, a concatenated index can really save time, and they are nearly as good as an index on a single field for queries that only need one field.
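
For example (table and column names here are made up):

  -- one index covering two columns
  CREATE INDEX orders_customer_created_idx
    ON orders (customer_id, created_at);

  -- can use the index: leading column only
  SELECT count(*) FROM orders WHERE customer_id = 42;

  -- can use the index: both columns
  SELECT count(*) FROM orders
  WHERE customer_id = 42 AND created_at >= '2012-01-01';

  -- generally can't use it well: skips the leading column
  SELECT count(*) FROM orders WHERE created_at >= '2012-01-01';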


Am I wrong in thinking that a missing index can change an O(log n) query into an O(n) one, while an additional index only adds a constant cost to writes?

Because if that's correct, then it makes sense to err on the side of too many indices.

Of course, it is better to be perfect. But if you have to err, which side would you prefer to be on?

And you don't need "significant" volumes for a full table scan to hurt (i.e. it will hurt most of you).

[What surprised me was the 99% cache figure. That is very dependent on your application, but I guess it is common for web apps?]


One thing people don't mention that often with Postgres, but is worth some attention, is the Heap-Only Tuples (HOT) optimization that was added in 8.3. During an UPDATE, Postgres will try to fit the new version of the row on the same page as the old one - if it's able to do this, it means the indices don't have to be updated at all.

So if you're expecting a table to have a heavy UPDATE load it may be a good idea to lower its fillfactor from the default of 100 (for example, ALTER TABLE foo SET (fillfactor = 95);). This means that once a page is 95% full no new rows will be inserted into it, and the remaining 5% will be available for updated rows. HOT will also do "mini-vacuums" on individual pages as needed to free up space.

Of course, it's up to you to decide whether this tradeoff is worth increasing the effective size of your table by ~5%. But even tables with fillfactors at the default of 100 will get some benefit from HOT.
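
You can get a rough idea of whether HOT is actually kicking in for your tables from the statistics views - something like this (untested sketch) compares total updates against heap-only updates:

  SELECT
    relname,
    n_tup_upd,
    n_tup_hot_upd
  FROM
    pg_stat_user_tables
  ORDER BY
    n_tup_upd DESC;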

More info: https://github.com/postgres/postgres/blob/master/src/backend...

Edit to add: One caveat is that HOT won't work if you UPDATE a column that is indexed - then, of course, Postgres will have to touch the index.


It would be nice to have a program that analyzed postgresql statistics and queries and made observations/recommendations. This might be much easier to do with the new pg_stat_statements extension in 9.2.
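
Even without such a tool, pg_stat_statements gets you part of the way there. Roughly (assuming the module is in shared_preload_libraries; column names as of 9.2):

  -- once per database
  CREATE EXTENSION pg_stat_statements;

  -- ten most expensive normalized statements by cumulative time
  SELECT
    calls,
    total_time,
    rows,
    query
  FROM
    pg_stat_statements
  ORDER BY
    total_time DESC
  LIMIT 10;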


If n is your number of rows, adding a constant cost to each write is an O(n) total penalty. But asymptotic performance isn't really the point for many real-world workloads (e.g. for small collections an O(1) hashtable lookup is often slower than an O(n) linked list traversal). As the grandparent says, read performance problems are generally easier to work around than write performance ones, and reporting/aggregation reads (which are the ones that tend to require table scans) are often far less time-sensitive than writes.


The 99% cache recommendation is absolutely for web applications. Most web applications will want queries to return in less than 10 ms and are only retrieving a small amount of data. With data warehousing and reporting applications this definitely changes.


You are correct. However, high-performance applications are usually not brought down by occasional slow operations, but rather by a myriad of operations that are more expensive than they need to be.


I do not think an extra index adds a constant cost to writes. If it is an index that allows for range queries (I haven't seen 'equality-only' indices, but that probably is because I haven't looked hard enough), writing an index entry is (or should be) like an insert into a sorted dictionary. That will be O(log n).


You are right that indexes are usually (though not always!) implemented as a tree. But the number of levels of that tree is usually fixed in practice, so the cost is effectively constant. (And the number of writes to disk per write to the index averages under 1 - the top levels generally get overwritten multiple times before they have to hit disk on a checkpoint, and sometimes the leaf gets overwritten again.)


But that would mean you would have to make it wider and wider. That cannot be done in O(1) time, can it? Or are they just starting a second, third, etc. tree whenever one fills up? If so, what is the benefit over adding another level? More localized disk I/O during writing? Do you have a reference?


http://www.stanford.edu/class/cs276a/projects/docs/berkeleyd... describes the performance of BTree data structures in real world databases. (It is discussing a standalone key/value store, but the internals of a database index work the same way.)

As noted, 3 levels suffice for a moderately sized table, and it would be rare to need more than 5 levels. The "levels" in question are pages of memory that have to be looked at. The reason this is fewer than you might expect is that inside each page you essentially have a small tree with several levels to traverse. However, the time to traverse that in-page tree is less than the time to fetch the next page, so the operations of concern are the page fetches.
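
As a rough back-of-the-envelope illustration: if a few hundred keys fit in an 8 KB page - say ~500 - then two levels cover on the order of 250,000 entries and three levels around 125 million, which is why the depth grows so slowly in practice.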

The most important number noted in that link is that under normal operation, once caches are warmed up, it should take at most one disk seek to traverse the index. That single disk seek, if it happens, takes longer than all of the other operations.

The long and short of it is, "in theory log(n), in practice a reasonably small constant."


Thanks. But a disk seek is not a small constant. I would almost describe that as "in theory log(n), in practice a disk seek, so half the speed" (assuming that the record itself can be written with one disk seek).


A hash index will also require one disk seek, which makes B-tree indexes and hash indexes have more similar performance than the theoretical O(log n) vs O(1) would indicate.


The problem you can run into, and this will depend on the state of your optimizer, is that you may have a join, and the presence of the index may give you a nested-loops path when a hash join would be more efficient.


The PostgreSQL optimizer is cost-based and uses statistics to choose the join method. So that problem is quite rare, and it is best debugged using EXPLAIN/EXPLAIN ANALYZE.
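
For example (hypothetical tables), the plan output tells you directly which join method was chosen and what it actually cost:

  EXPLAIN ANALYZE
  SELECT *
  FROM orders o
  JOIN customers c ON c.id = o.customer_id
  WHERE c.country = 'SE';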


Slightly OT: I've been using MySQL for quite some time now, and whenever I start a new project thinking that I'll use Postgres this time, I find myself lazily opting for something that I know and going back to MySQL. Is there a Postgres guide that is concise, easy to follow, and illustrates its advantages over other relational databases? The Postgres guide[1] is too sparse currently and the beginner Apress book[2] mentioned is 644 pages long.

1- http://www.postgresguide.com/

2 - http://www.amazon.com/gp/product/1590594789


The recently released 'PostgreSQL: Up and Running' may be right up your alley, at only 168 pages:

http://shop.oreilly.com/product/0636920025061.do


http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL MySQL vs PostgreSQL - an objective comparison


What content would you like to see within Postgresguide?


Probably a good idea to add a bit about serial and sequences. This tends to confuse MySQL people like myself.

I, for instance, don't know what happens when I copy, or dump and restore, a Postgres table that relies on a sequence. MySQL auto-magically takes care of it by saving the next auto-increment id in the table dump or copy.


If you use pg_dump to dump/restore the table, it will reset the sequence to the correct value. Look at the dump file; there will be a setval call for the sequence.


Agreed, especially since sequences are a more powerful tool with wider uses than auto increment. For example, you can use the same sequence for multiple tables, or use a sequence without it being connected to any table at all.
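
A quick sketch (table names are made up) of both uses, plus the kind of setval call that shows up in a pg_dump:

  CREATE SEQUENCE global_id_seq;

  -- two tables drawing ids from the same sequence
  CREATE TABLE invoices (
    id bigint PRIMARY KEY DEFAULT nextval('global_id_seq'),
    total numeric
  );
  CREATE TABLE receipts (
    id bigint PRIMARY KEY DEFAULT nextval('global_id_seq'),
    amount numeric
  );

  -- or use the sequence with no table involved at all
  SELECT nextval('global_id_seq');

  -- after a restore, pg_dump resets the counter with something like
  SELECT setval('global_id_seq', 42, true);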


I really liked the NewRelic graph and wanted to know if I could get it running on my setup: Scala, sbt, liftweb, Jetty.

It worked! Just downloaded the ZIP, unpacked and started: sbt ~container:start -J"-javaagent:/path/myapp/newrelic/newrelic.jar"

Now I have a nice graph which matches up with the numbers I see in my logs. As I suspected there's some work that needs to be done on the DB access side.


A better version of the query for seeing how often indexes are used, which avoids division-by-zero errors:

  SELECT
    relname,
    CASE
      WHEN seq_scan + idx_scan = 0 THEN 0
      ELSE 100 * idx_scan / (seq_scan + idx_scan)
    END AS percent_of_times_index_used,
    n_live_tup rows_in_table
  FROM
    pg_stat_user_tables
  ORDER BY
    n_live_tup DESC;


I think the ratio expressions are slightly off. Using the queries from the article can result in negative ratios, which shouldn't be possible if we're measuring the percent of accesses that hit the buffer. Each access is either a cache hit or a read, not both. So instead of subtracting in the numerator, they should be added in the denominator:

  SELECT 
    sum(heap_blks_read) as heap_read,
    sum(heap_blks_hit)  as heap_hit,
    sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
  FROM 
    pg_statio_user_tables;


Off topic: grok is now my new favorite word.


New? It's over 50 years old: http://en.wikipedia.org/wiki/Grok


"new favorite word" not "favorite new word". Also, see http://xkcd.com/1053/


He said "my new favorite word," not "my favorite new word." You could have linked to your potentially-of-interest article without taking a jab at the original commenter.

edit: This is what I get for leaving pages open and coming back to them later.


Grok was a favorite word in the .com collapse days with marketing types. Just saying.


Grok: favorite word since 1961 - and quite well documented... http://en.wikipedia.org/wiki/Grok




