Hacker News new | past | comments | ask | show | jobs | submit login

You are right that indexes are usually (though not always!) implemented as a tree. But in practice the number of levels of that tree is usually fixed, so in practice it is constant. (And the number of writes to disk per write to the index averages under 1 - the top levels generally get overwritten multiple times before they have to hit disk on a checkpoint, and sometimes the leaf gets overwritten again.)



But that would mean you would have to make it wider and wider. That cannot be done at O(1) time, can it? Or are they just starting a second, third, etc. tree whenever one fills up? If so, what is the benefit over adding another level? More localized disk I/O during writing? Do you have a reference?


http://www.stanford.edu/class/cs276a/projects/docs/berkeleyd... describes the performance of BTree data structures in real world databases. (It is discussing a standalone key/value store, but the internals of a database index work the same way.)

As noted, 3 levels suffices for a moderate sized table, and it would be rare to need more than 5 levels. The "levels" in question are pages of memory that have to be looked at. The reason that this is less than you expect is that inside of each page you essentially have a tree that has several levels to traverse. However the time to traverse that tree is less than the time to fetch the next page, so the operations of concern are fetches of pages.

The most important number noted in that link is that under normal operation once caches are warmed up, it should a maximum of one disk seek to traverse the index. That single disk seek, if it happens, takes longer than all other operations.

The long and short of it is, "in theory log(n), in practice a reasonably small constant."


Thanks. But a disk seek is not a small constant. I would almost describe that as "in theory log(n), in practice a disk seek, so half the speed" (assuming that the record can be written with one disk seek)


A hash index will also require one disk seek which makes b-tree indexes and hash indexes have more similar performance than the theoretical O(log n) vs O(1) would indicate.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: