> 9: Compare indexed columns with IS NOT DISTINCT FROM
Does anybody know why this is the case? Usually, an index is not used if the semantics of the index do not match the semantics of the query, so "using" it cannot ever produce correct results. But the workaround presented seems to have identical semantics to IS DISTINCT FROM and still uses the index, so why isn't IS DISTINCT FROM using the index then?
I asked this specific question before on the PostgreSQL IRC, and was told that it simply wasn't implemented. There's no huge technical blocker to it being done, it's just a bit awkward to make it work the way the code is structured AIUI.
My first guess would be that IS NOT DISTINCT FROM considers nulls to match. Indexes might not contain null values. I don't know if Postgres indexes are sparse or can be non-sparse? It would be hard to use a sparse index to find null matches. I would expect this could be optimized away if the column is also NOT NULL, but then why would you use the operator in that case, so maybe no one has handled it.
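To make the "nulls match" point concrete, here is a small Python sketch of SQL's three-valued logic. It is only a model, not PostgreSQL code: `None` stands in for both NULL and the UNKNOWN truth value, and the function names are made up for illustration. The `rewritten` form is the commonly cited `a = b OR (a IS NULL AND b IS NULL)` expansion, which may or may not be the exact workaround the article presents.

```python
# Toy model of SQL three-valued logic. None plays the role of NULL / UNKNOWN.
# All names here are hypothetical helpers, not part of any library.

def sql_eq(a, b):
    """SQL `a = b`: UNKNOWN (None) if either operand is NULL."""
    if a is None or b is None:
        return None
    return a == b

def sql_and(x, y):
    """Three-valued AND: FALSE dominates, then UNKNOWN."""
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True

def sql_or(x, y):
    """Three-valued OR: TRUE dominates, then UNKNOWN."""
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False

def is_not_distinct_from(a, b):
    """SQL `a IS NOT DISTINCT FROM b`: always TRUE or FALSE, never UNKNOWN."""
    if a is None or b is None:
        return a is None and b is None
    return a == b

def rewritten(a, b):
    """Assumed rewrite: `a = b OR (a IS NULL AND b IS NULL)`."""
    return sql_or(sql_eq(a, b), sql_and(a is None, b is None))

def passes_where(truth_value):
    """A WHERE clause keeps a row only when the predicate is TRUE."""
    return truth_value is True

if __name__ == "__main__":
    for a in (None, 1, 2):
        for b in (None, 1, 2):
            print(a, b, sql_eq(a, b), is_not_distinct_from(a, b), rewritten(a, b))
```

One wrinkle the model shows: when exactly one side is NULL, the rewrite evaluates to UNKNOWN while IS NOT DISTINCT FROM evaluates to FALSE. A WHERE clause discards both, so the two forms select the same rows as filters even though they are not identical as expressions.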
Indices in Postgres do contain NULL values. My guess is that it's possible to make IS DISTINCT FROM an indexable operator, but it wouldn't be straightforward given that NULLs are a bit of a special case for indexing comparisons in the sense that they totally ignore data types. IS DISTINCT FROM would probably have to redefine a bunch of existing opclasses to account for the possibility of NULL.
Hmm - I actually read some docs, and I think IS NOT DISTINCT FROM was originally implemented back when nulls were not indexed, and it hasn't been revisited since null indexing was added.
That might be the case, but my experience with databases (and especially PostgreSQL) is that most of the time I actually misunderstood the exact semantics of either the operation or the index. That would be a good chance to learn something :)