Damn Cool Algorithms: Spatial indexing with Quadtrees and Hilbert Curves

crux_ · on Nov 9, 2009

Addenda & thoughts since I do a lot of work along similar lines:

It's probably not a bad idea to have the option of a spatial index at the leaves of a geohash, as the geohash is necessarily of limited precision and will behave badly if someone inserts 100k points all within a 1x1 foot area. ;) (This is equivalent, roughly, to a plain quadtree only with the first N levels of the tree pre-calculated and cached locally.)

I'm guessing the riddle at the end ("Does it remind you of anything?") would be gray code.

Quad-trees can be used for polygons, although they are less than ideal, by storing a list of polygons for each quad. If a quad is entirely contained by the polygon, add the polygon to its list; if a quad is part in and part out of the polygon, add it to its list if length(poly-list) < heuristic_value; otherwise split the quad and redistribute its list among its leaves. (The same arrangement can be used for line segments only, too.) The "query for a region" image shows how this would work, just change the caption to "insert a region." ;)

A reference for this type of index, which is called a PMR-quadtree: http://www.cs.umd.edu/~hjs/pubs/NelsSIGG86.pdf

Another good option for points is a KD-tree. You can't turn it into a hash, at least not straightforwardly, but it gives you some nice queries like n-nearest-neighbours, and can be done with as many dimensions as you like rather than just two. (Example use: accelerating the k-means clustering of big multivariate datasets.)

rbranson · on Nov 10, 2009

Quadtrees are actually actively used for spatially indexing polygons (in shapefiles) by the OGR tools (part of the GDAL package) and MapServer. Quadtrees work quite well in MapServer because it's primary use case is as a read-only renderer of shapefiles. In this scenario, rebuilding the index from scratch each time you make a change is trivial.

martian · on Nov 9, 2009

This is a great write-up.

For n-dimensional Hilbert curves check out http://www.tiac.net/~sw/2008/10/Hilbert/

EricBurnett · on Nov 9, 2009

That was a good explanation to calculating the index of a point on the Hilbert curve - I had been wondering how that was done myself. I'm going to have to sit down and implement one of these someday.

Other than the quadtree, the Hilbert curve can be used to optimize the r-tree as well: http://en.wikipedia.org/wiki/Hilbert_R-tree .

codexon · on Nov 10, 2009

You may be interested in my more detailed writeup of morton codes here that was briefly mentioned in this article. It also comes with Python code.

http://www.codexon.com/posts/morton-codes

wheaties · on Nov 9, 2009

Very interesting stuff. Our project has been required to work with the ESRI ArcObjects stuff for the past 4 years and I've been dying to get inside of their algorithms. Thanks for resparking my curiosity.

crux_ · on Nov 9, 2009

ESRI ArcObjects under the hood:

   if (random()) { crash(); }
   else return UndefinedError;

Tongue in cheek, of course, and they've gotten noticeably better since 4 years ago. (I've been working with ArcObjects since their 8.3 release... shudder.)

If you're interested in how they implement vector operations:

   - http://www.vividsolutions.com/jts/main.htm
   - http://trac.osgeo.org/geos/

As far as raster operations go, I think that GRASS does all they do, plus more. All of the above are nice open-source implementations so you can go as deep as you'd like. ;)

p.s. Most of the algorithms ESRI (and similar) use might not be spelled out in blog posts, but they certainly are in textbooks. ;)

p.p.s. Out of curiosity, what have you been working on? ESRI tech is about as anti-startup as one can imagine. It's like using Oracle for your startup's database.

wheaties · on Nov 11, 2009

Heh, I've been using ESRI since 9.0. Well, to be honest it's required by the contract. I joined when it got bought out.

memetichazard · on Nov 9, 2009

Interesting. I've been thinking about the 'Small World' problem for the Facebook puzzles and came up with the quadtree concept, but I never even imagined something like the geohash/Hilbert curve.

BearOfNH · on Nov 10, 2009

FTA: the next part of our position is 1, and the square we should consult next is the third one.

Doesn't he mean the second one (again)?

This is a good article, but he really should have added an 8x8 illustration and/or mentioned his coordinates start at (0,0) in the upper left. And maybe picked a different point so the example doesn't end up going from the first square to the second to (I believe incorrectly) the third.