Hacker News new | past | comments | ask | show | jobs | submit login

Exactly, intervals are disjoint and adjacent (automatically coalesced). The ordered map I'm using will coalesce them together which works pretty well. The problem is that I'd like to accelerate range queries even further, probably by excluding certain intervals with some acceleration structure on the side (e.g., skip any intervals with a duration less than X) at the cost of slightly more expensive updates.



Can you give any more details about the range queries? e.g. are you interested in finding any gap of at least duration X, or the number of bookings in a date range?

Put another way, are you more interested in what's been consumed, or what's free?


I'm interested in both, but quickly finding the nearest gaps of at least duration X is most important.


"Nearest gap" doesn't sounds like a range query to me. It sounds more like a nearest neighbor query. It sounds like you have (start time, duration) tuples and want to search for nearest neighbor on start time with an at-least-X constraint on duration. (start time, duration) is more point-like than interval-like (as far as data structures go), so anything that can handle nearest neighbor on point-like data would be a candidate. If this is an accurate mapping for your problem, you might check out KD trees. You'd probably have to write a custom tree traversal algorithm to query NN on one dimension and >X on the other, but I think it could be done. Sounds kinda fun.


I have (start time, duration) tuples but would like to find the nearest entries (in either direction) given a specific time and minimum duration.

Thanks for the suggestion! I've tried some other spatial acceleration structures (e.g., R-tree and some others) and applying them in 1-d but they didn't outperform the ordered map unfortunately. It could be interesting to try out KD trees at some point though.


Sure thing! I'd reframe this as a nearest neighbor search over (start time, duration) tuples. KD trees are what I'm most familiar with for that problem, but there are others that would fit. Check out chapters 1 ("Multidimensional Point Data") and 3 ("Intervals and Small Rectangles") of Foundations of Multidimensional and Metric Data Structures: https://books.google.com/books?id=vO-NRRKHG84C&pg=PR9&source...

R-trees are more generic in that they can store intervals, shapes, volumes, etc, but (start time, duration) tuples are point-like, and that opens more possibilities. The ordered map is going to give you good memory locality. It might be that you'll only beat that when searching for larger durations (that require you to iterate over more of your ordered map). There will probably be an inflection point somewhere, depending on the distribution of your data and queries.

Hope that helps!


There are so many nuances here, but perhaps you could maintain two coalescing structures: what's free and what's used, complements one one another. Adding to one means removing from the other. One might give you a more constrained search space than the other.

An ordered structure feels like the way to go, here, since you said "nearest gaps". Once you know the time you'll get locality in your search. You could also run the search in both directions on two threads.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: