You talk a lot about performance in this thread, but I haven't seen any numbers. What is the performance requirement before it's "too slow"? On what data size?

Part of your issue is doing the calculation on read. If you can get away with slower writes, you can store all available slots for the day on write, at minute/5-minute/etc. granularity, then remove used slots from the tree as they are taken.
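
Roughly what I mean, as a minimal sketch in TypeScript (the names and the 5-minute granularity are just for illustration):

    // Materialize the day's availability as a dense array of 5-minute
    // slots, updated on write. Partial slots are conservatively marked
    // as taken.
    const SLOT_MINUTES = 5;
    const SLOTS_PER_DAY = (24 * 60) / SLOT_MINUTES; // 288

    class DayAvailability {
      // free[i] === true means slot i (i * 5 minutes past midnight) is open.
      private free: boolean[] = new Array(SLOTS_PER_DAY).fill(true);

      // Write path: mark a booked interval [startMin, endMin) as taken.
      book(startMin: number, endMin: number): void {
        const lo = Math.floor(startMin / SLOT_MINUTES);
        const hi = Math.ceil(endMin / SLOT_MINUTES);
        for (let i = lo; i < hi; i++) this.free[i] = false;
      }

      // Read path: is the interval [startMin, endMin) entirely open?
      isFree(startMin: number, endMin: number): boolean {
        const lo = Math.floor(startMin / SLOT_MINUTES);
        const hi = Math.ceil(endMin / SLOT_MINUTES);
        for (let i = lo; i < hi; i++) {
          if (!this.free[i]) return false;
        }
        return true;
      }
    }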




The rough numbers I’ve been using in my test cases are on the scale of thousands of relatively densely distributed intervals and hundreds of thousands of range queries. This is for a real-time interactive use case targeting tens of milliseconds or so (obviously it depends on the CPU, but ideally supporting the last couple of generations). “Too slow” is the upper end of that range because it’s part of the critical path for interactions.

Slower writes with a dense representation like that would be great. The problem I’ve been seeing in my prototypes is that querying the slots back tends to perform about the same as the ordered map. Some people have mentioned a variation on this idea that would store coarser-grained statistics based on minimum duration in certain regions, which might help a bit.
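
My reading of that variation (a guess on my part, not anyone’s confirmed design): a segment tree over the slot bitmap where each node caches the longest free run in its range, plus the free prefix/suffix lengths so runs crossing child boundaries merge correctly. “Is there any opening of at least k slots?” is then O(1) at the root, and each booking is an O(log n) point update:

    interface RunSummary {
      len: number;  // number of slots covered by this node
      best: number; // longest free run anywhere in the range
      pre: number;  // length of the free prefix
      suf: number;  // length of the free suffix
    }

    function leaf(isFree: boolean): RunSummary {
      const v = isFree ? 1 : 0;
      return { len: 1, best: v, pre: v, suf: v };
    }

    function mergeRuns(a: RunSummary, b: RunSummary): RunSummary {
      return {
        len: a.len + b.len,
        best: Math.max(a.best, b.best, a.suf + b.pre),
        pre: a.pre === a.len ? a.len + b.pre : a.pre,
        suf: b.suf === b.len ? b.len + a.suf : b.suf,
      };
    }

    class FreeRunTree {
      private tree: RunSummary[];
      private n: number;

      constructor(free: boolean[]) {
        let n = 1;
        while (n < free.length) n <<= 1; // pad to a power of two
        this.n = n;
        this.tree = new Array(2 * n);
        // Padding slots count as booked, so they never extend a run.
        for (let i = 0; i < n; i++) this.tree[n + i] = leaf(i < free.length && free[i]);
        for (let i = n - 1; i >= 1; i--) this.tree[i] = mergeRuns(this.tree[2 * i], this.tree[2 * i + 1]);
      }

      // Booking or freeing slot i is a point update, O(log n).
      set(i: number, isFree: boolean): void {
        let j = this.n + i;
        this.tree[j] = leaf(isFree);
        for (j >>= 1; j >= 1; j >>= 1) this.tree[j] = mergeRuns(this.tree[2 * j], this.tree[2 * j + 1]);
      }

      // Is there any free run of at least k consecutive slots? O(1).
      hasRun(k: number): boolean {
        return this.tree[1].best >= k;
      }
    }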


Thanks. Also, interesting, so this all runs on the client?


Right, this calculation runs on the client and assumes the clients are fast enough. This helps a lot with interactivity because the network stack isn’t involved at all.

It could be interesting to optionally offload some compute to much faster servers when it would be useful, but that introduces request/response latency overhead too.


Do you really need hundreds of thousands of range queries because you are trying to process hundreds of thousands of network requests, or to retrieve/transform hundreds of thousands of entries? Or are you actually attempting to answer a single more complex query, and the hundreds of thousands is just you iterating over all potential candidates searching for a suitable range?

If it is the latter case, instead of attempting to speed up hundreds of thousands of queries, it might be better to build a proper structure so that you can replace them with a single query that directly answers your real question. A well-designed augmented search tree for a specific situation can directly answer quite complex read queries (and even perform modifications). The downside is that you will likely need to at least partially roll your own, since there are many different combinations you can make, and not every one has an off-the-shelf library or even a dedicated name. Libraries that provide a clean interface usually hide the implementation details you would need to customize for your specific operations.

For example, when I was making a graph layout algorithm I had to answer the query "Given a mapping of position->value, what is the closest position to some position x with value at least y?". It could be done using a single O(log n) query, with no iteration over ranges or values of x.
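
In a max-augmented segment tree, that descent looks roughly like this (a sketch assuming integer positions 0..n-1; the "closest in either direction" version adds a mirrored rightmost-before-x search):

    // Max segment tree over position->value; one descent finds the first
    // position at or after x whose value is >= y.
    class MaxSegTree {
      private maxv: number[];
      private n: number;

      constructor(values: number[]) {
        let n = 1;
        while (n < values.length) n <<= 1; // pad to a power of two
        this.n = n;
        this.maxv = new Array(2 * n).fill(-Infinity);
        for (let i = 0; i < values.length; i++) this.maxv[n + i] = values[i];
        for (let i = n - 1; i >= 1; i--) {
          this.maxv[i] = Math.max(this.maxv[2 * i], this.maxv[2 * i + 1]);
        }
      }

      // First index >= x with value >= y, or -1 if none. O(log n):
      // subtrees whose max is < y are rejected in O(1) without descending.
      firstAtLeast(x: number, y: number): number {
        return this.descend(1, 0, this.n, x, y);
      }

      private descend(node: number, lo: number, hi: number, x: number, y: number): number {
        if (hi <= x || this.maxv[node] < y) return -1;
        if (hi - lo === 1) return lo; // leaf: qualifying position found
        const mid = (lo + hi) >> 1;
        const left = this.descend(2 * node, lo, mid, x, y);
        return left !== -1 ? left : this.descend(2 * node + 1, mid, hi, x, y);
      }
    }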

Given that the number of queries is orders of magnitude higher than the number of items in the structure, do you really need a read/write data structure? In most situations I've had with similar constraints, it was acceptable to have a read-only data structure, meaning it only supported two operations: Create(data) and Query, instead of Create(empty), Insert, Query or Create(empty), Insert, Modify, Query. The overall process is then: create/fill the structure with all your data, then answer the batch of all your queries. Not having to support dynamic inserts/modifications can give a good improvement to the constant factor (and to the hidden non-constant factors once you take into account things like memory cache behavior). Having a fixed, non-modifiable tree means you don't have to deal with as many dynamic memory allocations, pointer indirections, or tree rebalancing. Dropping the self-balancing machinery also significantly simplifies the code, making it a lot more practical to roll your own specialized tree that directly supports the queries you need.
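
To make the Create(data)/Query shape concrete, here is a sketch of a static stabbing-count structure over half-open intervals [start, end): build sorts two flat arrays once, and each query is two binary searches, with no nodes, pointers, or rebalancing (names are illustrative):

    class StaticIntervalCounter {
      private starts: number[];
      private ends: number[];

      // Create(data): sort once, no per-element allocation afterwards.
      constructor(intervals: Array<[number, number]>) {
        this.starts = intervals.map(([s]) => s).sort((a, b) => a - b);
        this.ends = intervals.map(([, e]) => e).sort((a, b) => a - b);
      }

      // Query: number of intervals covering point p,
      // i.e. (#starts <= p) - (#ends <= p).
      countCovering(p: number): number {
        return upperBound(this.starts, p) - upperBound(this.ends, p);
      }
    }

    // Index of the first element strictly greater than x in a sorted array.
    function upperBound(a: number[], x: number): number {
      let lo = 0, hi = a.length;
      while (lo < hi) {
        const mid = (lo + hi) >> 1;
        if (a[mid] <= x) lo = mid + 1; else hi = mid;
      }
      return lo;
    }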

One more trick that can help, depending on your overall constraints, is key remapping. If you have all your data ahead of time (or it's rarely modified between large batches of read queries), you can map your key range from whatever your range position type is (floats, timestamps, 64-bit integers) to the integers 0..n-1, where n is the number of unique positions in your dataset. To do the mapping, just sort the values and do a binary search; there is no need for a general-purpose dynamic dictionary. The same sorted array can later be used for mapping query positions. Do your own benchmarks to see how search in a sorted array compares to a hashmap for your specific constraints. The benefit is that it can make your overall structure more dense, get rid of more complex comparison operations inside the tree, and in some situations allow direct lookup by index into a flat array or the leaf layer of a fixed, nearly complete B-tree.
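
The remapping itself is small (a sketch assuming numeric keys):

    // Coordinate compression: map arbitrary sortable keys to dense
    // integers 0..n-1 usable as direct indices into flat arrays.
    function buildKeyUniverse(keys: number[]): number[] {
      // Sorted, de-duplicated keys; the index in this array is the new key.
      return [...new Set(keys)].sort((a, b) => a - b);
    }

    // Largest index i with sorted[i] <= key, or -1 if key precedes all keys.
    // For keys known to be in the universe this is an exact lookup.
    function remap(sorted: number[], key: number): number {
      let lo = 0, hi = sorted.length;
      while (lo < hi) {
        const mid = (lo + hi) >> 1;
        if (sorted[mid] <= key) lo = mid + 1; else hi = mid;
      }
      return lo - 1;
    }

    // Usage:
    //   const universe = buildKeyUniverse(allPositions);
    //   const dense = remap(universe, somePosition); // 0..n-1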



