Huh, that's actually kind of a worst case I didn't think about: Since you're doi...

vladf · on May 2, 2021

I solved a similar problem recently: given a stream of data, how should you choose packet size in an online way to minimize regret (a linear combination of spare capacity of last packet and total packets used).

Turns out doubling isn’t the best strategy. The optimal solution is actually to add a constant increment to the packet size. How large the increment should be depends on the relative cost of the two terms in the regret function.
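
A rough sketch of how the two schedules could be compared, under assumptions that are not in the comment above: the stream length `n` is unknown up front, packets are sent until their combined capacity covers `n`, and the weights `spare_cost` and `packet_cost` are illustrative placeholders. It only evaluates the regret of a given schedule; it does not prove which schedule is optimal.

```python
from itertools import count

def doubling(first=1):
    """Packet sizes first, 2*first, 4*first, ..."""
    size = first
    while True:
        yield size
        size *= 2

def constant_increment(first=1, step=1):
    """Packet sizes first, first+step, first+2*step, ..."""
    for k in count():
        yield first + k * step

def regret(n, sizes, spare_cost=1.0, packet_cost=1.0):
    """Regret for a stream of n >= 1 units sent with the given size schedule.

    regret = spare_cost * (unused capacity in the last packet)
           + packet_cost * (number of packets used)
    """
    capacity = 0
    packets = 0
    for size in sizes:
        capacity += size
        packets += 1
        if capacity >= n:
            break
    return spare_cost * (capacity - n) + packet_cost * packets
```

With equal weights and `n = 8`, `regret(8, doubling())` is 11.0 (4 packets, 7 units spare) and `regret(8, constant_increment())` is 6.0 (4 packets, 2 units spare). Sweeping `n` and the weights shows how the gap moves.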

brandmeyer · on May 2, 2021

> I’ve set the page size to 1 KiB for this database.

> That’s because I implemented a pre-fetching system that tries to detect access patterns through three separate virtual read heads and exponentially increases the request size for sequential reads.

> Since you're doing a reverse table scan my "sequential access" detection doesn't kick in.

You know, starting off with the default 4kB page size naturally adds some resistance to these kinds of failure cases. If the VFS isn't issuing many requests in parallel, I would think that setting up a page size near target_bandwidth * round_trip_time would be a better initial guess. 1kB would be appropriate for a pretty low latency-bandwidth product.
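
To put numbers on target_bandwidth * round_trip_time, here is a small sketch. The link speeds and round-trip times are made-up examples, not measurements from the demo.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s, round_trip_ms):
    """Bytes that can be in flight during one round trip.

    bits/s * ms = millibits, so divide by 1000 for bits and by 8 for bytes.
    """
    return bandwidth_bits_per_s * round_trip_ms / 8000

# Hypothetical links, for illustration only.
slow_link = bandwidth_delay_product_bytes(1_000_000, 8)    # 1 Mbit/s, 8 ms RTT
fast_link = bandwidth_delay_product_bytes(10_000_000, 50)  # 10 Mbit/s, 50 ms RTT
```

Here `slow_link` comes out to 1000 bytes and `fast_link` to 62500 bytes, so a 1 kB page only matches the product on a fairly slow, very nearby connection.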

phiresky · on May 2, 2021

That's true, but it also means that random access will always use at least that amount of data even if it only has to fetch a tiny amount. I did a few (non-scientific) benchmarks on a few queries and 1kB seemed like an OK compromise.

And note that the request chunk size is bound to the SQLite page size, and to change that page size you have to rewrite the whole DB. So it can't be set on the fly unless you have multiple copies of the database.
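
For reference, a minimal sketch of that rewrite using Python's built-in `sqlite3` module. The helper name and the path argument are hypothetical. It assumes the database can be switched out of WAL mode, since SQLite does not change the page size of a WAL-mode database.

```python
import sqlite3

def rewrite_with_page_size(path, page_size=1024):
    """Rebuild the database file at `path` with a new page size.

    Returns the page size reported by SQLite after the rebuild.
    """
    # isolation_level=None means autocommit; VACUUM cannot run inside a transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        # The page size cannot be changed while the database is in WAL mode.
        conn.execute("PRAGMA journal_mode = DELETE")
        # The new page size only takes effect once VACUUM rewrites the file.
        conn.execute(f"PRAGMA page_size = {int(page_size)}")
        conn.execute("VACUUM")
        return conn.execute("PRAGMA page_size").fetchone()[0]
    finally:
        conn.close()
```

VACUUM copies the entire database, which is why this can't be done per request.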

pjc50 · on May 2, 2021

1 kB fits within most IP MTUs, so that seems reasonable.

kelnos · on May 3, 2021

Do most HTTP responses have less than ~500 bytes of headers? I guess specifically here, GH pages' responses.

It looks like one of the requests made to the DB included a little over 700 bytes of response status line and headers, so that would probably end up spilling into more than one response packet, unfortunately.
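
Back-of-the-envelope version, assuming a 1500-byte MTU and 40 bytes of IPv4 + TCP headers with no options, and ignoring TLS record overhead. Those figures are assumptions, not something measured on the demo.

```python
def tcp_segments(header_bytes, body_bytes, mtu=1500, ip_tcp_overhead=40):
    """Number of TCP segments needed for one HTTP response.

    Assumes every segment carries (mtu - ip_tcp_overhead) payload bytes.
    """
    payload_per_segment = mtu - ip_tcp_overhead
    total = header_bytes + body_bytes
    # Integer ceiling division.
    return -(-total // payload_per_segment)

with_700_byte_headers = tcp_segments(700, 1024)  # 1724 bytes of response
with_400_byte_headers = tcp_segments(400, 1024)  # 1424 bytes of response
```

Under those assumptions, 700 bytes of headers plus a 1 KiB page needs 2 segments, while anything up to 436 bytes of headers would still fit in 1.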

londons_explore · on May 5, 2021

With HTTP/3 header compression, I assume the answer is yes.

simonw · on May 2, 2021

Hah, I thought "in reverse order by ID" might be a stress test but I was still very impressed by how it performed!