Hacker News

Huh, that's actually kind of a worst case I didn't think about: since you're doing a reverse table scan, my "sequential access" detection doesn't kick in. If you run the same query with a forward scan, it should fetch roughly the same amount of data but only make about 5 HTTP requests, since the request size doubles for every sequential access.

e.g.:

select country_code, long_name from wdi_country where rowid >= 164 order by rowid asc limit 100;
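To see why scan direction matters so much, here is a deliberately simplified model: a single read head that doubles the request size whenever a read lands exactly where the previous request ended, and falls back to a single page otherwise. This is an illustration under those assumptions, not the actual implementation (which is described as using three virtual read heads), and `count_requests` is a hypothetical name.

```python
def count_requests(page_accesses):
    """Count range requests for a sequence of page reads (hypothetical model)."""
    cached = set()        # pages already fetched by earlier requests
    next_expected = None  # first page after the most recent request
    chunk_pages = 1
    requests = 0
    for page in page_accesses:
        if page in cached:
            continue  # served from a previously fetched range
        if page == next_expected:
            chunk_pages *= 2  # sequential read: double the request size
        else:
            chunk_pages = 1   # non-sequential read: reset to a single page
        requests += 1
        cached.update(range(page, page + chunk_pages))
        next_expected = page + chunk_pages
    return requests


print(count_requests(range(100)))            # forward scan
print(count_requests(range(99, -1, -1)))     # reverse scan
```

In this model, reading 100 pages forward takes 7 requests (1 + 2 + 4 + ... + 64 pages), while reading them in reverse takes 100 single-page requests, because each read lands just *before* the previous one and never matches `next_expected`.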



I solved a similar problem recently: given a stream of data, how should you choose the packet size in an online way to minimize regret (a linear combination of the spare capacity in the last packet and the total number of packets used)?

Turns out doubling isn't the best strategy. The optimal solution is actually to add a constant increment to the packet size. How large that increment should be depends on the relative cost of the two terms in the regret function.
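The cost model above can be made concrete with a small evaluator. This is a sketch under assumed details (the packet-size schedule is fixed before the stream length `n` is known, and `a`, `b` weight the two regret terms); the function names are hypothetical, and the code only scores schedules for a given `n`, it does not prove which schedule is optimal.

```python
from typing import Iterator


def regret(n: int, sizes: Iterator[int], a: float, b: float) -> float:
    """Pack an n-byte stream into packets whose sizes come from `sizes`, in order.

    regret = a * (spare capacity in the last packet) + b * (number of packets)
    """
    sent = 0
    packets = 0
    for size in sizes:
        packets += 1
        sent += size
        if sent >= n:
            return a * (sent - n) + b * packets
    raise ValueError("size schedule ended before the stream did")


def doubling(start: int) -> Iterator[int]:
    """Packet sizes start, 2*start, 4*start, ..."""
    size = start
    while True:
        yield size
        size *= 2


def constant_increment(start: int, step: int) -> Iterator[int]:
    """Packet sizes start, start + step, start + 2*step, ..."""
    size = start
    while True:
        yield size
        size += step


print(regret(100, doubling(1), a=1, b=1))
print(regret(100, constant_increment(10, 10), a=1, b=1))
```

For `n = 100` with `a = b = 1`, doubling from 1 ends with 7 packets and 27 bytes of spare capacity (regret 34), while stepping by 10 from 10 fills exactly 4 packets (regret 4). That is a single data point; comparing schedules fairly means averaging over whatever distribution of `n` you expect.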


> I’ve set the page size to 1 KiB for this database.

> That’s because I implemented a pre-fetching system that tries to detect access patterns through three separate virtual read heads and exponentially increases the request size for sequential reads.

> Since you're doing a reverse table scan my "sequential access" detection doesn't kick in.

You know, starting off with the default 4 KiB page size naturally adds some resistance to these kinds of failure cases. If the VFS isn't issuing many requests in parallel, I would think that choosing a page size near target_bandwidth * round_trip_time (the bandwidth-delay product) would be a better initial guess. 1 KiB would only be appropriate for a pretty low bandwidth-delay product.
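One way to turn that rule of thumb into a number: compute the bandwidth-delay product and snap it to a page size SQLite accepts (a power of two from 512 to 65536 bytes). The function name and the round-to-nearest-power-of-two policy are assumptions for illustration, not anything the VFS actually does.

```python
import math

SQLITE_MIN_PAGE = 512    # smallest page size SQLite allows
SQLITE_MAX_PAGE = 65536  # largest page size SQLite allows


def suggested_page_size(bandwidth_bits_per_s: float, rtt_s: float) -> int:
    """Hypothetical helper: bandwidth-delay product snapped to a valid SQLite page size."""
    bdp_bytes = bandwidth_bits_per_s / 8 * rtt_s
    # Round to the nearest power of two (in log space); guard against log2(0).
    exp = round(math.log2(max(bdp_bytes, 1)))
    return min(max(2 ** exp, SQLITE_MIN_PAGE), SQLITE_MAX_PAGE)


print(suggested_page_size(1e6, 0.008))   # 1 Mbit/s, 8 ms RTT
print(suggested_page_size(10e6, 0.010))  # 10 Mbit/s, 10 ms RTT
```

At 1 Mbit/s and an 8 ms round trip only about 1000 bytes are in flight, which lands on 1024; at 10 Mbit/s and 10 ms it is about 12.5 kB, which rounds to 16384. Most real links to a CDN have a much larger product than the first case.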


That's true, but it also means that random access will always fetch at least a full page, even when it only needs a tiny amount of data. I ran a few (non-scientific) benchmarks on some queries and 1 KiB seemed like an OK compromise.

And note that the request chunk size is bound to the SQLite page size, and to change that page size you have to rewrite the whole DB. So it can't be set on the fly unless you have multiple copies of the database.


1 KiB fits within most IP MTUs, so that seems reasonable.


Do most HTTP responses have fewer than ~500 bytes of headers? I mean specifically GitHub Pages' responses here.

It looks like one of the responses to requests made to the DB included a little over 700 bytes of status line and headers, so together with a 1 KiB page it would probably spill into more than one response packet, unfortunately.
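To put numbers on the MTU question, here is a back-of-the-envelope sketch assuming a 1500-byte Ethernet MTU and IPv4/TCP headers without options. It ignores TLS record overhead (GitHub Pages serves over HTTPS) and any HTTP/2 or HTTP/3 framing, so real packet counts may be higher; the helper names are hypothetical.

```python
import math

MTU = 1500        # bytes; a common Ethernet default (assumption)
IPV4_HEADER = 20  # bytes, no IP options
TCP_HEADER = 20   # bytes, no TCP options (timestamps would add 12)
PAGE_SIZE = 1024  # bytes, the 1 KiB SQLite page

MSS = MTU - IPV4_HEADER - TCP_HEADER  # 1460 bytes of TCP payload per packet


def header_budget(page_size: int = PAGE_SIZE) -> int:
    """Bytes left for the status line and headers if the response is to fit in one packet."""
    return MSS - page_size


def packets_for_response(header_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Minimum number of TCP segments for the headers plus one page of body."""
    return math.ceil((header_bytes + page_size) / MSS)


print(header_budget())            # room for headers alongside one page
print(packets_for_response(700))  # the ~700-byte header case
```

Under these assumptions there are 1460 - 1024 = 436 bytes left for headers, so a response with about 700 bytes of headers needs two segments, while one with 400 bytes would still fit in one.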


With HTTP/3 header compression, I assume the answer is yes.


Hah, I thought "in reverse order by ID" might be a stress test but I was still very impressed by how it performed!



