In ~2010, I was benchmarking Solarflare (Xilinx/AMD now) cards and their OpenOnload kernel-bypass network stack. The results showed that two well-tuned systems could communicate faster (lower latency) than two CPU sockets within the same server that had to wait for the kernel to get involved (standard network stack). It was really illuminating and I started re-architecting based on that result.
Backing out some of that history... in ~2008, we started using FPGAs to handle specific network loads (US equity market data). It was exotic and a lot of work, but it significantly benefited that use case, both because of DMA to user-land and its filtering capabilities.
At that time our network was all 1 Gigabit. Soon thereafter, exchanges started offering 10G handoffs, so we upgraded our entire infrastructure to 10G cut-through switches (Arista) and 10G NICs (Myricom). This performed much better than the 1G FPGA and dramatically improved our entire infrastructure.
We then ported our market data feed handlers to Myricom's user-space network stack, because loads were continually increasing and the trading world was becoming ever more competitive... and again we had a narrow solution (this time in software) to a challenging problem.
Then about a year later, Solarflare and its kernel-compatible OpenOnload arrived, and we could then apply the power of kernel bypass to our entire infrastructure.
After that, the industry returned to FPGAs again with 10G PHY and tons of space to put whole strategies... although I was never involved with that next generation of trading tech.
I personally stayed with OpenOnload for all sorts of workloads, growing to use it with containerization and web stacks (Redis, Nginx). Nowadays you can use OpenOnload with XDP; again a narrow technology grows to fit broad applicability.
I discovered something similar in the early to mid 2010s: processes on different Linux systems communicated over TCP faster than processes on the same Linux system. That is, going over the network was faster than on the same machine. The reason was simple: there is a global lock per system for the localhost pseudo-device.
Processes communicating on different systems had actual parallelism, because they each had their own network device. Processes communicating on the same system were essentially serialized, because they were competing with each other for the same pseudo-device. At the time, Linux kernel developers basically said "Yeah, don't do that" when people brought up the performance problem.
That makes sense; Linux has many highly performant IPC options that don't involve the network device. Just the time it takes to correctly construct an ethernet frame is not negligible.
OpenOnload supports user-space acceleration of pipes. For our custom applications, we initiate connections over unaccelerated UNIX sockets and then "upgrade" them to accelerated pipes. It's like setting up pairs of shared-memory queues, but all operated via POSIX calls including the ability to watch them with epoll.
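For anyone curious what that plumbing looks like, here's a minimal sketch of the generic POSIX side of it (nothing below is an Onload-specific API): create a pipe, hand one end to the peer over the already-connected UNIX socket with SCM_RIGHTS, and let the peer watch it with epoll like any other fd. For duplex you do this twice, one pipe per direction. Under Onload's LD_PRELOAD interposition, the pipe and epoll calls are what can get accelerated.

    /* Sketch: pass one end of a pipe to a peer over an (unaccelerated)
     * UNIX-domain socket, then watch the pipe with epoll. Plain POSIX;
     * any acceleration comes from Onload intercepting pipe()/epoll(). */
    #include <stdio.h>
    #include <string.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <unistd.h>

    /* Send file descriptor `fd` across a connected UNIX-domain socket. */
    static int send_fd(int sock, int fd) {
        char byte = 0;
        struct iovec iov = { &byte, 1 };
        char ctrl[CMSG_SPACE(sizeof(int))];
        struct msghdr msg;
        memset(&msg, 0, sizeof msg);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctrl;
        msg.msg_controllen = sizeof ctrl;
        struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
        cm->cmsg_level = SOL_SOCKET;
        cm->cmsg_type = SCM_RIGHTS;
        cm->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cm), &fd, sizeof(int));
        return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
    }

    int main(void) {
        int p[2];
        if (pipe(p) != 0) { perror("pipe"); return 1; }
        /* ... connect unix_sock to the peer, then: send_fd(unix_sock, p[0]);
         * The peer receives the read end via recvmsg() and adds it to its
         * epoll set exactly like any other fd: */
        int ep = epoll_create1(0);
        struct epoll_event ev;
        ev.events = EPOLLIN;
        ev.data.fd = p[0];               /* on the peer this is the received fd */
        epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev);
        write(p[1], "hi", 2);            /* this side just writes to p[1] */
        close(ep);
        return 0;
    }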
This is a really old example of this technique using boost::asio [1]. We adapted it from the one in our own reactor library to Onload-pipe-accelerate some OSS that used ASIO.
> Just the time it takes to correctly construct an ethernet frame is not negligible.
For that reason, Onload/TCPDirect (and other vendors too) have APIs that allow you to pre-allocate and prepare packets (headers and payload), then you can do some final tweaks -- like choosing a ticker and price in an order packet -- and then send it out with minimal latency.
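Roughly, the shape of those APIs is "build everything ahead of time, patch a few bytes, send." A hand-wavy sketch of that pattern (the wire layout, offsets, and nic_send_prepared() are placeholders, not any vendor's actual API):

    /* Sketch of the "prepare early, patch late" pattern those APIs enable.
     * Headers and static payload are built once at startup; the hot path
     * only patches the fields that change before handing off to the NIC. */
    #include <stdint.h>
    #include <string.h>

    struct order_msg {                  /* illustrative payload, not a real protocol */
        char     ticker[8];
        uint64_t price_nanos;
        uint32_t qty;
    };

    #define PAYLOAD_OFF 54              /* after Ethernet + IP + TCP headers (example) */
    static unsigned char pkt[256];      /* fully prepared packet, built at startup     */
    static struct order_msg tmpl;       /* payload template with static fields set     */

    void prepare_packet(void) {
        /* ... fill in headers and the static parts of tmpl here ... */
    }

    /* Hot path: patch only what changes, copy it in, and send. */
    void send_order(const char *ticker, uint64_t price_nanos, uint32_t qty) {
        memcpy(tmpl.ticker, ticker, sizeof tmpl.ticker);
        tmpl.price_nanos = price_nanos;
        tmpl.qty = qty;
        memcpy(pkt + PAYLOAD_OFF, &tmpl, sizeof tmpl);
        /* nic_send_prepared(pkt, PAYLOAD_OFF + sizeof tmpl);  // hypothetical vendor call */
    }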
In fact, if I recall correctly, it doesn't seem to. I remember writing a tool that analyzed pcap dumps and initially assuming there would be Ethernet frames at the bottom, only to run into the fact that this wasn't true on lo, which had its own frames, which I assume could vary by OS a bit.
There are many applications which are network transparent and would need extra code to know how to optimize when the other process happens to be on the same machine (e.g., using Sys V IPC or Unix domain sockets). There are good reasons to optimize the case of localhost.
I am always bemused when people imagine that communication over localhost involves packets hitting an ethernet adapter.
It has been many years since I tested this, but unix sockets seemed slower to me than loopback TCP, even.
If you need real speed you should use UDP/MC, or do what I have done with a lot of commercial middlewares that offer "shared memory" transports: messaging-paradigm APIs wrapping various configurable messaging middlewares to provide a fairly unified interface to the application's messaging concepts.
>It has been many years since I tested this, but unix sockets seemed slower to me than loopback TCP, even
They're essentially the same... except they're not and there's a boatload of work in the middle that doesn't need to be done for unix domain sockets.
I can imagine some edge-cases however, where differences in stacks and configuration (in/out buffers vs. shared buffers, buffer sizes, packet loss vs. reliable, locking, blocking vs. non-blocking writes) may lead to more optimal scheduling under concurrent workloads. As soon as you have more than one thread doing something, there's a boatload of moving parts to consider.
But if you have everything properly tuned, unix domain sockets should win every time. If your results said something else, you probably weren't benchmarking what you thought you were benchmarking.
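FWIW, this is roughly the kind of toy ping-pong I'd use to sanity-check that on a given box: the same 64-byte round trip over an AF_UNIX socket pair and over loopback TCP. It's a sketch, not a rigorous benchmark; results depend heavily on kernel version, CPU pinning, and buffer sizes, and partial reads are ignored for brevity.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <time.h>
    #include <unistd.h>

    #define ITERS 100000

    static double pingpong(int fd) {          /* parent side: ns per round trip */
        char buf[64] = {0};
        struct timespec a, b;
        clock_gettime(CLOCK_MONOTONIC, &a);
        for (int i = 0; i < ITERS; i++) {
            write(fd, buf, sizeof buf);
            read(fd, buf, sizeof buf);
        }
        clock_gettime(CLOCK_MONOTONIC, &b);
        return ((b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec)) / ITERS;
    }

    static void echo(int fd) {                /* child side: echo everything back */
        char buf[64];
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0) write(fd, buf, n);
    }

    static double run(int domain) {
        int sv[2];
        if (domain == AF_UNIX) {
            socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
        } else {                              /* loopback TCP on an ephemeral port */
            int l = socket(AF_INET, SOCK_STREAM, 0);
            struct sockaddr_in addr;
            socklen_t len = sizeof addr;
            memset(&addr, 0, sizeof addr);
            addr.sin_family = AF_INET;
            addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
            bind(l, (struct sockaddr *)&addr, sizeof addr);
            listen(l, 1);
            getsockname(l, (struct sockaddr *)&addr, &len);
            sv[0] = socket(AF_INET, SOCK_STREAM, 0);
            connect(sv[0], (struct sockaddr *)&addr, sizeof addr);
            sv[1] = accept(l, NULL, NULL);
            int one = 1;
            setsockopt(sv[0], IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
            setsockopt(sv[1], IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
            close(l);
        }
        if (fork() == 0) { close(sv[0]); echo(sv[1]); _exit(0); }
        close(sv[1]);
        double ns = pingpong(sv[0]);
        close(sv[0]);                         /* echo child sees EOF and exits */
        wait(NULL);
        return ns;
    }

    int main(void) {
        printf("AF_UNIX : %.0f ns/rtt\n", run(AF_UNIX));
        printf("loopback: %.0f ns/rtt\n", run(AF_INET));
        return 0;
    }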
Shared memory of some kind will be fastest, though it sometimes requires re-architecting the application. Unfortunately, I can't find the Linux kernel mailing list response any more, but the kernel devs were not sympathetic to the use case of processes communicating over TCP on the same host.
I went down a similar path (Solarflare, Myricom, and Chelsio; Arista, BNT, and 10G Twinax) and found we could get lower latency internally by confining two OpenOnload-enabled processes that needed to communicate with each other to the same NUMA domain.
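The pinning itself can be as blunt as launching both processes under numactl --cpunodebind/--membind, or doing it in-process. A minimal in-process sketch, assuming (check your own topology via /sys/class/net/<if>/device/numa_node and /sys/devices/system/node/nodeN/cpulist) that the NIC and CPUs 0-7 share a node:

    /* Pin this process to a fixed set of cores that, on this particular box,
     * live on the same NUMA node as the NIC. The core list is an assumption;
     * use libnuma or numactl instead of hard-coding for anything real. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int pin_to_nic_node(void) {
        cpu_set_t set;
        CPU_ZERO(&set);
        for (int cpu = 0; cpu < 8; cpu++)   /* assumed: CPUs 0-7 are on the NIC's node */
            CPU_SET(cpu, &set);
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return -1;
        }
        return 0;
    }

Both communicating processes call this (or get launched with the equivalent numactl flags), so their shared traffic never crosses the inter-socket interconnect.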
We architected our applications around that as well. The firm continued on to the FPGA path though that was after my time there.
I do still pick up SolarFlare cards off of eBay for home lab purposes.
Yup, all the same techniques we used (minus Docker but with the addition of disabling Intel Turbo Boost and hyper threading).
A few years ago I met with a cloud computing company who was following similar techniques to reduce the noisy neighbor problem and increase performance consistency.
Frankly, it’s good to see that there still are bare metal performance specialists out there doing their thing.
That reminds me of when the OpenWrt and other open-source guys were complaining that the home gateways of the time did not have a big enough CPU to max out the uplink (10-100 Mbps at the time), and the vendors instead built in hardware accelerators. What they did not know was that the hw accelerator was merely an even smaller CPU running a proprietary network stack.
That doesn't sound quite right to me. The worst CPU bottlenecks I recall for consumer routers running OpenWRT were mostly when doing traffic shaping (to compensate for bufferbloat in ISP equipment), not a complete inability to forward traffic fast enough. And when line speed really wasn't achievable with software routing, the hardware deficiency that mattered most was probably memory bandwidth rather than CPU core performance.
The kind of offload most prevalent on consumer routers is NAT performed by the Ethernet switch ASIC, which I don't think can be fairly described as an even smaller CPU running a proprietary network stack—you might have a microcontroller-class CPU core in the switch chip, but there's a separation of control and data planes.
I've worked quite extensively with OpenOnload. There were significant numbers of "free wins" just from LD_PRELOAD, though anything non-trivial did require more configuration (mostly via env vars!). I've also written ef_vi applications for UDP/MC pub in a similar market space to the one you describe. Whilst a little quirky in places, you could write some incredibly fast stuff: on 2.x GHz Xeon CPUs you could send a useful datagram in 600-700 ns. For a lot of applications you could even leave that in your critical path and not care.
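For the "free win" side: the application code is just ordinary BSD sockets, e.g. a trivial multicast publisher like the sketch below (the group/port are made up), and the acceleration comes from launching it under the onload wrapper so LD_PRELOAD interposes on the socket calls. The ef_vi path is a separate, lower-level API and isn't shown here.

    /* Plain BSD-sockets multicast publisher -- nothing Onload-specific in
     * the code itself. Running it as `onload ./pub` can accelerate these
     * same calls transparently. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in grp;
        memset(&grp, 0, sizeof grp);
        grp.sin_family = AF_INET;
        grp.sin_port = htons(30001);                      /* example port  */
        inet_pton(AF_INET, "239.1.1.1", &grp.sin_addr);   /* example group */

        const char msg[] = "tick";
        for (int i = 0; i < 10; i++) {
            sendto(fd, msg, sizeof msg, 0, (struct sockaddr *)&grp, sizeof grp);
            usleep(1000);
        }
        close(fd);
        return 0;
    }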
I've not exhaustively characterized it, but RedHat has this comparison [1]. It is not enough for the scope/scale of my needs. I do still run bare metal workloads though.
That said, I have had operational issues in migrating to Docker, which is another sort of performance impact! I reference some of my core isolation issues in that Redis gist and this GitHub issue [2].
Since apparently no one is willing to read this excellent article, which even comes with fun sliders and charts...
> It turns out that Chrome actively throttles requests, including those to cached resources, to reduce I/O contention. This generally improves performance, but will mean that pages with a large number of cached resources will see a slower retrieval time for each resource.
I am a systems engineer. I read the article title, then started reading the article, and realized it was a bait and switch. It is not about "network" vs "cache" in computer systems terms, which is what you might expect. It is about "network" vs "the (usually antiquated) file-backed database your browser calls a cache." The former would have been a compelling article. The latter is kind of self-evident: the browser cache is there to save bandwidth, not to be faster.
I find it odd to call it a bait and switch when the first thing in the article is an outline.
> It is about "network" vs "the (usually antiquated) file-backed database your browser calls a cache."
It's actually nothing to do with the design of the cache itself as far as I can tell. If you finish reading the article you'll see that it's about a throttling behavior that interacts poorly with the common optimization advice for HTTP 1.1+, exposed by caching.
> The latter is kind of self-evident: the browser cache is there to save bandwidth, not to be faster.
I don't think that's something you can just state definitively. I suspect most people do in fact view the cache as an optimization for latency. Especially since right at the start of the article, the first sentence, the "race the cache" optimization is introduced - an optimization that is clearly for latency and not bandwidth purposes.
Memcached exists for two reasons: Popular languages that hit inflection points when in-memory caches exceeded a certain size, and network cards becoming lower latency than SATA hard drives.
The latter is a well known and documented phenomenon. The moment popular drives are consistently faster than network again, expect to see a bunch of people writing articles about using local or disk cache, recycling old arguments from two decades ago.
Agree with GP that this is getting pretty far off the article, but also--I think remote caches in datacenters are sometimes as much about making the cache shared as about the storage latency.
If you have a lot of boxes running your application, sometimes it's nice to be able to do an expensive thing once per expiry period across the cluster, not once per expiry per app server. Or sometimes you want a cheap simple way to invalidate or write a new value for the full cluster.
The throughput and latency of SSDs does make them a workable cache medium for some uses, with much lower cost/GB than RAM, but that can be independent of the local/remote choice.
When I have worked on distributed systems, there are often several layers of caches that have nothing to do with latency: the point of the cache is to reduce load on your backend. Often, these caches are designed with the principle that they should not hurt any metric (ie a well-designed cache in a distributed system should not have worse latency than the backend). This, in turn, improves average latency and systemic throughput, and lets you serve more QPS for less money.
CPU caches are such a common thing to think about now that we have associated the word "cache" with latency improvements, since that is one of the most obvious benefits of CPU caches. It is not a required feature of caches in general, though.
The browser cache was built for a time when bandwidth was expensive, often paid per byte, the WAN had high latency, and disk was cheap (but slow). I don't know exactly when the browser cache was invented, but it was exempted from the DMCA in 1998. Today, bandwidth is cheap and internet latency is a lot lower than it used to be. From first principles, it makes sense that the browser cache, designed to save you bandwidth, does not help your website's latency.
Edit: In light of the changes in the characteristics of computers and the web, this article seems to mainly be an argument for disabling caching on high-bandwidth links on the browser side, rather than suggesting "performance optimizations" that might silently cost your customers on low-bandwidth links money.
Cache is "slow" because the number of ongoing requests-- including to cache! --are throttled by the browser. I.e the cache isn't slow, but reading it is waited upon and the network request might be ahead in the queue.
The first three paragraphs you wrote are all very easily shown to be irrelevant. Again, the very first sentence of the post shows that the cache is designed to improve latency.
And as the article points out, the cache is in fact lower latency and does improve latency for a single request. The issue is with throttling.
The article is absolutely not suggesting disabling caching, nor does any information in the article indicate that that would be a good idea. Even with the issue pointed out by the article the cache is still lower latency for the majority of requests.
The title doesn't necessarily imply it's general. For one, the first word is "When". If the title was instead "Network is faster than browser cache" you'd have more of a point.
> "network" vs "the (usually antiquated) file-backed database your browser calls a cache."
Firefox uses SQLite for its on-disk cache backend. Not the latest buzzword, but not exactly antiquated. I expect the cache backend in Chrome to be at least as fast.
> cache is there to save bandwidth, not to be faster
In most cases a cache saves bandwidth and reduces page load time at the same time. Internet connection which is faster than a local SSD/HDD is a rare case.
If you live in a rich country and are on fiber internet rather than a cable modem (or if your cable modem is new enough), you likely have better latency to your nearest CDN than you do to the average byte on an HDD in your computer. An SSD will still win, though.
The browser cache is kind of like a database and also tends to hold a lot of cold files, so it may take multiple seeks to retrieve one of them. Apparently it has throughput problems too, thanks to some bottlenecks that are created by the abstractions involved.
I can understand throttling network requests, but disk requests? The only reason to do that would be for power savings (you don't want to push the CPU into a higher state as it loads up the data).
I imagine it's just that the cache is in an awkward place where the throttling logic is unaware of the cache and the cache is not able to preempt the throttling logic.
Depends on whether you value latency for the user. Saying "I try both and choose whichever comes back first" hurts no one but the servers that are no longer being shielded by a client cache. But there's absolutely no reason to assume a client has a cache that masks requests. AFAIK there's no standard that says clients use caches for parsimony rather than exclusively for latency. As a matter of fact, I think this is a good idea whenever it takes time to consult the cache, and the trade-off is more bandwidth consumption, which we are awash in.
If you care that much, run a caching proxy and use that, and you'll get the same effect of a cache masking requests. But I would say it's superior, because it always uses the local cache first and doesn't waste user time on the edge conditions of cache coherency. (That problem comes from Netscape, which famously convinced everyone it's one of the hardest problems.) That leads to the final benefit: the browser's cache doesn't have to cohere. If it's too expensive at that moment to cohere and query, then I can use the network source.
Again, the only downside is that network bandwidth is more consistently used. I would be hard pressed to believe most Firefox users aren't already grossly over-provisioned on bandwidth, and no one would even notice the fraction of a cable line a web browser uses by also hitting the network for things it has cached.
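For concreteness, "try both and take the first answer" is just a race between two lookups. A toy sketch with stand-in delays (a browser would be racing a disk-cache read against an HTTP request; the sleeps below are made up):

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
    static const char *winner = NULL;

    static void finish(const char *who) {       /* first caller wins, the rest are ignored */
        pthread_mutex_lock(&mu);
        if (!winner) { winner = who; pthread_cond_signal(&cv); }
        pthread_mutex_unlock(&mu);
    }

    static void *from_cache(void *arg)   { (void)arg; usleep(5000);  finish("cache");   return NULL; }
    static void *from_network(void *arg) { (void)arg; usleep(12000); finish("network"); return NULL; }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, from_cache, NULL);
        pthread_create(&b, NULL, from_network, NULL);
        pthread_mutex_lock(&mu);
        while (!winner) pthread_cond_wait(&cv, &mu);   /* wake on the first result */
        printf("served from: %s\n", winner);
        pthread_mutex_unlock(&mu);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }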
I do. Which is why it's silly to throttle the cache IO.
The read latency for disks is measured in microseconds. Is it possible for the server to be able to respond faster? Sure. However, if you aren't within 20 miles of the server then I can't see how it could be faster (speed of light and everything).
These design considerations will depend greatly on where you are. It MIGHT be the case that eschewing a client cache for server-to-server traffic is the right move, because your servers are likely physically close together. That would mean the server can do a better job of making multiple clients' requests faster through caching, saving the memory/disk space required on the clients.
There is also the power consideration. It take a lot more power for a cell phone to handle a network request than it does to handle a disk request. Shouting into the ether isn't cheap.
I think you're misunderstanding the context Firefox works in. Have you ever sat down at someone's choking machine, wondered how on earth they put up with it, and then seen it hitting blocking I/O because something misbehaving in the background is drowning out the disk's ability to service work? Browsers don't run in a well-controlled server farm, and even there you can expect to see drives behaving poorly and blocking on I/O, screwing up cache performance.
In well-behaved environments the cache works fine, but in degraded environments racing the network effectively gives you a 0s timeout on the cache: you might get no value from the local cache at all, but the dual attempt minimizes the "decision time" by, instead of waiting for a timeout and then requesting, requesting both and going with whatever comes back first. Even on a well-behaved, well-managed host, stuff can wake up and start indexing because it thinks it's a good quiet moment, and for short periods you'll see the cache slow down. Even on a normal, well-tuned multiprocess machine you'll sometimes see the cache slower than the network.
The cost is that the cache isn't masking the remote infrastructure's true demand. But there's no spec that says it must, AFAIK. I think the most interesting angle on this is that the infrastructure operators serving the data are paying for this latency optimization. Is that right? My view is that this is the concerning thing: if widely adopted, it could materially increase the cost to service providers, some of whom operate on thin or negative margins, or with no income at all.
What if it’s a virus scan or some background indexer browning out io resources on the disk path, or given it’s a consumer browser, some malware eating up resources unbeknownst to the user of the browser. Surely you’ve experienced noisy neighbors :-)
It's an interesting approach because it drops the assumption that browser caches exist to mask request volume for the infrastructure's sake rather than for user latency. I think you can overwhelm costs for low- or no-income services. If this is widely adopted, we are asking the people running the servers to pay for that latency improvement.
They were already partitioned by the requested eTLD+1, because they were already partitioned by the requested URL, which contains the requested eTLD+1. Chrome's change was additionally partitioning it by 2 more eTLD+1s: the top-level eTLD+1 and the frame eTLD+1. It looks like Firefox's change was to add 1 eTLD+1 to the original behavior: the top-level eTLD+1.
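In other words, the key shapes look roughly like this (the struct and field names are mine, purely illustrative, not Chromium's or Firefox's actual internals):

    /* Old behavior:  key = { resource URL }  (which already contains the
     *                requested eTLD+1)
     * Firefox:       key = { top-level eTLD+1, resource URL }
     * Chrome:        key = { top-level eTLD+1, frame eTLD+1, resource URL } */
    struct cache_key {
        const char *top_level_site;   /* eTLD+1 of the tab's top-level page  */
        const char *frame_site;       /* eTLD+1 of the embedding frame       */
        const char *resource_url;     /* full URL of the requested resource  */
    };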
> This seemed odd to me, surely using a cached response would always be faster than making the whole request again! Well it turns out that in some cases, the network is faster than the cache.
Did I miss a follow up on this, or did it remain unanswered as to what the benefit of racing against the network is?
The post basically says that sometimes the cache is slower because of throttling or bugs, but mostly bugs.
Why is Firefox sending an extra request instead of figuring out what is slowing down the cache? It seems like an overly expensive mitigation…
The only reason I can think of is a slow mechanical hard drive vs. a fast network.
Or if it's based on an ETag and it has to hash the local resource, but it would be smart to cache that hash too.
Or maybe the OS has too many file handles open and it's doing a poor job of reading many files at once, whereas the network offers a second tube to funnel data through...
Well, yeah. Disk cache can take hundreds of ms to retrieve, even on modern SSDs. I had a handful of oddly heated discussions with an architect about this exact thing at my previous job. Showing him the network tab did not help, because he had read articles and was well informed about these things.
At a previous job I worked on serving data from SSDs. I wasn't really involved in configuring the hardware but I believe they were good quality enterprise-grade SSDs. My experience was that a random read (which could be a small number of underlying reads) from mmap()'ed files from those SSDs took between 100 and 200 microseconds. That's far from your figure of hundreds of milliseconds.
Of course 200 microseconds still isn't fast. That translates to serving 5000 requests per second, leaving the CPU almost completely idle.
Another odd fact was that we in fact did have to implement our own semaphores and throttling to limit concurrent reads from SSDs.
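That kind of self-throttling can be as simple as a counting semaphore around the read path. A sketch of what I mean (the in-flight limit of 16 is an assumption; it has to be tuned for the actual drive):

    #include <semaphore.h>
    #include <unistd.h>

    #define MAX_INFLIGHT_READS 16
    static sem_t read_slots;

    void reads_init(void) {
        sem_init(&read_slots, 0, MAX_INFLIGHT_READS);   /* 0 = shared between threads */
    }

    ssize_t throttled_pread(int fd, void *buf, size_t len, off_t off) {
        sem_wait(&read_slots);                 /* block if too many reads are in flight */
        ssize_t n = pread(fd, buf, len, off);
        sem_post(&read_slots);
        return n;
    }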
(Anyone who is not running their OS and temp space on NVMe should not expect good performance. Such a configuration has been very cheap for several years now.)
> Such a configuration has been very cheap for several years now.
This is a very weird comment, considering that a) it's cheaper than yesteryear, but SATA SSDs (or even modern magnetic HDDs) are still sold and in active use, and b) it ignores phones completely, where a large number of sites would have mobile-dominated visitors who can't just switch to NVMe-like performance even if they have large disposable incomes (because at the end of the day, even with UFS, phones are still slower than NVMe latency-wise).
The issue has nothing to do with disk speed. If you had read the article you'd see a very nice chart that shows the vast majority of cache hits returning in under 2 or 3ms.
I wish I had a clearer memory or record of this, but I think I've also seen ~100ms for browser cache retrieval on an SSD. Has anyone else observed this and have an explanation? A sibling comment points out that SSD read latency should be ~10ms at most, so the limitation must be in the software?
OP mentioned specifically that “there have been bugs in Chromium with request prioritisation, where cached resources were delayed while the browser fetched higher priority requests over the network” and that “Chrome actively throttles requests, including those to cached resources, to reduce I/O contention”. I wonder if there are also other limitations with how browsers retrieve from cache.
> Concatenating / bundling your assets is probably still a good practice, even on H/2 connections. Obviously this comes on balance with cache eviction costs and splitting pre- and post-load bundles.
I guess this latter part refers to the trade-off between compiling all your assets into a single file, and then requiring clients to re-download the entire bundle if you change a single CSS color. The other extreme is to not bundle anything (which, I gather from the article, is the standard practice since all major browsers support HTTP/2) but this leads to the described issue.
What about aggressively bundling, but also keeping track at compile time of diffs between historical bundles and the new bundle? Re-connecting clients could grab a manifest that names the newest mega-bundle as well as a map from historical versions to the patches needed to bring them up to date. A lot more work on the server side but maybe it could be a good compromise?
Of course that's the easy version, but it has a huge flaw which is all first-time clients have to download the entire huge mega bundle before the browser can render anything, so to make it workable it would have to compile things into a few bootstrap stages instead of a single mega-bundle.
I am clearly not a frontend dev. If you're going to throw tomatoes please also tell me why ;)
* edit: found the repo that made me think of this idea, https://github.com/msolo/msolo/blob/master/vfl/vfl.py but it's from antiquity and probably predates node and babel/webpack, but the idea is you name individual asset versions with a SHA or tag and let them be cached forever, and to update the web app you just change a reference in the root to make clients download a different resource dependency tree root (and they re-use unchanged ones) *
There's probably some balance here. Since there's a limit of 9 concurrent requests before throttling occurs, you can create 9 buckets and concatenate objects into each. So if you have a bunch of static content, concat that into one bucket. If you have another object that changes a lot, keep that separate. If you have two other objects that change together, bucket those, etc.
Seems like a huge pain to think about tbh. Seems like part of the problem would be solved by compiling everything into a single file that supported streaming execution.
There's a good middle ground of bundling your SPA into chunks of related files (I prefer to name them the SHA hash of the content), and giving them good cache lifetimes.
You can have a "vendor" chunk (or a few) which just holds all 3rd party dependencies, a "core components" chunk which holds components which are likely used on most pages, and then individual chunks for the rest of the app broken down by page or something.
It speeds up compilation, gives better caching, no need for a stateful mapping file outside of the HTML file loaded (which is kind of the point of the <link> tag anyway!), and has lots of knobs to tune if you want to really squeeze out the best load times.
Another tool that bears on this idea is courgette [0], in that it operates on one of the intermediate representations of the final bundle in order to achieve better compression.
When I first saw this, I was confused why Linux (and Ubuntu and Debian) performed so poorly. But then I asked myself: is the data even representative? Or does Linux look worse because users on Linux tend to stick to lower-end hardware, since the measurements were, AFAIR, simply collected from website visitors rather than taken on the same hardware with different operating systems installed?
I don't know, can someone explain the discrepancy?
Is the raw data available? What is the number of requests that were included in the measurements for Debian (that doesn't hit cache in the first 5 ms)?
I wonder if browsers could design a heuristic to cache multiple items in one cache entry -- that is, instead of the website doing file concatenation, the browser dynamically decides to concatenate some resources in order to reduce the number of asks to the local cache. For example, the browser would know on the initial load which requests get made together (such as three js files or four images), so they could get batched into a single cache entry
So is the takeaway that data in the RAM of some server connected by fast network is sometimes "closer" in retrieval time than that same data on a local SSD?
Back in ~2003 I had bought a new motherboard + cpu (a Duron 800MHz IIRC) but as a poor college kid, only had enough money left over for 128MB of RAM.. but the system I was replacing had ~768MB. I made a ~640MB ramdisk on the old system and mounted it on the new system as a network block device, and the result was much, much faster than local swap (this was before consumer SSDs though).
Now I'm imagining a rack of Raspberry Pis acting as one giant RAM swap drive over nbd. This could work, for a given value of "work": cost of a Pi vs. a stick of RAM. A KV store as well, perhaps.
Then again, what's a TB worth on just one Xeon server? Probably cheaper... or not?
When you make the request, the server will have to look up the image from its own "cache" before sending it back to you. The client's cache would have to be not only slower than its ping, but slower than its ping + the server's "cache."
Tl;dr: cache is "slow" because the number of ongoing requests-- including to cache! --are throttled by the browser. I.e the cache isn't slow, but reading it is waited upon and the network request might be ahead in the queue.
Then that implementation doesn't even save bandwidth (for client nor server), which is one of the purposes of having a cache in the first place (together with other reasons).
For me this is very noticeable whenever I open a new Chrome tab. It takes 3+ seconds for the icons of the recently visited sites to appear, whatever cache is used for the favicons is extremely slow. Thankfully the disk cache for other resources runs at a more normal speed.