Re: the relay, that depends on your needs. My impression from these sorts of "ru...

half-kh-hacker · 2025-04-27T03:36:39 1745724999

- the relay storage volume scales only with the backfill window for consumers that drop briefly - the bluesky pbc operated relays let you reconnect up to 24h later and not miss any events but that requires around 200gb of scratch disk space -- live tailing an rpi relay without dropping a connection can give you events from the full network span (ie the complete set of the firehose) without requiring any backfill window, but it's nice to use a few tens of gigabytes anyway. -- the full firehose is like 20mbps at maximum so it's far from hard to serve a few live consumers

- bluesky's feed gen post-dropping is about internal operation of their appview and not anything to do with network sync semantics

- if you're running an AppView for the bsky data you are likely keeping a copy of all bsky posts in a database, since fetching from PDSes on-the-fly is network intensive over a relatively small pipe, which is what i mean by write volume requirements.

yellowapple · 2025-04-28T00:30:11 1745800211

> the relay storage volume scales only with the backfill window for consumers that drop briefly […] bluesky's feed gen post-dropping is about internal operation of their appview and not anything to do with network sync semantics

Gotcha; thanks for the clarifications/corrections. Good to know that the firehose bandwidth is a lot less than I thought (though 20Mbps can certainly add up to some hefty pricetags depending on how you're billed for traffic).

> if you're running an AppView for the bsky data you are likely keeping a copy of all bsky posts in a database, since fetching from PDSes on-the-fly is network intensive over a relatively small pipe, which is what i mean by write volume requirements.

Right, but how much of that actually needs to hit the disk? I'd imagine most appviews can readily get away with just keeping posts in RAM, and even if disk storage is desired (e.g. to avoid needing to pull everything from the PDSes if an appview server reboots), it ain't like the writes need to be synchronous or low-latency. A full-blown ACID-compliant DBMS is probably overkill.

It'd also be overkill to cache all posts, rather than subsets (e.g. each users' "Discover" and "Following" feeds), so I reckon that'd also reduce the in-appview caching needs further.

half-kh-hacker · 2025-04-28T11:08:58 1745838538

the reference bluesky backend does just keep everything around but this idea has merit!! you're actually reinventing something like AppViewLite right now, which does throw away old data: https://github.com/alnkesq/AppViewLite

bluesky chooses to not refetch data from PDSes all the time so that the load for a PDS stays low (they like it to be possible to run on a home connection)