A fascinating memoir by a philosopher turned brain surgeon facing a terminal cancer diagnosis. A person who spent their entire life pondering the meaning of life and mortality, suddenly confronted with his own.
I reread it once a year, at minimum. A deeply moving book.
If you're looking for pure vector similarity search, there are many alternatives. But Vespa's tensor support, multi-vector indexing, and the ability to express models like ColBERT (1) or cross-encoders make it stand out if you need to move beyond pure vector search.
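To make the multi-vector point concrete, here is a rough numpy sketch of ColBERT-style late interaction (the MaxSim operator), where a document is represented by one vector per token rather than a single vector. This is illustrative only, not Vespa's actual ranking expression syntax:

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query token vector,
    take the best-matching document token vector, then sum over the query.

    query_vecs: (num_query_tokens, dim)
    doc_vecs:   (num_doc_tokens, dim)
    """
    # Similarity between every (query token, doc token) pair.
    sim = query_vecs @ doc_vecs.T          # (num_query_tokens, num_doc_tokens)
    # MaxSim: best document token per query token, summed over the query.
    return float(sim.max(axis=1).sum())

# Toy usage with random unit vectors standing in for token embeddings.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 128))
q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(80, 128))
d /= np.linalg.norm(d, axis=1, keepdims=True)
print(maxsim_score(q, d))
```

The point is that scoring needs all of the document's token vectors, which is exactly what single-vector-per-object stores can't represent.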
Plus, for RAG use cases, it's a full-blown text search engine as well, allowing hybrid ranking combinations. Also, with many pure vector databases like Pinecone, you cannot describe an object with more than one vector: if you have different vector models for the object, you need different indexes, and then you have to duplicate metadata across those indexes (if you need filtering + vector search).
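As a sketch of what "hybrid ranking combinations" means in practice: blend a lexical score (e.g. BM25) with a vector similarity per candidate. The weights and field names below are made up for illustration; an engine like Vespa lets you express this kind of expression in a rank profile instead of application code.

```python
def hybrid_score(bm25_score: float, vector_similarity: float, alpha: float = 0.7) -> float:
    """Weighted blend of a lexical score and a vector score.
    Note: BM25 is unbounded while cosine similarity is roughly [0, 1],
    so real systems usually normalize the scores before mixing."""
    return alpha * vector_similarity + (1.0 - alpha) * bm25_score

# Hypothetical candidates with both scores already computed upstream.
candidates = [
    {"id": "doc1", "bm25": 12.3, "cos": 0.81},
    {"id": "doc2", "bm25": 4.1,  "cos": 0.93},
    {"id": "doc3", "bm25": 15.0, "cos": 0.42},
]
ranked = sorted(candidates, key=lambda c: hybrid_score(c["bm25"], c["cos"]), reverse=True)
print([c["id"] for c in ranked])
```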
Yeah, my assumption was that something in some layer of their application isn’t well optimized when asked to return posts from a subreddit that has “gone dark” in whatever fashion the mods chose to do that.
For example, maybe it causes reads from the database to take a lot longer than they normally would, locking up the database or causing the process to crash (again, that’s just pure speculation).
One I've been wondering about is user overview pages. People use those a lot (it's actually my bookmark for getting onto Reddit), and yesterday I noticed that a post I made wasn't in my overview; it's because that sub had gone dark early.
What happens when a user has 99% of their posts in subs that are now hidden, and the API is programmed to produce a fixed 30 comments of history on the overview page? The answer is extremely deep database pulls... you might pull a year of comment history to get 30 comments that aren't hidden. And depending on how they do that, it may actually pull the whole comment history for that timespan, since most of the time posts aren't hidden like this.
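A rough sketch of that failure mode, purely hypothetical and assuming the hidden-sub filter happens in application code after fetching from storage: the page keeps pulling batches until it has 30 visible comments, so the number of rows scanned explodes when most of a user's history sits in dark subs.

```python
PAGE_SIZE = 30     # comments shown on the overview page
BATCH_SIZE = 100   # rows fetched from storage per round trip

def build_overview(fetch_batch, hidden_subs: set, page_size: int = PAGE_SIZE):
    """fetch_batch(offset, limit) -> list of {"sub": ..., "body": ...} dicts,
    newest first. Filtering happens *after* the fetch, so the number of rows
    scanned grows as more of the user's subs go dark."""
    visible, offset, scanned = [], 0, 0
    while len(visible) < page_size:
        batch = fetch_batch(offset, BATCH_SIZE)
        if not batch:
            break                       # ran out of history entirely
        scanned += len(batch)
        visible += [c for c in batch if c["sub"] not in hidden_subs]
        offset += BATCH_SIZE
    return visible[:page_size], scanned

# Toy data: 99% of comments are in a sub that has gone dark.
history = [{"sub": "darksub" if i % 100 else "opensub", "body": f"comment {i}"}
           for i in range(5000)]
fetch = lambda off, lim: history[off:off + lim]
page, scanned = build_overview(fetch, hidden_subs={"darksub"})
print(len(page), "visible comments after scanning", scanned, "rows")
```

With 99% of the history hidden, filling a 30-comment page means scanning on the order of 3,000 rows.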
I worked on a backend team with some very overburdened legacy tables in Mongo, and this is the kind of thing we'd think about. Yeah, you can use an index, but then you have to maintain the index for every record and update it every time a sub goes private/public (and we were literally hitting practical limits on how many indexes we could keep; we finally instituted a 1-in-1-out rule). And how often does that happen? Even deleted comments are probably a small enough minority overall that an index doesn't pay for itself.

But this is relational data: you have to know which subreddits are closed before you can filter their results, and Mongo sucks at joins. The Mongo instance can become a hotspot, so you just filter it in the application instead for those "rare" instances. Even if they are doing it in Mongo, the index/collection they're joining may suddenly be 100x the size, which could blow stuff up anyway.
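To make that tradeoff concrete, here's a hypothetical pymongo sketch (not Reddit's actual schema; the collection and field names like `sub_is_hidden` are made up): denormalizing a visibility flag onto each comment makes reads a single indexed query, but flipping a sub private/public then rewrites every one of its comments, versus skipping the index and filtering in the application.

```python
from pymongo import MongoClient, ASCENDING, DESCENDING

client = MongoClient()            # assumes a local MongoDB for the example
comments = client.forum.comments  # hypothetical collection

# Option A: denormalize visibility onto each comment and index it.
# Cheap to read, but the flag must be rewritten whenever a sub toggles.
comments.create_index(
    [("user_id", ASCENDING), ("sub_is_hidden", ASCENDING), ("created", DESCENDING)]
)

def set_sub_visibility(sub: str, hidden: bool) -> int:
    """Flip the denormalized flag for every comment in the sub.
    This is the write amplification the index approach pays for."""
    result = comments.update_many({"sub": sub}, {"$set": {"sub_is_hidden": hidden}})
    return result.modified_count

def overview_with_index(user_id: str, limit: int = 30):
    return list(
        comments.find({"user_id": user_id, "sub_is_hidden": False})
                .sort("created", DESCENDING)
                .limit(limit)
    )

# Option B: no extra index; pull recent comments and filter in the application
# against a set of hidden subs fetched separately (the "join" done by hand).
def overview_in_app(user_id: str, hidden_subs: set, limit: int = 30):
    out = []
    for c in comments.find({"user_id": user_id}).sort("created", DESCENDING):
        if c["sub"] not in hidden_subs:
            out.append(c)
            if len(out) == limit:
                break
    return out
```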
edit: for me, one overview page is now taking me back one month in comment history. And I comment a lot on subs that are currently closed, so it could easily be throwing away 5-10 comments for every comment it displays.
I'm guessing a hit on an open subreddit mostly goes straight out of the caching layer, while a hit on a private one incurs a DB hit to check whether the user belongs there.
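A tiny sketch of that guess, with entirely made-up helper names (`is_private`, `is_member`, `fetch_listing`): open subs can be served from a shared cache, while private ones need a per-user membership check against the database.

```python
listing_cache: dict[str, list] = {}   # subreddit -> cached, pre-rendered listing

def get_listing(sub: str, user_id: str, db) -> list:
    """Speculative read path: open subs are cache hits, private subs always
    touch the database, since the response depends on the requesting user."""
    if not db.is_private(sub):
        if sub in listing_cache:
            return listing_cache[sub]          # cheap, shared cache hit
        listing = db.fetch_listing(sub)
        listing_cache[sub] = listing
        return listing
    # Private sub: per-user authorization check, so a DB round trip every time.
    if not db.is_member(sub, user_id):
        raise PermissionError("not approved for this subreddit")
    return db.fetch_listing(sub)
```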
As somebody who has both done contracting and worked in tech, I hope this comment makes sense.
Most homeowners (product managers) want the cheapest (fastest) thing that fixes (delivers) a working home (feature) to them. When given the options those are generally what people choose.
Sure, some contractors (software engineers) always cut corners, but I believe many take pride in their work and, given the option, would prefer a solution they can take pride in.
I trust you have the numbers for this. And we can debate them until the world ends, but I think having a mental fallback of “yeah, some people suck” is valuable for your mental well-being.
Life’s too short to let the few negative people out there bring you down. Whether it’s 1% or 5%, whatever the number, realizing that lets you move on.
> I find it more efficient to work in topic-scoped batches, so I can load context on a protocol and codebase once and use it to land multiple changes.
This is my favorite way of writing software as well. My current gig has a ton of microservices, and when a feature comes up that requires changing one of them, I much prefer to bundle in a couple of other, smaller changes that help keep the service operational and easier to maintain.
One issue is that this often brings out the yak-shaving, but I think it's a fair tradeoff and helps reduce the time burden of doing large migrations.
If yak-shaving in this case refers to polishing, I don't believe it's a bad thing per se for a crypto library - or any standard library, for that matter.
https://www.databass.dev/