This is a ridiculous rant. “Oh no! We have choices”. Then you list out every ch...

softwaredoug · on Jan 26, 2024

Author here. Well yeah, I agree it's probably ridiculous. Sort of testing the waters to see if I'm way off base.

I think what I mean to say is that, in my experience, practitioners and vendors alike are overly focused on "just put embeddings somewhere and do cosine similarity" and that's the only problem to solve. In fact, that's a teeny tiny part of it. Hence "peak vector DB".

So I think the market needs some education that it's harder than that. That part is my rant :). I've spoken about and worked on enough problems now to see that disconnect between market and reality.

Though I think "vector DB" is actually a place for capital/brainpower to concentrate to solve these other problems. And I think we'll see the vector DB vendors pivot there. It's just taking a while for the market and investors to see this...

brookst · on Jan 26, 2024

It sounds like you’ve conflated “gold rush” with “peak”. All sorts of novel technologies had mad rushes when they were new, but that does not mean they had peaked. The dot bomb era, with its ridiculously overvalued, useless startups, was a gold rush, but it was in no way peak Internet.

phillipcarter · on Jan 26, 2024

> practitioners and vendors alike are overly focused on "just put embeddings somewhere and do cosine similarity" and that's the only problem to solve

I agree, and as one who does exactly and only this on the search side, it's also something that falls flat on its face if you don't think a little more about the data and tasks involved.

I wrote about it here[0], but the gist of it for our use case is that if we don't intentionally include what may be considered "less relevant" data, then we stand a good chance of failing our main generative task.

[0]: https://phillipcarter.dev/2024/01/15/three-properties-of-dat...

enoch2090 · on Jan 26, 2024

Normally having a lot of choices is a good thing, but here we are facing a dozen vector DBs with very similar features. At the root, each is just some version of ANN implemented in C++/Rust/whatever; "peak" means there's nothing new. People are flooding into this field not because there's something worth inventing, but out of fear of lagging behind and missing the quick money. That's what I feel about vector DBs in Jan 2024.

dimatura · on Jan 26, 2024

Yeah, I'm happy there's a lot of development in this area - even if it's fueled by the LLM frenzy, good nearest neighbor search solutions are useful in a lot of domains. Though I worked a little bit on this problem over 10 years ago (with an application to visual SLAM), and it is a bit amusing to see that a lot of the ideas and even the libraries are still the same!

hooverd · on Jan 26, 2024

We've hit peak peak.