
This reads purely as a sales pitch that quickly cuts to "we're selling this product so you don't have to think about it."

...when you actually do want to think about it (in 2024).

Right now, we're collectively still figuring out:

  1. Best chunking strategies for documents
  2. Best ways to add context around chunks of documents
  3. How to mix and match similarity search with hybrid search
  4. Best way to version and update your embeddings
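
To make points 1 and 2 concrete, here is a minimal sketch, assuming fixed-size character windows with overlap and the document title as the "context" prepended to each chunk. The function name `chunk_with_context` and its parameters are hypothetical and not taken from any product discussed in this thread; real pipelines often split on sentences or tokens instead.

```python
def chunk_with_context(text, title, size=200, overlap=50):
    """Split text into overlapping character windows, each prefixed with the title.

    Assumption: character-based windows; `size` and `overlap` are illustrative defaults.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        body = text[start:start + size]
        # Prepend the title so each chunk carries some document-level context.
        chunks.append(f"{title}\n\n{body}")
        if start + size >= len(text):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_with_context("lorem ipsum " * 40, "Example doc"):
        print(repr(chunk[:40]))
```

Whether overlap and a title prefix are the right choices is exactly the open question; the point is only that both are cheap knobs to experiment with.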



(post co-author here)

We agree that a lot still needs to be figured out, which is why we made vectorizer very configurable. You can configure chunking strategies and formatting (which is a way to add context back into chunks), and you can mix semantic and lexical search on the results. That covers your points 1, 2, and 3. Versioning can mean a different version of the data (in which case the versioning info lives with the source data) or a different embedding config, which we also support [1].
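
As an illustration of mixing semantic and lexical results, here is a minimal sketch using reciprocal rank fusion (RRF). This is a generic technique and an assumption on my part, not the vectorizer API; the function name `reciprocal_rank_fusion`, the constant `k=60`, and the sample ids are all illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. `k` dampens the influence of top ranks.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    semantic = ["a", "b", "c"]  # hypothetical ids from a vector similarity query
    lexical = ["b", "d", "a"]   # hypothetical ids from a full-text query
    print(reciprocal_rank_fusion([semantic, lexical]))
```

RRF only needs the rank positions from each query, so it works without having to normalize similarity scores against text-search scores.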

Admittedly, right now we only have predefined chunking strategies, but we plan to add custom-code options very soon.

Our broader point is that the things you highlight above are the right things to worry about, rather than data workflow ops and babysitting your lambda jobs. That's the part we want to handle for you.

[1]: https://www.timescale.com/blog/which-rag-chunking-and-format...


Points 2-4 are clear pointers to a real database as the home for vector data & search.



