edshiro's comments

This looks great! And incredibly timely too!

I finished watching a video today where the host and guests were discussing challenges in RAG pipelines, and chunking documents the right way is certainly still very challenging. Video: https://www.youtube.com/watch?v=Y9qn4XGH1TI&ab_channel=Prole... .

I was already scratching my head on how I was going to tackle this challenge... It seems your library is addressing this problem.

Thanks for the good work.


Looks dope!


I've saved the paper to read it later.

The premise of this work seems very interesting... But I wonder how practical it is from both a cost and time perspective. I am toying around with an AI Agents library and one of the annoying UX things I notice is the time it takes to get my answers, because each call to an agent (either GPT-4 or Claude 3) is kinda slow.

Besides the time, it feels quite wasteful token-wise.

I'm skeptical this approach will be adopted by many in the AI Agent space, but of course I could be very wrong.


I don't have much experience with embeddings...

Could someone more knowledgeable suggest when it would make sense to use the SentenceTransformers library vs for instance relying on the OpenAI API to get embeddings for a sentence?


It's fairly easy to use, not that compute-intensive (e.g. it can run on even a small-ish CPU VM), the embeddings tend to perform well, and you can avoid sending your data to a third party. Also, there are models fine-tuned for particular domains on HF-hub that can potentially give better embeddings for content in that domain.
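For context, a minimal sketch of what using it looks like (the model name here is just a common small default, not a recommendation):

    from sentence_transformers import SentenceTransformer, util

    # A small open model from the HF hub; runs fine on a CPU-only machine
    model = SentenceTransformer("all-MiniLM-L6-v2")

    sentences = ["How do I reset my password?",
                 "Steps to recover account access"]
    embeddings = model.encode(sentences, normalize_embeddings=True)

    # Cosine similarity between the two sentences
    print(util.cos_sim(embeddings[0], embeddings[1]))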


Just to add to this: a great resource is the Massive Text Embedding Benchmark (MTEB) leaderboard, which you can use to find good models to evaluate. There are many open models that outperform e.g. OpenAI's text-embedding-ada-002 (currently ranked #46 for retrieval), and you can use them with SentenceTransformers.

https://huggingface.co/spaces/mteb/leaderboard
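As a concrete illustration, a retrieval-style usage with one of the open models might look like this (the model name is just an example picked from the leaderboard, and the texts are made up):

    from sentence_transformers import SentenceTransformer, util

    # Example open model; check the MTEB leaderboard for what currently
    # fits your task, language and text length
    model = SentenceTransformer("BAAI/bge-small-en-v1.5")

    corpus = ["Invoices are emailed on the 1st of each month.",
              "Password resets are handled via the account page.",
              "Our offices are closed on public holidays."]
    corpus_emb = model.encode(corpus, normalize_embeddings=True)

    query_emb = model.encode("How do I get my invoice?", normalize_embeddings=True)
    hits = util.semantic_search(query_emb, corpus_emb, top_k=2)
    print(hits[0])  # e.g. [{'corpus_id': 0, 'score': ...}, ...]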


I see - thanks for the clarifications

I presume if your customers are enterprise companies then you may opt to use this library vs sending their data to OpenAI etc.

And you can get more customisation/fine-tuning from this library too.


Embeddings are one of those things where using OpenAI (or any other provider) isn't really necessary. There are many small open source embedding models that perform very well. Plus, you can fine-tune them on your task. You can also run them locally and not worry about all the constraints (latency, rate limits, etc.) of using an external provider endpoint. If performance is important for you, then you'll need a GPU.

The main reason to use one of those providers is if you want something that performs well out of the box without doing any work and you don't mind paying for it. Companies like OpenAI, Cohere and others already did the work to make those models perform well across various domains. They may also use larger models that are not as easy to deal with yourself (although, as I mentioned previously, a small embedding model fine-tuned on your task is likely to perform as well as a much bigger general model).
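For reference, the kind of fine-tuning meant here can be quite lightweight. A rough sketch with SentenceTransformers on (query, relevant passage) pairs (the pairs below are made up):

    from torch.utils.data import DataLoader
    from sentence_transformers import SentenceTransformer, InputExample, losses

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # (query, relevant passage) pairs from your own domain
    train_examples = [
        InputExample(texts=["question about billing", "how invoices are issued"]),
        InputExample(texts=["password reset question", "account recovery guide"]),
    ]
    train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

    # In-batch negatives: other passages in the batch act as negatives
    train_loss = losses.MultipleNegativesRankingLoss(model)

    model.fit(train_objectives=[(train_dataloader, train_loss)],
              epochs=1, warmup_steps=10)
    model.save("my-domain-embedder")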


You should basically never use the OpenAI embeddings.

There isn't a single use case where they're better than the free models, and they're slower, needlessly large, and outrageously expensive for what they are.


Up until a month ago, the OpenAI embeddings were very poor. But they recently released a new model which is much better than their previous one.

Now it depends on the specific use case (domain, language, length of texts).


I get the same feeling. AI agents sound very cool, but reliability is a huge issue right now.

The fact that you can get vastly different outcomes for similar runs (even while using Claude 3 Opus with tool/function calling) can drive you insane. I read somewhere down in this thread that one way to mitigate these problems is by implementing a robust state machine. I reckon this can help, but I also believe that somehow leveraging memory from previous runs could be useful too. It's not fully clear in my mind how to go about doing this.
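To make the state machine idea a bit more concrete, here is the kind of thing I have in mind (purely a hypothetical sketch; llm_call stands in for whatever model/tool call you use):

    from enum import Enum, auto

    class AgentState(Enum):
        PLAN = auto()
        ACT = auto()
        REVIEW = auto()
        DONE = auto()
        FAILED = auto()

    MAX_RETRIES = 3

    def run_agent(task, llm_call):
        """Drive the agent through explicit states so a flaky LLM step
        can only move the run along allowed transitions."""
        state, retries, ctx = AgentState.PLAN, 0, {"task": task}
        while state not in (AgentState.DONE, AgentState.FAILED):
            if state is AgentState.PLAN:
                ctx["plan"] = llm_call(f"Plan the steps for: {task}")
                state = AgentState.ACT
            elif state is AgentState.ACT:
                ctx["result"] = llm_call(f"Execute this plan: {ctx['plan']}")
                state = AgentState.REVIEW
            elif state is AgentState.REVIEW:
                verdict = llm_call(f"Does this satisfy the task? {ctx['result']}")
                if "yes" in verdict.lower():
                    state = AgentState.DONE
                elif retries < MAX_RETRIES:
                    retries += 1
                    state = AgentState.ACT  # retry instead of wandering off
                else:
                    state = AgentState.FAILED
        return state, ctx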

I'm still very excited about the space though. It's a great place to be and I love the energy but also measured enthusiasm from everyone who is trying to push the boundaries of what is possible with agents.

I'm currently also tinkering with my own Python AI Agent library to further my understanding of how they work: https://github.com/kenshiro-o/nagato-ai . I don't expect it to become the standard but it's good fun and a great learning opportunity for me :).


This is really exciting to see. I applaud Stability AI's commitment to open source and hope they can operate for as long as possible.

There was one thing I was curious about... I skimmed through the executive summary of the paper but couldn't find it. Does Stable Diffusion 3 still use CLIP from OpenAI for tokenization and text embeddings? I would naively assume that they would try to improve on this part of the model's architecture to improve adherence to text and image prompts.


They use three text encoders to encode the caption:

1. CLIP-G/14 (OpenCLIP)

2. CLIP-L/14 (OpenAI)

3. T5-v1.1-XXL (Google)

They randomly disable encoders during training, so that when generating images SD3 can use any subset of the 3 encoders. They find that using T5 XXL is important only when generating images from prompts with "either highly detailed descriptions of a scene or larger amounts of written text".
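For anyone curious what that looks like mechanically, here is a toy sketch of per-encoder dropout (not the paper's code; the drop probability and the way the outputs are combined are placeholders):

    import torch

    def encode_caption(caption, encoders, drop_prob=0.5, training=True):
        """With some probability, replace an encoder's output with zeros
        during training, so the model learns to work with any subset of
        encoders at sampling time. `encoders` are callables that return
        (batch, seq, dim) tensors."""
        outputs = []
        for encoder in encoders:
            emb = encoder(caption)
            if training and torch.rand(()).item() < drop_prob:
                emb = torch.zeros_like(emb)  # this encoder is "disabled" for the step
            outputs.append(emb)
        # Toy combination; the real model pads/concatenates the CLIP and
        # T5 outputs in a more involved way.
        return torch.cat(outputs, dim=-1)

    # Dummy stand-ins for the three text encoders
    fake_encoders = [lambda c: torch.randn(1, 77, 64) for _ in range(3)]
    cond = encode_caption("a cat riding a bicycle", fake_encoders)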


One of the diagrams says they're using CLIP-G/14 and CLIP-L/14, which are the names of two OpenCLIP models - meaning they're not using OpenAI's CLIP.


I have just been informed that my above comment is false, the CLIP-L is in fact referring to OpenAI's, despite that also being the name of an OpenCLIP model.


Nice work! I like the quick rendering too.

However, I don't believe any AI was used for this, so the actual TLD is a bit inappropriate here. A ".io" TLD would have been more fitting.

Not every slice of bread needs to be spread with butter. Similarly, not every project requires AI.


Maybe it's based in Anguilla :-)


This is amazing. Definitely something I will look at to relax when I feel stressed with this startup life.

Would also love to see a technical write up about how you implemented this. Great creative work!


User sayovard put the link above. Thought of letting you know. Good luck with your startup work.

https://www.html5rocks.com/tutorials/casestudies/100000stars...


I presume this does not apply to computer vision datasets? Frankly I am still confused about what exactly Snorkel does.


You have a dataset of images and you write code (labeling functions, LFs) to label the images. Snorkel handles the pipeline but, more importantly, corrects the conflicts/correlations between the LFs. The output is a supervised dataset with mutually exclusive labels, a la softmax classification.

The labels are noisy, but you get a quantity that you could not get from humans, AND at a faster/cheaper rate. They provide analysis arguing that, for discriminative models, quantity CAN outweigh quality.

To your point, it's not typically used with the image-only modality. It's mostly used where there is some metadata attached.
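To make that concrete, a toy sketch of what labeling functions over image metadata look like with Snorkel (filenames and labels here are made up):

    import pandas as pd
    from snorkel.labeling import labeling_function, PandasLFApplier
    from snorkel.labeling.model import LabelModel

    ABSTAIN, DOG, CAT = -1, 0, 1

    # LFs look at metadata (here, filenames), not the pixels themselves
    @labeling_function()
    def lf_filename_dog(x):
        return DOG if "dog" in x.filename.lower() else ABSTAIN

    @labeling_function()
    def lf_filename_cat(x):
        return CAT if "cat" in x.filename.lower() else ABSTAIN

    df = pd.DataFrame({"filename": ["dog_01.jpg", "cat_07.jpg", "IMG_123.jpg"]})

    applier = PandasLFApplier(lfs=[lf_filename_dog, lf_filename_cat])
    L_train = applier.apply(df)

    # The label model resolves conflicts/correlations between the LFs and
    # outputs (noisy) labels you can train a supervised classifier on
    label_model = LabelModel(cardinality=2)
    label_model.fit(L_train, n_epochs=100)
    labels = label_model.predict(L_train)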


This looks great! I had never heard of Dash either and it seems it is precisely the tool we need at our startup to do data visualisation.

Awesome work. Will check all this out during the week-end.


Thank you for sharing this very candid article on Starsky Robotics and the autonomous vehicle space in general. It's a real eye opener. I've been following your progress for the last few years (and I also read about your company through Reilly Brennan's "Trucks - FOT" newsletter).

I am sorry you could not get investors to believe more in what you and your team were building, especially as you required a lot less funding than many other companies in this arena. I also thought you had a clear business case (I worked in ride-hailing and also logistics, so I understand some of the problems in this space).

I wanted to ask you a question: I am building a startup in the dash cam video analysis space. We are building a large and geographically diverse dataset of road videos, where our users can annotate/label the data. We are then going to look at detecting specific events like accidents and edge cases in videos. Do you feel this type of business, the data we collect, and the insights we generate would have value for an AV startup?

All the best in your next move. Stay strong - you can be proud of what you and your team did.


Hard to say.

We had really strict rules on what we wanted our data to look like, and were very specific about subject matter. We probably wouldn't have been able to be a customer.

Our intention for dealing with accidents/edge cases was that if anything looked outside of our ODD (operational design domain, as in, not perfect driving conditions) we'd execute an MRC (minimal risk condition: pull to the side of the road, or stop at the next exit). Relatively easy way to solve most of the really hard edge cases.


Understood and makes total sense given the ODD at Starsky.

