
keywords? These are embedding models. CLIP maps those phrases to an embedding that occupies a location in the space you want to avoid. The "keywords" don't need to appear anywhere in the image dataset.



CLIP isn’t magic. “bad anatomy” won’t work any more than “picture that isn’t a cat” does.

Try it on clip-front: https://rom1504.github.io/clip-retrieval/?back=https%3A%2F%2...


The problem with that is you're visualising the space using only points that exist in the image dataset. The language embedding carries extra information from the language side that isn't contained in the images.

It handles "bad", and it handles "anatomy". If no single image covers that combination, that's exactly what language embeddings solve for.
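A minimal sketch of the idea being debated: a phrase like "bad anatomy" gets its own direction in embedding space, and a sampler can penalise candidates that point the same way. The `embed` function here is a self-contained stub standing in for a real CLIP text encoder (e.g. `model.encode_text` in open_clip); the vectors are deterministic per phrase but otherwise meaningless, so only the steering mechanics are illustrated.

```python
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stub for a CLIP-style text encoder. A real setup would call
    # something like open_clip's model.encode_text(tokenizer([text])).
    # Seed from a stable hash of the phrase so the demo is reproducible.
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).standard_normal(512)
    return v / np.linalg.norm(v)  # unit-normalised, as CLIP embeddings usually are

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b)  # both inputs are unit vectors

# Negative-prompt steering: down-weight candidates near the "avoid" direction.
avoid = embed("bad anatomy")
candidates = {name: embed(name) for name in
              ["a photo of a hand", "a photo of a landscape"]}

for name, vec in candidates.items():
    print(f"{name}: similarity to avoid-direction = {cosine(vec, avoid):+.3f}")
```

With a real encoder, "a photo of a hand" would land measurably closer to "bad anatomy" than an unrelated scene would; with these stub vectors the similarities are just near-zero noise, which is the point the skeptical reply is making about needing the model itself rather than intuition.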



