
We do something internally[0] but specifically for security concerns.

We’ve found that by having the LLM assign a “severity” level (simply low, medium, or high), we’re able to filter out all the nitpicky feedback.

It’s important to note that this severity level should be specified at the end of the LLM’s response, not the beginning or middle.

There’s still an issue of context, where the LLM will raise a false positive due to aspects of the larger system it can’t see (e.g. “make sure to sanitize X input” when that already happens upstream).

We haven’t found the bot to be overbearing, though that’s mostly because we auto-delete its past comments when new changes are pushed.

[0] https://magicloops.dev/loop/3f3781f3-f987-4672-8500-bacbeefc...
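
A minimal sketch of that filtering step (illustrative only: the prompt is abbreviated, the model call is left out, and all names are made up):

    # Drop low-severity findings from an LLM security-review pass.
    # Severity is requested on the LAST line of each finding, after the explanation.

    REVIEW_PROMPT = """\
    Review the following diff for security issues.
    For each finding, explain the issue first.
    On the final line of each finding, output exactly one of:
    SEVERITY: low | SEVERITY: medium | SEVERITY: high
    """

    def parse_severity(finding: str) -> str:
        # Read the severity from the final line of a finding.
        last_line = finding.strip().splitlines()[-1].lower()
        for level in ("high", "medium", "low"):
            if level in last_line:
                return level
        return "low"  # treat unparseable findings as noise

    def keep(finding: str) -> bool:
        # Filter out the nitpicky feedback.
        return parse_severity(finding) in ("medium", "high")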




Putting the severity at the end was an important insight. It made the results much better, but still not quite good enough.

We had it output JSON with the fields {comment: string, severity: string}, in that order.
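
A minimal sketch of that shape (illustrative; only the field names and ordering match what’s described above, everything else is made up):

    import json

    # Instruct the model to emit one JSON object per finding, with "comment"
    # first so the explanation is generated before the severity rating.
    SCHEMA_HINT = (
        'Respond with a JSON array of objects shaped like '
        '{"comment": "<explanation>", "severity": "low" | "medium" | "high"}. '
        'Always put "comment" before "severity".'
    )

    def filter_findings(raw_reply: str, keep=("medium", "high")):
        findings = json.loads(raw_reply)
        return [f for f in findings if f.get("severity") in keep]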


Another variation on this is to think about tokens and definitions. Numbers don’t have inherent meaning for your use case, so if you use numbers you need to provide an explicit definition of each rating number in the prompt. Similarly, and more effectively, you can use labels such as low-quality, medium-quality, and high-quality, again providing an explicit definition of each label; one step further is to use explicit, self-describing labels (along with detailed definitions) such as “trivial-observation-on-naming-convention” or “insightful-identification-on-missed-corner-case”.

Effectively, you are turning a somewhat arbitrary numeric “rating” task into a multi-label classification problem with well-defined labels.
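
A minimal sketch of what that prompt can look like (the labels match the examples above; the definitions are placeholders):

    # Map each self-describing label to an explicit definition and inline
    # both into the classification prompt.
    LABEL_DEFINITIONS = {
        "trivial-observation-on-naming-convention":
            "The comment only suggests renaming or stylistic changes with no functional impact.",
        "insightful-identification-on-missed-corner-case":
            "The comment points out an unhandled input, state, or failure path.",
    }

    CLASSIFY_PROMPT = (
        "Classify the review comment below into exactly one of these labels:\n"
        + "\n".join(f"- {name}: {definition}"
                    for name, definition in LABEL_DEFINITIONS.items())
        + "\n\nReview comment:\n"
    )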

The natural evolution is then to train a BERT-based classifier (or similar) on the set of labels and comments, which gets you a judge model that is super fast and can achieve good accuracy.
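
A sketch of that last step using Hugging Face transformers (assumes a labeled CSV of past review comments; the base model, file name, and label set are placeholders):

    # Fine-tune a small BERT-style model as a fast comment-quality judge.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    LABELS = ["trivial-observation-on-naming-convention",
              "insightful-identification-on-missed-corner-case"]

    # review_comments.csv has two columns: text,label (label is one of LABELS)
    ds = load_dataset("csv", data_files="review_comments.csv")["train"]
    tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")

    def encode(batch):
        enc = tok(batch["text"], truncation=True, padding="max_length", max_length=256)
        enc["labels"] = [LABELS.index(label) for label in batch["label"]]
        return enc

    ds = ds.map(encode, batched=True, remove_columns=ds.column_names)

    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=len(LABELS))

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="comment-judge", num_train_epochs=3),
        train_dataset=ds,
    )
    trainer.train()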



