Another variation on this is to think about tokens and definitions. Numbers don’...

Another variation on this is to think about tokens and definitions. Numbers don’t have inherent meaning for your use case, so if you use numbers you need to provide an explicit definition of each rating number in the prompt. Similarly, and more effectively is to use labels such as low-quality, medium-quality, high-quality, and again providing an explicit definition of the label; one step further is to use explicit self describing label (along with detailed definition) such as “trivial-observation-on-naming-convention” or “insightful-identification-on-missed-corner-case”.

Effectively you are turning a somewhat arbitrary numeric “rating” task , into a multi label classification problem with well defined labels.

The natural evolution is to then train a BERT based classifier or similar on the set of labels and comments, which will get you a model judge that is super fast and can achieve good accuracy.