That's one problem with these DiRs. Since the model has to be loaded through llama.cpp, and llama.cpp loads models by file path, the DiR either has to accept a model path as the model name (a type mismatch), or assume your models live in a known directory and treat the model file name as the model name (better). But that means the DiR needs to control the model loader (and in most cases they actually download and build it automatically after install). I'd rather build my own llama.cpp and just have a nice interface to it that stays out of my way.
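The second option can be pretty small, too. A rough sketch of what I mean, in Python (the models directory, the .gguf suffix, and the path check are all my assumptions here, not how any particular DiR actually does it):

    import os

    # Assumed convention: all models live here, and the API only ever sees names.
    MODELS_DIR = os.path.expanduser("~/models")

    def resolve_model(name: str) -> str:
        """Map an API model name to a model file under MODELS_DIR."""
        # Reject anything path-like, so the API keeps dealing in names,
        # never filesystem locations.
        if os.path.basename(name) != name or name.startswith("."):
            raise ValueError(f"model name may not be a path: {name!r}")
        path = os.path.join(MODELS_DIR, name + ".gguf")
        if not os.path.isfile(path):
            raise FileNotFoundError(f"no model file for {name!r}")
        return path

That keeps the name/path split clean, and it works just as well with a llama.cpp you built yourself, since the server only needs to hand the resolved path to whatever loader you point it at.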
That's true, but not my point. My point is that if the request specifies GPT-3.5, Nitro knows that it cannot possibly serve that model, so anything other than returning an error is simply lying to the client, which is a really bad idea.
Because if the client specifically requests GPT-3.5 but is silently served something else instead, the client will rely on GPT-3.5 capabilities (context window, function calling, and so on) that aren't actually available, which is a recipe for breakage.
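Concretely, the honest behaviour is a hard error in the shape OpenAI clients already know how to handle. A sketch (the payload mirrors OpenAI's model_not_found error as best I remember it, and run_inference is a made-up stand-in for the actual llama.cpp call):

    SERVED_MODELS = {"llama-2-7b-chat"}  # whatever this instance actually loaded

    def run_inference(request: dict) -> dict:
        raise NotImplementedError  # stand-in for the real inference call

    def handle_completion(request: dict) -> tuple[int, dict]:
        model = request.get("model", "")
        if model not in SERVED_MODELS:
            # Refuse instead of silently substituting a different model.
            return 404, {"error": {
                "message": f"The model '{model}' does not exist on this server.",
                "type": "invalid_request_error",
                "param": "model",
                "code": "model_not_found",
            }}
        return 200, run_inference(request)

A client that gets the 404 can fail fast or fall back deliberately; a client that silently gets some 7B model while asking for GPT-3.5 just produces wrong results.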