That's one problem with these DiRs. Since the model has to be loaded through llama.cpp, and llama.cpp loads models by file path, the DiR either has to accept a model path as the model name (a type mismatch), or assume your models live in a known directory and treat the model file name as the model name (better). But that means the DiR needs to control the model loader (and in most cases they actually download and build it automatically after install). I'd rather build my own llama.cpp and just have a nice interface to it that stays out of my way.
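The second option can be pretty small, too. A rough sketch of what I mean, in Python (the models directory, the .gguf suffix, and the path check are all my assumptions here, not how any particular DiR actually does it):

    import os

    # Assumed convention: all models live here, and the API only ever sees names.
    MODELS_DIR = os.path.expanduser("~/models")

    def resolve_model(name: str) -> str:
        """Map an API model name to a model file under MODELS_DIR."""
        # Reject anything path-like, so the API keeps dealing in names,
        # never filesystem locations.
        if os.path.basename(name) != name or name.startswith("."):
            raise ValueError(f"model name may not be a path: {name!r}")
        path = os.path.join(MODELS_DIR, name + ".gguf")
        if not os.path.isfile(path):
            raise FileNotFoundError(f"no model file for {name!r}")
        return path

That keeps the name/path split clean, and it works just as well with a llama.cpp you built yourself, since the server only needs to hand the resolved path to whatever loader you point it at.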
That's true, but not my point. My point is that if the request specifies GPT-3.5, Nitro knows that it cannot possibly serve that model, so anything other than returning an error is simply lying to the client, which is a really bad idea.
Because if the client specifically requests GPT-3.5 but is silently served something else instead, the client will rely on GPT-3.5 capabilities (context window, function calling, and so on) that aren't actually available, which is a recipe for breakage.
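Concretely, the honest behaviour is a hard error in the shape OpenAI clients already know how to handle. A sketch (the payload mirrors OpenAI's model_not_found error as best I remember it, and run_inference is a made-up stand-in for the actual llama.cpp call):

    SERVED_MODELS = {"llama-2-7b-chat"}  # whatever this instance actually loaded

    def run_inference(request: dict) -> dict:
        raise NotImplementedError  # stand-in for the real inference call

    def handle_completion(request: dict) -> tuple[int, dict]:
        model = request.get("model", "")
        if model not in SERVED_MODELS:
            # Refuse instead of silently substituting a different model.
            return 404, {"error": {
                "message": f"The model '{model}' does not exist on this server.",
                "type": "invalid_request_error",
                "param": "model",
                "code": "model_not_found",
            }}
        return 200, run_inference(request)

A client that gets the 404 can fail fast or fall back deliberately; a client that silently gets some 7B model while asking for GPT-3.5 just produces wrong results.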