It's someone taking llama.cpp, strapping it to an HTTP server library and implementing the OpenAI REST API on top, then putting some effort into a shiny website and docs, because AI-anything is a reputation and/or investment magnet right now.
Not that it isn't a nice idea -- making it easy to test existing OpenAI-based apps against competing open models is a pretty good thing.
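To be concrete about why that compatibility matters: if you already have code written against the OpenAI Python client, you can usually point it at one of these local servers just by changing the base URL. A minimal sketch, assuming a llama.cpp-style server with an OpenAI-compatible /v1 endpoint on localhost:8080 (the port and model name here are placeholders, not anything from the project in question):

    from openai import OpenAI

    # Point the standard OpenAI client at a local OpenAI-compatible server.
    # base_url and model are assumptions -- adjust to whatever your server exposes.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="local-model",  # placeholder; many local servers ignore or remap this
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(resp.choices[0].message.content)

The point is that the rest of the app doesn't have to change at all to try an open model.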
So I still need to download and host models myself.
I found that to be incredibly impractical. I tried to do it for my project AIMD, but the cost and quality just made absolutely no sense even with the top models.
Well, the market for local inference is already quite large, to say the least. “It didn’t pencil out in my business’s favor” doesn’t seem like a fair criticism, especially for an app clearly focused on the hobbyist-to-SMB market, where compute costs are dwarfed by the cost of wages and increased mental load.
I definitely see your specific point though, and have found the same for high-level use cases. Local models become really useful when you need smaller models for ensemble systems, to give one class of use case you might want to try out: e.g. proofreading, simple summarization, tone detection, etc. (see the sketch below).
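As a rough sketch of one of those ensemble pieces, here's what a tone-detection call against a small local model might look like, again assuming an OpenAI-compatible local endpoint; the URL, model name, and label set are all placeholders I made up for illustration:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    def detect_tone(text: str) -> str:
        """Ask a small local model to classify tone into a fixed label set."""
        resp = client.chat.completions.create(
            model="small-local-model",  # placeholder name
            temperature=0,
            messages=[
                {"role": "system",
                 "content": "Classify the tone of the user's text as exactly one of: "
                            "neutral, friendly, frustrated, formal."},
                {"role": "user", "content": text},
            ],
        )
        return resp.choices[0].message.content.strip()

    print(detect_tone("Thanks so much, this fixed my issue right away!"))

Calls like this are cheap and frequent, which is exactly where running a small model locally beats paying per token.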