The models are noticeably different — for example, o1 and o3 have reasoning, and some users (eg. me) want to tell the model when to use reasoning, and when not.
As to why they don't automatically detect when reasoning could be appropriate and then switch to o3, I don't know, but I'd assume it's about cost (and for most users the output quality is negligible). 4o can do everything, it's just not great at "logic".
As to why they don't automatically detect when reasoning could be appropriate and then switch to o3, I don't know, but I'd assume it's about cost (and for most users the output quality is negligible). 4o can do everything, it's just not great at "logic".