TensorFlow + TensorFlow Serving + the Google Inception model, plus optionally an SVM on Inception features for your custom detection. All that code and the pretrained model are open source. There's some engineering to glue it together, and some extra work for the easier, non-image-classification parts.
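Roughly, the custom-detection piece is: pull the 2048-d 'pool_3' bottleneck features out of the released Inception graph and fit an SVM on top. A minimal sketch in Python (TF 1.x-style API), assuming you've grabbed classify_image_graph_def.pb from the TensorFlow image-recognition tutorial and have your own list of (jpeg_path, label) pairs; TensorFlow Serving and the rest of the production plumbing are left out:

    import numpy as np
    import tensorflow as tf
    from sklearn.svm import LinearSVC

    GRAPH_PB = 'classify_image_graph_def.pb'  # pretrained Inception graph (assumed local path)

    def load_inception_graph(path):
        # Parse the frozen GraphDef and import it into the default graph.
        with tf.gfile.GFile(path, 'rb') as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')

    def bottleneck_features(sess, jpeg_path):
        # 'pool_3:0' is the 2048-d feature layer in the released graph;
        # 'DecodeJpeg/contents:0' takes raw JPEG bytes.
        jpeg_bytes = tf.gfile.GFile(jpeg_path, 'rb').read()
        feats = sess.run('pool_3:0', {'DecodeJpeg/contents:0': jpeg_bytes})
        return np.squeeze(feats)

    def train_custom_detector(labeled_jpegs):
        # labeled_jpegs: list of (jpeg_path, label) pairs you provide.
        load_inception_graph(GRAPH_PB)
        with tf.Session() as sess:
            X = np.array([bottleneck_features(sess, p) for p, _ in labeled_jpegs])
            y = [label for _, label in labeled_jpegs]
        return LinearSVC().fit(X, y)

Serving is then just running the same graph plus the SVM behind an endpoint, which is the part TensorFlow Serving is meant to handle.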
+this. I've implemented a subset of this kind of pipeline before on AWS (image tagging + face identification) using the building blocks that existed last year (AlexNet at the time, with a pre-release version of MXNet, because Google hadn't yet released the trained Inception model). Getting this functionality working at a basic level, given the tools Google has released, isn't impossible.
Now, making it production-quality, efficient, scalable, and the rest -- well, y'know. That's why people use cloud-based services in the first place.
But I think there's less fundamental lock-in than you might expect. Cloudinary, for example, will let you upload an image and get tags back (rough sketch below). ABBYY, OmniPage/Nuance, and others offer cloud-based OCR.
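For the Cloudinary route, the call is roughly this shape, using their Python SDK; the credentials, filename, and the specific auto-tagging add-on name are placeholders here (check their docs for whichever tagging add-on your account has enabled):

    import cloudinary
    import cloudinary.uploader

    # Placeholder credentials -- substitute your own Cloudinary account values.
    cloudinary.config(cloud_name='my-cloud', api_key='KEY', api_secret='SECRET')

    # 'categorization' names a tagging add-on; 'auto_tagging' assigns any tag
    # whose confidence is at or above the given threshold.
    result = cloudinary.uploader.upload(
        'photo.jpg',
        categorization='google_tagging',  # add-on name is an assumption; see Cloudinary docs
        auto_tagging=0.7,
    )
    print(result.get('tags'))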
I'm biased - I'm at Google this year - so take this with a grain of salt. While I suspect Google can do this better and more affordably than a small team could on its own, I don't think Google pulling the API would leave people up a creek without a paddle.
Google's face/landmark/label/text/logo detection models are open source? Or do open-source pretrained models exist for those tasks?
The quality and size of the training set are (at least) as important as the machine learning tools. I imagine Google has access to a pretty big data set, along with the computing resources to process it.
There is also http://www.deepdetect.com