
Err, no. The model is a "black box" if the only thing we have is the input and the output, with only a little intuition about how the model produces the output from the input. We have spent at least a couple of thousand years studying geometry; we know geometry quite well.

Let me demonstrate with a stupidly simple geometric model.

Suppose (for the sake of the argument) that we have simple image input, consisting only of simple solid geometric shapes. Say, solid 2-D circles of one color, on a background of a different color.

From high school geometry, we know everything there is to know about a circle once we know its location on the x-y plane and its radius. We could easily come up with a parametric model for fitting circles to the pixel data of images of circular objects. (For example, we could minimize the 2-norm difference between the data image and the image corresponding to a set of circles [x_i, y_i, r_i], i = 1..n.) This kind of descriptive parametric model would be particularly easy to understand: the model structure consists of nothing but representations of circles! (Of course, it wouldn't be a particularly interesting model; it would apply only to simple images consisting of circles.)
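To make the parametric idea concrete, here is a minimal sketch. It uses an algebraic least-squares fit on boundary points (the Kåsa method) rather than the pixel-wise 2-norm described above; the function name and synthetic data are my own:

```python
import numpy as np

def fit_circle(points):
    """Algebraic least-squares circle fit (Kasa method).

    points: (n, 2) array of (x, y) samples on the circle boundary.
    Solves x^2 + y^2 = 2*cx*x + 2*cy*y + (r^2 - cx^2 - cy^2) as a
    linear least-squares problem in (cx, cy, c).
    """
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    b = x**2 + y**2
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    r = np.sqrt(c + cx**2 + cy**2)
    return cx, cy, r

# Sample noisy points from a known circle and recover its parameters.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 200)
pts = np.column_stack([3 + 5 * np.cos(t), -1 + 5 * np.sin(t)])
pts += rng.normal(scale=0.05, size=pts.shape)
cx, cy, r = fit_circle(pts)
```

The point is that the fitted parameters (cx, cy, r) are directly interpretable: each one is a property of a circle, nothing more.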

Alternatively, we could work out the mathematics a bit more and come up with something like the Hough transform to find circular shapes. Still nothing mysterious about it: https://en.wikipedia.org/wiki/Circle_Hough_Transform
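A bare-bones circle Hough transform is only a few lines; this is a sketch under assumed parameters (grid sizes, radii, and names are my own), not production code:

```python
import numpy as np

def hough_circle(edge_points, shape, radii):
    """Vote in a (cy, cx, r) accumulator: each edge point votes for
    every circle center/radius combination that could pass through it."""
    h, w = shape
    acc = np.zeros((h, w, len(radii)), dtype=np.int32)
    thetas = np.linspace(0, 2 * np.pi, 60, endpoint=False)
    for x, y in edge_points:
        for k, r in enumerate(radii):
            cx = np.round(x - r * np.cos(thetas)).astype(int)
            cy = np.round(y - r * np.sin(thetas)).astype(int)
            ok = (0 <= cx) & (cx < w) & (0 <= cy) & (cy < h)
            np.add.at(acc, (cy[ok], cx[ok], k), 1)
    return acc

# Edge points of a circle centered at (20, 25) with radius 10.
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
edges = np.column_stack([20 + 10 * np.cos(t), 25 + 10 * np.sin(t)])
acc = hough_circle(edges, (50, 50), radii=[8, 10, 12])
cy, cx, k = np.unravel_index(acc.argmax(), acc.shape)
```

The accumulator peak lands at (or within a pixel of) the true center, at the correct radius bin. Again, every intermediate quantity has a plain geometric meaning.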

However, my point is: we could also train a neural network to find circles in the images of our example. It might be good at it. But understanding how the circle representations are encoded in the final trained network certainly would not be as easy as in our nice parametric model.

Some realistic applications of "simple" geometric models would be active contours / snakes ( https://en.wikipedia.org/wiki/Snake_(computer_vision) ), or (stretching the meaning of the word 'geometry') the various traditional edge detection algorithms that have been around for a long time.
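One of those traditional edge detectors, Sobel, illustrates the same transparency: convolve with two fixed 3x3 kernels approximating the horizontal and vertical image gradients, then take the gradient magnitude. A minimal sketch (valid-region convolution only, helper name my own):

```python
import numpy as np

# Fixed Sobel kernels: KX responds to horizontal gradients, KY to vertical.
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
KY = KX.T

def sobel(img):
    """Gradient magnitude of a 2-D grayscale image (valid region only)."""
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            patch = img[i:i + h - 2, j:j + w - 2]
            gx += KX[i, j] * patch
            gy += KY[i, j] * patch
    return np.hypot(gx, gy)

# A vertical step edge: responses concentrate along the boundary columns.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
mag = sobel(img)
```

Every coefficient in KX and KY was chosen by hand, for a reason we can state; nothing about the "model" needs to be reverse-engineered.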

Or read the post, in which the author describes how they used a projective geometry model to account for camera positions and orientations, and for stereoscopic images. We know how the geometry of stereoscopic vision works: we don't need to waste resources training a network to learn an inscrutable model of it.
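The core of such a projective camera model is a single matrix multiplication plus a perspective divide. A sketch with assumed intrinsics (focal length and principal point are made-up values, not the post's actual calibration):

```python
import numpy as np

# Assumed 3x3 intrinsic matrix K: focal lengths fx, fy and
# principal point (cx, cy) in pixels.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(K, points_3d):
    """Project (n, 3) camera-frame 3-D points to (n, 2) pixel coordinates."""
    p = points_3d @ K.T          # homogeneous image coordinates
    return p[:, :2] / p[:, 2:3]  # perspective divide

pts = np.array([[0.0, 0.0, 2.0],    # straight ahead, 2 m away
                [0.5, 0.0, 2.0]])   # 0.5 m to the right
uv = project(K, pts)
```

For stereo, the same projection is applied from two camera poses, and depth falls out of the disparity between the two images. All of this is textbook geometry with interpretable parameters.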

Deep learning is useful when we need models for things complicated enough that we don't know how to model them. (For example, a model that tells us "is there a dog in this image?".)




In my opinion you are overconfident in the foundations of mathematics. Like deep learning models, math works. Why and how does it work? In both cases, that's open to interpretation; in both cases, we don't have a complete understanding. It is that lack of complete understanding that makes something a black box.



