Back in 2006, in high school, I was investigating multilayer feed-forward NNs. I found them magical. I implemented the XOR problem and so on.
What always confounded me was the choice of the number and width of hidden layers. This is even more confusing now, with the advent of deep and recursive networks. We need empirical work on this that can be taught in much the same way that gravity is taught as an apple falling from a tree.
We need a way to determine the entropy of a network, and how to route and exploit that entropy. Specific scenarios are not adequate.