It's nice to get people thinking about possible connections and sharing terminology here, but I'm not sure the connections the article does manage to make precise are particularly new or deep.
A neural network is just a mathematical function[1] of inputs and parameters, and of course you can build them up using higher-order functions and recursion. Libraries like theano already let you build up a function graph in this way -- see theano.scan[2], for example, which is something like a fancy fixed-point combinator.
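To make that concrete, here's a minimal sketch using theano.scan -- the recurrence and the shapes are invented purely for illustration, but it shows the step function being passed as an argument and the whole unrolled recurrence coming back as one symbolic value:

    # Sketch: theano.scan as a higher-order function over the computation graph.
    # The step function and shapes here are made up for illustration.
    import numpy as np
    import theano
    import theano.tensor as T

    x = T.matrix('x')                                   # input sequence, shape (time, 3)
    W = theano.shared(np.eye(3, dtype='float32'), 'W')  # recurrent weights (parameters)

    def step(x_t, h_prev, W):
        # one step of the recurrence h_t = tanh(h_{t-1} W + x_t)
        return T.tanh(T.dot(h_prev, W) + x_t)

    h0 = T.zeros((3,), dtype='float32')
    hs, _ = theano.scan(fn=step, sequences=x, outputs_info=h0, non_sequences=W)

    rnn = theano.function([x], hs[-1])  # the unrolled recurrence as a single function

That's about as "higher-order functions and recursion" as graph-building gets in practice.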
The idea about types corresponding to representations seems like it would be hard to make precise, because at a type level everything's pretty much just R^n or perhaps some simple submanifold of R^n. "Has representation X" isn't a crisp logical concept, and in most type theories I know of types are crisp logical properties.
Even if you can accommodate fuzzy/statistical types somehow, I would tend to think of a representation as being the function that does the representing of a particular domain, rather than the range or the distribution of the outputs of that function. Two very different representations might have a similar distribution of outputs but not represent compatible concepts at all.
Still, there is a neat observation here: by combining two different representations at a point that feeds into the cost function you're optimising (e.g. by adding their outputs together and using that as input to another layer), you can force them, in some sense, to conform to the same representation, or at least to be compatible with each other. You could probably formalise this in a way that lets you talk about equivalence classes of representation-compatible intermediate values in a network as types.
It wouldn't really typecheck stuff for you in a useful way -- if you accidentally add together two things of different types, by definition they become the same type! But I guess it would do type inference. For most networks I'm not sure if this would tell you anything you didn't already know though.
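Roughly the kind of "inference" I have in mind (the graph encoding here is made up) is just union-find over intermediate values, merging classes whenever two values get combined elementwise:

    # Sketch: treat each intermediate value as its own type, then merge types
    # whenever two values are combined elementwise (e.g. added together).
    # Plain union-find; the input format is invented for illustration.
    def infer_representation_classes(add_edges, values):
        parent = {v: v for v in values}

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]   # path halving
                v = parent[v]
            return v

        def union(a, b):
            parent[find(a)] = find(b)

        for a, b in add_edges:                  # each elementwise combination
            union(a, b)

        classes = {}
        for v in values:
            classes.setdefault(find(v), []).append(v)
        return list(classes.values())

    print(infer_representation_classes(
        add_edges=[('branch_a_out', 'branch_b_out')],
        values=['branch_a_out', 'branch_b_out', 'other_feature']))
    # -> [['branch_a_out', 'branch_b_out'], ['other_feature']]

Which is exactly the problem: the answer is usually obvious from looking at the architecture diagram.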
If you want to find a useful connection between deep learning and typed FP, you could start by thinking about what problem you'd want your type system to solve in the context of deep learning. What could it usefully prove, check or infer about your network architecture?
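One mundane but genuinely useful answer is shapes -- here's a sketch (the Dense record and checker are made up) of catching dimension mismatches between layers before you build or train anything:

    # Sketch: a tiny "type check" for network architectures -- do the layer
    # dimensions line up? The Dense description is invented for illustration.
    from typing import NamedTuple, List

    class Dense(NamedTuple):
        in_dim: int
        out_dim: int

    def check_architecture(layers: List[Dense], input_dim: int) -> int:
        dim = input_dim
        for i, layer in enumerate(layers):
            if layer.in_dim != dim:
                raise TypeError(f"layer {i} expects input dim {layer.in_dim}, got {dim}")
            dim = layer.out_dim
        return dim  # output dimension of the whole network

    check_architecture([Dense(784, 128), Dense(128, 10)], input_dim=784)   # ok
    try:
        check_architecture([Dense(784, 128), Dense(256, 10)], input_dim=784)
    except TypeError as e:
        print(e)  # layer 1 expects input dim 256, got 128

Richer properties (equivariances, conditional independencies, what a representation is "about") would be much more interesting, but also much harder to state as crisp types.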
[1] I was going to say "a differentiable function", but actually they're often not these days, what with ReLUs, max-pooling etc. The non-differentiable points are swept under the rug; I don't see anyone bothering with proper subgradient-based optimisation methods.
[2] http://deeplearning.net/software/theano/library/scan.html