
Efficiency is the expected answer. I'm just wondering if there's a more theoretical reason, such as "every function that can be computed by a non-layered acyclic network can be computed by a complete layered network using only a small number of extra nodes/layers."



I think that it can. With some weights of 0 and some weights of 1, you can trivially map 'jumps' that skip from a node in one layer to a node a couple of layers away, using intermediate nodes that take no other inputs, right? One catch: the sigmoid of 1 is about 0.73, not 1, so those relay nodes would need a linear (identity) activation, otherwise the copy is only approximate. Once you have those, it's just a matter of how many layers you need for any acyclic structure, I think.

Although if you wanted to come up with difficult scenarios, it's not hard to think of structures that would make some of those middle layers really tall, or add a lot of middle layers.
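The relay idea above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not anything from the thread: the function names (`skip_network`, `layered_network`, `dense_layer`) and the weights are made up, and the relay uses ReLU, which copies its input exactly only when that input is non-negative. The layers are fully connected; the "missing" edge into the relay is just a weight of 0.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def identity(x):
    return x

def dense_layer(inputs, weights, activation):
    """Fully connected layer: weights[j][i] links input i to output j."""
    return [activation(sum(w * x for w, x in zip(row, inputs)))
            for row in weights]

def skip_network(x0, x1):
    """Non-layered net: x0 feeds the hidden node AND skips to the output."""
    h = relu(2.0 * x0 - 1.0 * x1)
    return 1.0 * h + 3.0 * x0  # 3.0 * x0 is the skip edge

def layered_network(x0, x1):
    """Layered net: a relay node in the hidden layer carries x0 forward."""
    hidden = dense_layer(
        [x0, x1],
        [[2.0, -1.0],   # the original hidden node
         [1.0, 0.0]],   # relay: weight 1 on x0, weight 0 on x1
        relu,           # assumption: exact copy only for x0 >= 0
    )
    return dense_layer(hidden, [[1.0, 3.0]], identity)[0]

if __name__ == "__main__":
    print(skip_network(0.5, 0.2), layered_network(0.5, 0.2))
    # A sigmoid relay would not copy its input unchanged:
    print(sigmoid(1.0))
```

Each skipped layer costs one relay node per carried value, which is where the tall or numerous middle layers would come from. With a sigmoid relay the value is squashed rather than copied, so the equivalence would hold only approximately.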


As I mentioned in another branch of this thread, selectively choosing edges between nodes isn't an option, because in the standard model you have complete incidence between nodes in adjacent layers.



