Layered is much more specific than acyclic. I can come up with an example that is not layered but still acyclic: just connect nodes across two or more layers.
Functionally, layered and acyclic NNs are the same thing. An arbitrary acyclic NN acts the same as a layered NN with some of the weights fixed at 1 or 0. (Replace any edge spanning n layers in a non-layered acyclic NN with a chain of n-1 pass-through nodes, one per intermediate layer, to get a layered NN that responds to input the same as the original non-layered NN. The pass-through nodes need identity activations for the response to match exactly.)
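The chain construction above can be sketched in code. This is a minimal illustration, not from the original post: the edge list and layer assignment are assumed representations, and the `layerize` function name is hypothetical. It assumes the inserted pass-through nodes compute the identity, so the rewritten network produces the same outputs as the original.

```python
def layerize(edges, layer):
    """Rewrite a DAG so every edge connects adjacent layers.

    edges: list of (src, dst, weight) tuples.
    layer: dict mapping node -> layer index (consistent with the DAG).
    Returns (new_edges, new_layer) where each skip edge has been replaced
    by a chain of identity pass-through nodes, one per intermediate layer.
    """
    new_edges = []
    new_layer = dict(layer)
    counter = 0
    for src, dst, w in edges:
        span = new_layer[dst] - new_layer[src]
        if span <= 1:
            # Already connects adjacent layers; keep as-is.
            new_edges.append((src, dst, w))
            continue
        # A skip edge spanning n layers becomes a chain of n-1 pass-through
        # nodes. The first chain edge carries the original weight; the rest
        # are fixed at 1 (identity pass-through).
        prev = src
        for i in range(1, span):
            node = f"pass_{counter}"
            counter += 1
            new_layer[node] = new_layer[src] + i
            new_edges.append((prev, node, w if i == 1 else 1.0))
            prev = node
        new_edges.append((prev, dst, 1.0))
    return new_edges, new_layer

# Example: x -> y skips over the layer containing h.
edges = [("x", "h", 0.5), ("x", "y", 2.0), ("h", "y", -1.0)]
layer = {"x": 0, "h": 1, "y": 2}
new_edges, new_layer = layerize(edges, layer)
# Every edge in the result spans exactly one layer.
assert all(new_layer[d] - new_layer[s] == 1 for s, d, _ in new_edges)
```

The overhead is linear: each skip edge of span n adds n-1 nodes and turns one edge into n, which is where the "as little overhead as you say" question below comes from.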
I suppose there may be some cases where the extra speed you get by omitting the intermediate nodes pays off. However, I can't imagine a situation where you'd know enough about the problem in advance that you could design the NN's graph in that level of detail.
Usually the adjacency between layers is complete, so your modification isn't quite without loss of generality.
I also cannot imagine a situation in which you'd know that much detail in advance, but using a general network would allow one to, for example, dynamically modify the topology of the graph (as biological neural networks do regularly).
EDIT: I guess what I'm asking for is a rigorous proof that the two models are equivalent with as little overhead as you say there should be.