
> does it have direct application for training nets faster?

That's a great question! Unfortunately not yet -- though we believe further studies may eventually get us there. We found that (at least for simple classification tasks) the features seem to exhibit a two-stage behavior: a de-randomization stage, where they identify the best directions in the feature space, and an amplification stage, where they stretch along those directions.
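
A minimal toy of what we mean (not our actual model; the drift matrix below is made up purely for illustration):

    # Toy sketch, not the paper's model: a feature vector w evolving under
    # the linear ODE dw/dt = A w, integrated by forward Euler. It first
    # rotates toward the dominant eigenvector of A ("de-randomization"),
    # then mostly grows in norm along that direction ("amplification").
    # A is an assumed drift matrix: symmetric noise plus one planted spike.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 50
    u = np.zeros(d); u[0] = 1.0                  # planted "best" direction
    noise = rng.standard_normal((d, d))
    A = (noise + noise.T) / (2 * np.sqrt(d)) + 2.5 * np.outer(u, u)
    top = np.linalg.eigh(A)[1][:, -1]            # dominant eigenvector of A

    w = rng.standard_normal(d) * 1e-3            # small random init
    dt = 0.05
    for t in range(401):
        if t % 80 == 0:
            align = abs(top @ w) / np.linalg.norm(w)
            print(f"t={t*dt:5.1f}  alignment={align:.3f}  |w|={np.linalg.norm(w):.2e}")
        w = w + dt * (A @ w)                     # Euler step

The printed alignment saturates early (stage one) while the norm keeps exploding along the fixed direction (stage two).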

We've been thinking about identifying a bound on the exit time of the first stage, and examining how it depends on different hyper-parameters, dataset properties, etc., so that one could pinpoint how to reduce the time spent in the first stage, effectively making training faster.
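
Purely as an illustration of what such a study could look like (continuing the toy above, with a hypothetical definition of exit time as the first threshold crossing of the alignment):

    # Continues the toy above (reuses np, rng, d, u). "Exit time" here is a
    # hypothetical definition: the first time the alignment with the
    # dominant eigenvector exceeds a threshold. The planted spike controls
    # the spectral gap, standing in for a dataset property.
    def exit_time(spike, thresh=0.9, dt=0.05, max_steps=20_000):
        noise = rng.standard_normal((d, d))
        A = (noise + noise.T) / (2 * np.sqrt(d)) + spike * np.outer(u, u)
        vals, vecs = np.linalg.eigh(A)
        gap, v1 = vals[-1] - vals[-2], vecs[:, -1]
        w = rng.standard_normal(d) * 1e-3
        for t in range(max_steps):
            if abs(v1 @ w) / np.linalg.norm(w) > thresh:
                return gap, t * dt
            w = w + dt * (A @ w)
        return gap, float("inf")

    for spike in (2.2, 2.5, 3.0):
        gap, te = exit_time(spike)
        print(f"spectral gap {gap:.2f}: exit time ~ {te:.1f}")

A larger gap in the drift spectrum means an earlier exit from stage one, which is the kind of dependence we'd want to bound.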

> i.e. can the ode be integrated faster than backprop?

Also a good question. At this stage we need to estimate the model parameters (the drift matrix) from simulations on DNNs. As future work we hope to explore whether we can pre-determine those parameters, so that a comparison with backprop would make more sense.
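
A rough sketch of that pipeline, under the assumption that a linear drift model fits (the trajectory below is simulated in place of statistics logged from an actual DNN run):

    # Estimate a drift matrix A from a trajectory, then integrate
    # dx/dt = A x in closed form with a matrix exponential instead of
    # stepping. The linear model and the data are illustrative assumptions,
    # not our actual procedure.
    import numpy as np
    from scipy.linalg import expm

    rng = np.random.default_rng(1)
    d, dt, steps = 10, 0.01, 500
    A_true = -np.eye(d) + 0.1 * rng.standard_normal((d, d))

    # Simulated trajectory (in practice this would come from DNN training).
    X = np.empty((steps + 1, d))
    X[0] = rng.standard_normal(d)
    for t in range(steps):
        X[t + 1] = X[t] + dt * (A_true @ X[t])

    # Least-squares fit of the drift: (x_{t+1} - x_t)/dt ~ A x_t.
    dX = (X[1:] - X[:-1]) / dt
    A_hat = np.linalg.lstsq(X[:-1], dX, rcond=None)[0].T

    # Jump straight to time T via x(T) = exp(T A) x(0) -- no stepping.
    T = steps * dt
    x_T = expm(T * A_hat) @ X[0]
    print("max deviation from the step-by-step run:", np.abs(x_T - X[-1]).max())

The appeal is in the last step: once the drift is pinned down, the matrix exponential jumps to any time in one shot, which is where a fair speed comparison against stepwise backprop would come in.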



