The no free lunch theorem [1] says that "any two optimization algorithms are equivalent when their performance is averaged across all possible problems".
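If you want to see that averaging claim on a toy case, here is a rough sketch. Everything in it is made up for illustration (the 4-point domain, the 3 possible values, the names `fixed_order`, `adaptive`, `best_after`, `average_performance`), and it assumes the usual setting: deterministic search that never re-evaluates a point, scored by the best value found within a budget, averaged uniformly over every possible function.

```python
from itertools import product

DOMAIN = (0, 1, 2, 3)   # toy search space (assumption for illustration)
VALUES = (0, 1, 2)      # possible objective values (assumption)


def fixed_order(order):
    """Search strategy that visits points in a fixed, predetermined order."""
    def choose(history):
        return order[len(history)]
    return choose


def adaptive(history):
    """Hypothetical adaptive rule: after seeing the top value, jump to the
    largest unvisited point; otherwise take the smallest unvisited point."""
    visited = {x for x, _ in history}
    unvisited = [x for x in DOMAIN if x not in visited]
    if history and history[-1][1] == max(VALUES):
        return unvisited[-1]
    return unvisited[0]


def best_after(choose, f, budget):
    """Best value found after `budget` distinct evaluations of f."""
    history = []
    for _ in range(budget):
        x = choose(history)
        history.append((x, f[x]))
    return max(y for _, y in history)


def average_performance(choose, budget):
    """Average best-found value over every function DOMAIN -> VALUES."""
    functions = list(product(VALUES, repeat=len(DOMAIN)))  # 3**4 = 81 functions
    total = sum(best_after(choose, f, budget) for f in functions)
    return total / len(functions)


if __name__ == "__main__":
    strategies = {
        "left-to-right": fixed_order((0, 1, 2, 3)),
        "right-to-left": fixed_order((3, 2, 1, 0)),
        "adaptive": adaptive,
    }
    for budget in range(1, len(DOMAIN) + 1):
        scores = {name: average_performance(s, budget) for name, s in strategies.items()}
        print(budget, scores)
```

All three strategies should print the same average at every budget. Any one of them can win on a particular function, but the advantage washes out once you average over all of them.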
Once you accept that, you start looking at practical considerations.
Having said that, if you do want to do the math, you might like the Oxford course by Nando de Freitas (now at DeepMind/Oxford) [2].
[1] https://en.wikipedia.org/wiki/No_free_lunch_theorem
[2] https://www.youtube.com/playlist?list=PLE6Wd9FR--EfW8dtjAuPo..., https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearni...