
Surprised to read LinkedIn's "People You May Know" MapReduce pipeline takes 82 jobs.

From what I've read, Google's index runs in around 20 MapReduce jobs.



PageRank is like a Markov chain: you iterate over the same dataset again and again until you're happy with the result. If 20 is good enough, it's good enough. Good explanation here: http://www.iterativemapreduce.org/samples.html#Pagerank
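
To make the "iterate until you're happy" point concrete, here is a minimal sketch of PageRank as repeated passes over the same link graph, where each pass plays the role of one MapReduce job. The toy graph, the damping factor of 0.85, and the function names are my own assumptions for illustration, not anything from Google's actual pipeline.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional damping factor; an assumption, not from the thread


def pagerank_pass(graph, ranks):
    """One map/reduce-style pass over the whole graph."""
    n = len(graph)
    contributions = defaultdict(float)
    # "Map" phase: every page emits a share of its rank to each outlink.
    for page, outlinks in graph.items():
        if outlinks:
            share = ranks[page] / len(outlinks)
            for target in outlinks:
                contributions[target] += share
        else:
            # Dangling page: spread its rank evenly over all pages.
            for target in graph:
                contributions[target] += ranks[page] / n
    # "Reduce" phase: sum the shares per page and add the teleport term.
    return {
        page: (1 - DAMPING) / n + DAMPING * contributions[page]
        for page in graph
    }


def pagerank(graph, passes=20):
    """Run a fixed number of passes; 20 mirrors the figure quoted above."""
    n = len(graph)
    ranks = {page: 1.0 / n for page in graph}
    for _ in range(passes):
        ranks = pagerank_pass(graph, ranks)
    return ranks


# Hypothetical three-page graph: page -> list of pages it links to.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_graph, passes=20)
```

The number of passes is just a stopping rule: you could equally stop when the ranks change by less than some tolerance between passes.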

Whereas if you include signals from multiple sources, each join is one MR job on its own, never mind the calculations.
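
A rough sketch of why every join costs a job: in a reduce-side join, the map step tags each record with its source, the shuffle groups records by key, and the reduce step pairs them up. The dataset names and records below are hypothetical, and this only simulates the pattern in memory; it is not LinkedIn's pipeline.

```python
from collections import defaultdict
from itertools import product


def map_tagged(source_name, records):
    """Map step: emit (join key, (source tag, value)) for each record."""
    for key, value in records:
        yield key, (source_name, value)


def reduce_join(key, tagged_values, left_name, right_name):
    """Reduce step: pair every left value with every right value for one key."""
    lefts = [v for s, v in tagged_values if s == left_name]
    rights = [v for s, v in tagged_values if s == right_name]
    return [(key, l, r) for l, r in product(lefts, rights)]


def join_job(left_name, left_records, right_name, right_records):
    """One simulated MR job: map both inputs, group by key, reduce."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for name, records in ((left_name, left_records), (right_name, right_records)):
        for key, tagged in map_tagged(name, records):
            groups[key].append(tagged)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_join(key, groups[key], left_name, right_name))
    return joined


# Hypothetical signals: who is connected to whom, and where each member studied.
connections = [("alice", "bob"), ("bob", "carol")]
schools = [("alice", "MIT"), ("bob", "MIT"), ("carol", "CMU")]
joined = join_job("connections", connections, "schools", schools)
```

Bringing in a third signal means feeding `joined` into another `join_job` call, which is one more full job, so the count climbs quickly as sources are added.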



