Mm... PageRank is the basis, but recent publications suggest Google uses a multitude of other factors (probably 7-10+?) to evaluate sites. PageRank is still more or less the biggest chunk, but since people learned how to game it, Google has had to step up its ranking game in turn. What you're asking for is effectively a trade secret, but you could probably get reasonable results with a PageRank-inspired hybrid.
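For anyone who hasn't seen the PageRank part written out, here is a minimal sketch of the textbook power-iteration version. To be clear, this is the published idea only, not whatever Google runs today; the function name, the `links` dict format, and the 0.85 damping default are my own assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links: dict mapping a page to the list of pages it links to
           (hypothetical input format chosen for this sketch).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        # Every page splits its rank equally among the pages it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_web)
```

On the toy graph, "b" ends up lowest because only half of "a"'s rank flows into it. The "hybrid" part would be whatever other signals you mix in on top of this number.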
What are you trying to achieve? Search your website's blog posts? Create an e-commerce search engine? Job search engine? Web search engine? Just learn about how search tech works?
All of these are radically different domains. Some require intensive NLP, while others lean on entirely different techniques...
Funny thing I was thinking about recently... why not reverse engineer it? Run the top million most common queries, or maybe the top billion, snapshot the top 100 results for each, and use that to train your model. With cheap enough compute, can Google be reverse engineered?
You'd have to feed your model a copy of the entire internet, and if you can do that, you've already done the hard part of creating a Google clone imo.
In general, if folks want to know how Google works, just do some reading on grey hat / black hat SEO. There is an entire (somewhat) underground industry of people who have ranking in Google down to a science - put exactly this on your page, set up exactly these linking domains with exactly this type of content, satisfy all of these metrics, etc. I honestly think the reason competing search engines are so much worse is just that none of them have tried very hard, or maybe that they lack funding.
AFAIK, the core of the algorithm is still what it has always been (getting links that carry PageRank pointed at your page), but Google has added a bunch of layers on top that basically either check for things that disqualify you completely or make minor adjustments to your position in the rankings.
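Roughly the shape I have in mind, as a sketch: a link-based base score, then hard disqualifiers, then small multiplicative nudges. Every name and signal here (`keyword_stuffed`, `mobile_friendly`, the 1.05 and 0.9 factors) is made up for illustration; nobody outside Google knows the real checks or weights.

```python
def final_score(page, link_score, disqualifiers, adjustments):
    """Layered ranking sketch (hypothetical, not Google's actual pipeline).

    page:          dict of page attributes
    link_score:    base score from links, e.g. a PageRank value
    disqualifiers: functions returning True if the page should be dropped
    adjustments:   functions returning a multiplier close to 1.0
    """
    # Layer 1: any hard failure removes the page from the ranking.
    if any(check(page) for check in disqualifiers):
        return 0.0
    # Layer 2: minor adjustments nudge the base score up or down.
    score = link_score
    for adjust in adjustments:
        score *= adjust(page)
    return score


# Hypothetical signals, invented for this example.
def keyword_stuffed(page):
    return page.get("keyword_density", 0.0) > 0.3

def mobile_boost(page):
    return 1.05 if page.get("mobile_friendly") else 1.0

def slow_penalty(page):
    return 0.9 if page.get("load_seconds", 0.0) > 3.0 else 1.0


good = {"keyword_density": 0.02, "mobile_friendly": True, "load_seconds": 1.2}
spam = {"keyword_density": 0.6, "mobile_friendly": True, "load_seconds": 1.2}

good_score = final_score(good, 0.4, [keyword_stuffed], [mobile_boost, slow_penalty])
spam_score = final_score(spam, 0.9, [keyword_stuffed], [mobile_boost, slow_penalty])
```

Note that the spam page loses despite the higher link score, which is the whole point of the disqualification layer.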
If I'm implementing search in an application and want to use NLP, do I need to train the search, or are these solutions ready to go out of the box? I'm not sure how other people do it, how search works, or whether you need to tell it what to do.
Well, of course the engine needs to have access to some corpus to search on. So the general answer to your question is yes; however, this step is typically not called "training" but "indexing".
Most engines will re-index your content periodically, using crawlers or similar.
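To make "indexing" concrete, here is a minimal sketch of an inverted index, the data structure most text search engines are built around. The function names and the toy documents are my own; a real engine adds stemming, relevance scoring, incremental updates, and so on.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Toy corpus, invented for this example.
docs = {
    1: "How search engines work",
    2: "Search your blog posts",
    3: "Job listings for engineers",
}
index = build_index(docs)
hits = search(index, "search blog")
```

No training step anywhere: you build the index once, then rebuild or update it whenever the content changes, which is what the periodic crawling is for.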