Ask HN: Can you borrow Google's search algorithm?
10 points by brittpart_ on Dec 14, 2020 | 13 comments
Or do I have to start from scratch? Maybe relating to NLP.



More prosaic, but if you have a blog or similar, you can add a Google Programmable Search Engine: https://programmablesearchengine.google.com/about/


You can also put up a DuckDuckGo search box; it works well for website search and supports DDG instead of Alphabet.

EDIT: Shameless plug, if it gets the author to go with a non-Google option: https://www.gkbrk.com/wiki/DuckDuckGoSearchBox/


Mm... PageRank is the basis, but recent publications suggest they use a multitude of factors (probably 7-10+?) to evaluate sites. PageRank is still more or less the biggest chunk, but since people learned how to game it, they've likewise had to step up their internal game on ranking. What you're requesting is probably a trade secret, but you could get reasonable results with a PageRank-inspired hybrid.
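
A minimal sketch of what such a hybrid could look like, just to make the idea concrete - the weights and the precomputed scores here are made-up placeholders, nothing Google actually publishes:

    # Toy hybrid ranking: blend a link-based score (e.g. PageRank) with a
    # text-relevance score. The weights are arbitrary placeholders.
    def hybrid_score(pagerank_score, text_relevance, w_link=0.6, w_text=0.4):
        return w_link * pagerank_score + w_text * text_relevance

    # Hypothetical precomputed scores for two candidate pages.
    candidates = {
        "example.com/a": {"pagerank": 0.012, "relevance": 0.80},
        "example.com/b": {"pagerank": 0.045, "relevance": 0.30},
    }

    ranked = sorted(
        candidates,
        key=lambda url: hybrid_score(candidates[url]["pagerank"],
                                     candidates[url]["relevance"]),
        reverse=True,
    )
    print(ranked)  # pages ordered by the blended score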


The PageRank algorithm is public and there are plenty of implementations, for instance https://networkx.org/documentation/networkx-1.10/reference/g...
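
For example, with networkx it's basically a one-liner; the little link graph below is purely illustrative:

    import networkx as nx

    # Tiny directed link graph: an edge A -> B means "A links to B".
    G = nx.DiGraph()
    G.add_edges_from([
        ("a.com", "b.com"),
        ("c.com", "b.com"),
        ("b.com", "d.com"),
        ("d.com", "a.com"),
    ])

    # networkx implements the classic PageRank algorithm directly.
    scores = nx.pagerank(G, alpha=0.85)
    for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")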


I’m pretty sure it was patented at the beginning, so it should be free by now.


What are you trying to achieve? Search your website's blog posts? Create an e-commerce search engine? Job search engine? Web search engine? Just learn about how search tech works?

All of these are radically different domains; some of them require intensive NLP, others require entirely different techniques...


Funny thing I was thinking about recently... why not reverse engineer it? Run the top million most common queries (or maybe the top billion), snapshot the top 100 results for each, and use that to train your model. With cheap enough compute, can Google be reverse engineered?
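
Roughly, you'd be collecting (query, result, position) triples and fitting a ranking model to them. A skeleton of just the data-collection step - fetch_top_results is a hypothetical stand-in you'd have to write yourself, and scraping Google results is against their ToS, so treat this as illustrative only:

    # Hypothetical skeleton for building a learning-to-rank dataset from
    # observed rankings. fetch_top_results() is a placeholder, not a real API.
    def fetch_top_results(query, n=100):
        """Return the top-n result URLs for a query (not implemented here)."""
        raise NotImplementedError

    def build_training_set(queries):
        rows = []
        for q in queries:
            for position, url in enumerate(fetch_top_results(q), start=1):
                # Lower position = more relevant; a ranking model would learn
                # features of (query, url) that predict this position.
                rows.append({"query": q, "url": url, "position": position})
        return rows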


You'd have to feed your model a copy of the entire internet, and if you can do that, you've already done the hard part of creating a Google clone imo.

In general, if folks want to know how Google works, just do some reading on grey hat / black hat SEO. There is an entire (somewhat) underground industry of people who have ranking in Google down to a science - put exactly this on your page, set up exactly these linking domains with exactly this type of content, satisfy all of these metrics, etc. I honestly think the reason competing search engines are so much worse is that none of them have tried very hard, or maybe they just lack funding.

AFAIK, the algorithm is still at its core what it always has been (getting PR links to your page), but Google has added a bunch of layers on top that basically check for things to disqualify you completely or make minor adjustments to your position in the rankings.



Golden, so helpful - thank you!


I would recommend starting with a mature search engine such as Lucene or Elasticsearch.
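
For example, with Elasticsearch and a recent version of its official Python client, indexing and full-text search are only a few lines (this assumes a node running locally on the default port):

    from elasticsearch import Elasticsearch

    # Assumes a local Elasticsearch node; adjust the URL for your setup.
    es = Elasticsearch("http://localhost:9200")

    # Index a couple of documents into a "posts" index.
    es.index(index="posts", id=1, document={
        "title": "Intro to search", "body": "How search engines rank documents."})
    es.index(index="posts", id=2, document={
        "title": "NLP basics", "body": "Tokenization, stemming and embeddings."})
    es.indices.refresh(index="posts")

    # Full-text query; Elasticsearch scores matches with BM25 by default.
    resp = es.search(index="posts", query={"match": {"body": "search ranking"}})
    for hit in resp["hits"]["hits"]:
        print(hit["_score"], hit["_source"]["title"])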


Forgive me if this doesn't make sense:

If I'm implementing search in an application and want to use NLP, do I need to train the search, or are these solutions ready to go out of the box? I'm not sure how other people do it / how search works / whether you need to tell it what to do.


Well, of course the engine needs access to some corpus to search over. So the general answer to your question is yes; however, this step is typically not called "training" but "indexing".

Most engines will repeatedly index your content with crawlers or similar.
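
To make "indexing" concrete, here's a toy inverted index - the kind of structure an engine builds over your corpus - with no training involved:

    from collections import defaultdict

    # Toy inverted index: map each term to the set of documents containing it.
    docs = {
        1: "pagerank ranks pages by their incoming links",
        2: "elasticsearch indexes documents for full text search",
    }

    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)

    def search(query):
        """Return documents containing every query term (boolean AND)."""
        terms = query.lower().split()
        if not terms:
            return set()
        results = index.get(terms[0], set()).copy()
        for term in terms[1:]:
            results &= index.get(term, set())
        return results

    print(search("full text search"))  # {2}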



