Scientist Finds PageRank-Type Algorithm from the 1940s

frisco · on Aug 22, 2013

What a silly article. PageRank is based on eigenvector centrality[1], which is one of a number of well-understood centrality measures that have been around for a long time. Google's innovation was noticing the importance of the link structure of the internet and knowing to apply EC to it - not inventing the graph theory underlying EC.

Edit: and knowing to apply EC to that graph structure.

[1] http://en.wikipedia.org/wiki/Centrality

nullc · on Aug 22, 2013

What they patented, however, is more broad than that still:

1. A computer implemented method of scoring a plurality of linked documents, comprising: obtaining a plurality of documents, at least some of the documents being linked documents, at least some of the documents being linking documents, and at least some of the documents being both linked documents and linking documents, each of the linked documents being pointed to by a link in one or more of the linking documents; assigning a score to each of the linked documents based on scores of the one or more linking documents and processing the linked documents according to their scores.

6ren · on Aug 22, 2013

Patent claims usually begin with the broadest claim possible, then narrow it in following claims. If the first is found invalid, the other claims may still succeed. Thus, it makes no sense for the first claim to be certain to stand on its own; strategically, it should be over-broad.

From a computer science/mathematics point of view, this is also a convenient way to factor out commonalities of several claims, instead of duplicating them ad nauseam. For example, in the PageRank patent, claim 1 is immediately followed by six dependent claims giving further detail on it (as do ten other claims scattered through the rest): https://www.google.com/patents/US6285999

Thus, it is inaccurate to treat the first claim as "what they patented".

throwawaykf02 · on Aug 22, 2013

While it's true that the independent claims try to be as broad as possible and dependent claims narrow them down, each claim is still evaluated -- and must be valid -- on an individual basis. And if infringement of a patent is to be determined, you'd start with the independent claims and work your way down, as those have the broadest scope.

So, yes, it does make sense to say that the patent claims what nullc quoted. It's only slightly inaccurate in that the patent also claims a lot more, but the independent claims define the broadest scope.

6ren · on Aug 23, 2013

Oh dear, you and nullc are right, and I am wrong. My apologies. This patent was granted, and so they do have a patent for claim 1 (and also for the other claims). I was confusing it with a patent application.

It's surprising to me that even the cursory prior art search of examination didn't turn up this idea, since it's so simple, doesn't use any of the meat of the invention described, and the idea of ranking by citation was well-known, e.g. for academic papers. I think applying that to hypertext links is clever and insightful... but not patent-worthy (like many other patents). Upon litigation, I think claim 1 might not stand. But regardless, the patent was granted, so they do have a patent on claim 1.

Thanks for taking the time to correct me.

pudquick · on Aug 22, 2013

.. And yet still not invalidated by this prior art (if you were to stretch the concept) due to key phrases like 'computer implemented method'.

Computers in the 1940s were things like ENIAC and more likely to be involved in computing firing tables or cryptanalysis.

Now - if you were to look at Kleinberg's HITS instead you may possibly have a better argument, but I'm betting Kleinberg's prioritization method is still technically different enough than PageRank for that patent to stand. Given that Brin and Page actually reference the paper, I'm sure they've had patent lawyers make sure it wasn't infringing.

mayank · on Aug 22, 2013

> Google's innovation was noticing the importance of the link structure of the internet - not inventing the graph theory underlying EC.

Actually, I'd say that Google's innovation was applying the eigenvector centrality computation to the Web graph, not discovering any particularly meaningful property (the "random surfer") of the Web graph.

I'd call it an extremely skillful application of an old algorithm to a new problem domain.

spullara · on Aug 22, 2013

We shouldn't allow patents on old X applied to new Y domain.

gfodor · on Aug 22, 2013

You just gave a fairly abstract definition of "innovation."

walid · on Aug 22, 2013

Nope, @spullara just gave a definition of prior art :|

gfodor · on Aug 23, 2013

The point being the difference between "innovation" and "prior art" is a tricky one, since all innovation builds upon previous knowledge of some kind.

wicknicks · on Aug 22, 2013

Exactly. Page and Brin cite some papers (or atleast Kleinberg's paper does -- which inspired Pagerank) which deal with ranking authors based on their citation graph. This article makes no sense whatsoever.

floobynewb · on Aug 22, 2013

Damn you beat me to it.

Houshalter · on Aug 22, 2013

I'm really curious how on Earth they tracked all these down. Imagine how many obscure papers were published they didn't look through that had the same idea.

>One important question is: what is the value of each sector when they are so tightly integrated? Leontief’s answer was to develop an iterative method of valuing each sector based on the importance of the sectors that supply it.

That's actually pretty interesting. But I think in economics a circular graph doesn't really make sense, does it? It's more two ways with one person trading with another person (with any number of intermediate nodes.)

chalst · on Aug 22, 2013

> But I think in economics a circular graph doesn't really make sense, does it?

Leontief functions are a list of what inputs and outputs each technology a country uses as inputs and yields as outputs. So farming a region might require 100 tractors (inputs), of which 95 will be in good condition at the year's end (outputs), and need 50 000 hours of labourer time (input) and yield 10 000 tons of corn (output).

Then a network of trade will usually give you circular graphs: Country A produces corn that is imported by country B to feed its workers. Country B produces iron that is imported by country C as an industrial raw material. Country C produces tractors that are imported by country A for farming.

I think Leontief functions are one of the cooler ideas in economics: they get very close to the empirical data and they allow you to leverage linear algebra effectively. They've been important in some fundamental debates in economics, particularly the Cambridge capital controversy.

jimhefferon · on Aug 22, 2013

> how on Earth they tracked all these down

I don't know that work done by a famous, and Nobel prize-winning, author really qualifies as "tracking down."

This is a superficial article and I have not read the underlying paper, but in the article it is unclear how wide was the literature search.

Houshalter · on Aug 23, 2013

Yes, it was a famous example, which made it easier to find. My point is think of all the obscure papers published that also had that idea.

amerika_blog · on Aug 22, 2013

Ah, the innovation of our age... isn't innovation after all.

blauwbilgorgel · on Aug 22, 2013

Surely this is innovation when applied to a new domain (ranking websites). Also Google never stopped innovating PageRank and other ranking techniques.

PG applying Bayes' theorem to e-mail spam was innovative.

If we get very loose with our definition of algorithm, one could even say that ants implemented the first PageRank-like algorithm, when they ranked sources of food, using pheromone trails.

walid · on Aug 22, 2013

Well I haven't read the actual page ranking patent but apply existing math to solve a problem is not innovation.

However your example about ants is very true. Ants should be the patent holders of page ranking. But that also means the patent should have already expired a long time ago.