> "In Pregel, programs are expressed as a sequence of iterations. In each iteration, a vertex can, independently of other vertices, receive messages sent to it in the previous iteration, send messages to other vertices, modify its own and its outgoing edges' states, and mutate the graph's topology [...]"
This sounds very similar to how neural networks are updated through time. Could this be used to easily simulate biological cognition? Am I missing something? And, scalability isn't a problem:
> "Currently, Pregel scales to billions of vertices and edges, but this limit will keep expanding."
The human brain only has about 100 billion neurons.
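For a concrete feel of the model that quote describes, here's a minimal Python sketch of the superstep loop, with a made-up `MaxValueVertex`/`run_supersteps` interface rather than Google's actual C++ API: each vertex consumes the messages sent to it in the previous superstep, may update its value, sends messages along its out-edges, and votes to halt when it has nothing left to do; an incoming message reactivates a halted vertex.

```python
# Toy Pregel-style bulk-synchronous supersteps: messages sent in
# superstep N are delivered at the start of superstep N+1.

class MaxValueVertex:
    """Vertex program that propagates the maximum value in the graph."""
    def __init__(self, vid, value, out_edges):
        self.id = vid
        self.value = value
        self.out_edges = out_edges        # destination vertex ids
        self.active = True

    def compute(self, superstep, messages, send):
        changed = False
        for m in messages:
            if m > self.value:
                self.value = m
                changed = True
        if superstep == 0 or changed:     # only re-broadcast when something changed
            for dst in self.out_edges:
                send(dst, self.value)
        else:
            self.active = False           # vote to halt

def run_supersteps(vertices):
    inbox = {v.id: [] for v in vertices}
    superstep = 0
    while any(v.active or inbox[v.id] for v in vertices):
        outbox = {v.id: [] for v in vertices}
        def send(dst, msg):
            outbox[dst].append(msg)
        for v in vertices:
            msgs = inbox[v.id]
            if msgs:                      # a message reactivates a halted vertex
                v.active = True
            if v.active:
                v.compute(superstep, msgs, send)
        inbox = outbox
        superstep += 1
    return {v.id: v.value for v in vertices}

# Four vertices in a ring; every vertex ends up holding the global maximum.
verts = [MaxValueVertex(i, val, [(i + 1) % 4])
         for i, val in enumerate([3, 6, 2, 1])]
print(run_supersteps(verts))              # {0: 6, 1: 6, 2: 6, 3: 6}
```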
There are billions of neurons, but many more connections between them than are probably typical in the graphs they compute on. Even a complex road junction or the set of links on a page probably rarely has more than 1K edges, which is about the ballpark average number of connections per neuron. So at 100 billion neurons times roughly 1K connections each, a brain-scale graph would be on the order of 100 trillion edges, not just billions.
Others are directly working on simulating brains. It's a hard problem.
I would love to implement a graph like this where each node used Bayesian inference to act on messages received from connected vertices.
My gut says if you could get the system's state to settle to an equilibrium, it could react to changes in external signals in a probabilistic way and learn.
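For what it's worth, here is roughly what that could look like as a vertex program: a toy loopy belief propagation pass over a pairwise binary model, where each node fuses its neighbors' messages with its own evidence, and the loop stops once the messages stop changing (the "settle to an equilibrium" part). This is only a sketch of the idea, with made-up names; on graphs with cycles the result is approximate and convergence isn't guaranteed, which is where the junction tree algorithm mentioned in the replies comes in.

```python
# Toy loopy belief propagation on a pairwise binary model, written in a
# "messages per iteration" style. A sketch, not a production inference engine.

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def loopy_bp(unary, edges, pairwise, iters=50, tol=1e-6):
    """unary[i]      = [p(x_i=0), p(x_i=1)] up to scale
       edges         = list of undirected (i, j) pairs
       pairwise[i,j] = psi[a][b], compatibility of x_i=a with x_j=b"""
    neighbors = {i: [] for i in unary}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # message m[(i, j)] is what i tells j about x_j; start uninformative
    m = {(i, j): [0.5, 0.5] for i, j in edges}
    m.update({(j, i): [0.5, 0.5] for i, j in edges})

    for _ in range(iters):
        new_m = {}
        for (i, j) in m:
            # look up the pairwise table, transposing if only (j, i) is given
            psi = pairwise[(i, j)] if (i, j) in pairwise else \
                  [[pairwise[(j, i)][b][a] for b in range(2)] for a in range(2)]
            msg = [0.0, 0.0]
            for xj in range(2):
                for xi in range(2):
                    prod = unary[i][xi]
                    for k in neighbors[i]:
                        if k != j:
                            prod *= m[(k, i)][xi]
                    msg[xj] += prod * psi[xi][xj]
            new_m[(i, j)] = normalize(msg)
        delta = max(abs(new_m[k][0] - m[k][0]) for k in m)
        m = new_m
        if delta < tol:           # messages have "settled to an equilibrium"
            break

    beliefs = {}
    for i in unary:
        b = list(unary[i])
        for k in neighbors[i]:
            b = [b[x] * m[(k, i)][x] for x in range(2)]
        beliefs[i] = normalize(b)
    return beliefs

# Three nodes in a chain; edges prefer agreement. Node 0 has strong evidence
# for state 1, and the message passing pulls the other nodes toward it.
unary = {0: [0.1, 0.9], 1: [0.5, 0.5], 2: [0.5, 0.5]}
edges = [(0, 1), (1, 2)]
agree = [[0.9, 0.1], [0.1, 0.9]]
pairwise = {(0, 1): agree, (1, 2): agree}
print(loopy_bp(unary, edges, pairwise))
```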
Check out en.wikipedia.org/wiki/Junction_tree_algorithm. It's not a great article, but the referenced literature is OK.
Don't have much time to elaborate at the moment, but look up the "junction tree algorithm": it's a way of performing inference in graph-structured statistical models. You think of edges as relationships between random variables (the nodes), and have the nodes communicate with each other until all the signals have propagated. Makes inference straightforward, though still exponential in the worst case (in the size of the largest clique, i.e. the treewidth).
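To make the "still exponential" part concrete: junction-tree inference costs roughly exponential time in the size of the largest clique of the moralized, triangulated graph (treewidth plus one). Here's a rough sketch of that preprocessing step for a toy Bayes net, using a greedy min-fill elimination order; the names and the heuristic are just for illustration, and real implementations are more careful.

```python
# Moralize a toy Bayes net and eliminate vertices greedily; the largest
# clique produced bounds the cost of junction-tree inference.

from itertools import combinations

def moralize(parents):
    """parents: {node: set of parents}; every node must appear as a key.
       Returns an undirected adjacency dict."""
    adj = {v: set() for v in parents}
    for v, ps in parents.items():
        for p in ps:                              # keep the original edges
            adj[v].add(p); adj[p].add(v)
        for a, b in combinations(ps, 2):          # "marry" co-parents
            adj[a].add(b); adj[b].add(a)
    return adj

def min_fill_cliques(adj):
    """Greedy min-fill elimination; returns the cliques induced along the way."""
    adj = {v: set(ns) for v, ns in adj.items()}
    cliques = []
    while adj:
        def fill(v):                              # fill-in edges needed to eliminate v
            ns = adj[v]
            return sum(1 for a, b in combinations(ns, 2) if b not in adj[a])
        v = min(adj, key=fill)
        ns = adj[v]
        cliques.append(ns | {v})
        for a, b in combinations(ns, 2):          # connect v's neighbors
            adj[a].add(b); adj[b].add(a)
        for n in ns:
            adj[n].discard(v)
        del adj[v]
    return cliques

# The classic sprinkler network: a largest clique of size k means working
# with tables of size O(2**k) for binary variables.
parents = {"cloudy": set(), "sprinkler": {"cloudy"},
           "rain": {"cloudy"}, "wet": {"sprinkler", "rain"}}
cliques = min_fill_cliques(moralize(parents))
print(cliques, "max clique size:", max(len(c) for c in cliques))
```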
The number of "iterations" in biological cognition is quite small. You don't get many synaptic transmissions in the time it takes for us to have some act of cognition.
Everyone's talking about graph databases lately. Who hasn't designed some relational data structure, refined it a bit, tweaked it a bit, and ended up with some sort of adjacency-list monstrosity, along with the realization that the relational underpinnings for it are rather awkward?
I'm really looking forward to reading more about Pregel. In the meantime I just found Greg Malewicz's PhD thesis. He seems very interested in scheduling, which must be crucial to Bulk Synchronous Parallel: http://www.cs.ua.edu/~greg/publications/Malewicz_PhD.pdf
Representing graphs in relational databases seems straightforward enough (not that I have done it, but I have an opinion nevertheless). Accessing it efficiently when you want to traverse the graph seems to be the hard problem. I wonder if there are ANY good solutions at all, short of loading the whole graph into memory. Otherwise I suppose one would need a good heuristic for caching the edges and vertices that are most likely to be accessed?
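As a toy illustration of both halves of that, here's a single edge table in SQLite and a breadth-first traversal over it (the schema and names are made up). Every hop is another round trip to the database; batching the whole frontier into one IN (...) query per level helps, but a k-hop traversal is still k queries, which is roughly why in-memory or cache-friendly layouts matter so much.

```python
# A graph stored as one relational edge table, traversed breadth-first
# with one SQL query per hop rather than one per vertex.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
conn.execute("CREATE INDEX idx_edges_src ON edges (src)")   # needed for lookups by source
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)])

def bfs_from(conn, start):
    """Breadth-first reachability over the edges table."""
    seen = {start}
    frontier = [start]
    while frontier:
        placeholders = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT dst FROM edges WHERE src IN ({placeholders})", frontier)
        nxt = {dst for (dst,) in rows} - seen     # only vertices not yet visited
        seen |= nxt
        frontier = list(nxt)
    return seen

print(bfs_from(conn, 1))   # {1, 2, 3, 4, 5}
```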
With only a hint as to what this Pregel thing is about, my guess would be that Neo4J and CODASYL are focused on persistence/storage whereas Pregel is meant for high-performance OLAP stuff. Storage issues like replication are probably less of a concern.
Notice that they're submitting the paper to a distributed computing conference and not a database conference.
Ultimately I doubt anyone but the Googles of the world has a need for this kind of technology.
I work for a company with a 30 million node social graph. That type of data is becoming more and more common, including things like Twitter, where the social graph is effectively open to anyone.