> "In Pregel, programs are expressed as a sequence of iterations. In each iteration, a vertex can, independently of other vertices, receive messages sent to it in the previous iteration, send messages to other vertices, modify its own and its outgoing edges' states, and mutate the graph's topology [...]"
This sounds very similar to how neural networks are updated through time. Could this be used to easily simulate biological cognition? Am I missing something? And, scalability isn't a problem:
> "Currently, Pregel scales to billions of vertices and edges, but this limit will keep expanding."
The human brain only has about 100 billion neurons.
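For a concrete feel of the model that quote describes, here's a minimal Python sketch of the superstep loop, with a made-up `MaxValueVertex`/`run_supersteps` interface rather than Google's actual C++ API: each vertex consumes the messages sent to it in the previous superstep, may update its value, sends messages along its out-edges, and votes to halt when it has nothing left to do; an incoming message reactivates a halted vertex.

```python
# Toy Pregel-style bulk-synchronous supersteps: messages sent in
# superstep N are delivered at the start of superstep N+1.

class MaxValueVertex:
    """Vertex program that propagates the maximum value in the graph."""
    def __init__(self, vid, value, out_edges):
        self.id = vid
        self.value = value
        self.out_edges = out_edges        # destination vertex ids
        self.active = True

    def compute(self, superstep, messages, send):
        changed = False
        for m in messages:
            if m > self.value:
                self.value = m
                changed = True
        if superstep == 0 or changed:     # only re-broadcast when something changed
            for dst in self.out_edges:
                send(dst, self.value)
        else:
            self.active = False           # vote to halt

def run_supersteps(vertices):
    inbox = {v.id: [] for v in vertices}
    superstep = 0
    while any(v.active or inbox[v.id] for v in vertices):
        outbox = {v.id: [] for v in vertices}
        def send(dst, msg):
            outbox[dst].append(msg)
        for v in vertices:
            msgs = inbox[v.id]
            if msgs:                      # a message reactivates a halted vertex
                v.active = True
            if v.active:
                v.compute(superstep, msgs, send)
        inbox = outbox
        superstep += 1
    return {v.id: v.value for v in vertices}

# Four vertices in a ring; every vertex ends up holding the global maximum.
verts = [MaxValueVertex(i, val, [(i + 1) % 4])
         for i, val in enumerate([3, 6, 2, 1])]
print(run_supersteps(verts))              # {0: 6, 1: 6, 2: 6, 3: 6}
```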
There are billions of neurons, but many more connections between them than are probably typical in the graphs they compute on. Even a complex road junction or the set of links on a page probably rarely has more than 1K edges, which is about the ballpark average number of connections per neuron. So at 100 billion neurons times roughly 1K connections each, a brain-scale graph would be on the order of 100 trillion edges, not just billions.
Others are directly working on simulating brains. It's a hard problem.
I would love to implement a graph like this where each node used Bayesian inference to act on messages received from connected vertices.
My gut says if you could get the system's state to settle to an equilibrium, it could react to changes in external signals in a probabilistic way and learn.
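For what it's worth, here is roughly what that could look like as a vertex program: a toy loopy belief propagation pass over a pairwise binary model, where each node fuses its neighbors' messages with its own evidence, and the loop stops once the messages stop changing (the "settle to an equilibrium" part). This is only a sketch of the idea, with made-up names; on graphs with cycles the result is approximate and convergence isn't guaranteed, which is where the junction tree algorithm mentioned in the replies comes in.

```python
# Toy loopy belief propagation on a pairwise binary model, written in a
# "messages per iteration" style. A sketch, not a production inference engine.

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def loopy_bp(unary, edges, pairwise, iters=50, tol=1e-6):
    """unary[i]      = [p(x_i=0), p(x_i=1)] up to scale
       edges         = list of undirected (i, j) pairs
       pairwise[i,j] = psi[a][b], compatibility of x_i=a with x_j=b"""
    neighbors = {i: [] for i in unary}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # message m[(i, j)] is what i tells j about x_j; start uninformative
    m = {(i, j): [0.5, 0.5] for i, j in edges}
    m.update({(j, i): [0.5, 0.5] for i, j in edges})

    for _ in range(iters):
        new_m = {}
        for (i, j) in m:
            # look up the pairwise table, transposing if only (j, i) is given
            psi = pairwise[(i, j)] if (i, j) in pairwise else \
                  [[pairwise[(j, i)][b][a] for b in range(2)] for a in range(2)]
            msg = [0.0, 0.0]
            for xj in range(2):
                for xi in range(2):
                    prod = unary[i][xi]
                    for k in neighbors[i]:
                        if k != j:
                            prod *= m[(k, i)][xi]
                    msg[xj] += prod * psi[xi][xj]
            new_m[(i, j)] = normalize(msg)
        delta = max(abs(new_m[k][0] - m[k][0]) for k in m)
        m = new_m
        if delta < tol:           # messages have "settled to an equilibrium"
            break

    beliefs = {}
    for i in unary:
        b = list(unary[i])
        for k in neighbors[i]:
            b = [b[x] * m[(k, i)][x] for x in range(2)]
        beliefs[i] = normalize(b)
    return beliefs

# Three nodes in a chain; edges prefer agreement. Node 0 has strong evidence
# for state 1, and the message passing pulls the other nodes toward it.
unary = {0: [0.1, 0.9], 1: [0.5, 0.5], 2: [0.5, 0.5]}
edges = [(0, 1), (1, 2)]
agree = [[0.9, 0.1], [0.1, 0.9]]
pairwise = {(0, 1): agree, (1, 2): agree}
print(loopy_bp(unary, edges, pairwise))
```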
Check out en.wikipedia.org/wiki/Junction_tree_algorithm. It's not a great article, but the referenced literature is OK.
Don't have much time to elaborate at the moment, but look up the "junction tree algorithm": it's a way of performing inference in graph-structured statistical models. You think of edges as relationships between random variables (the nodes), and have the nodes communicate with each other until all the signals have propagated. Makes inference straightforward, though still exponential in the worst case (in the size of the largest clique, i.e. the treewidth).
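To make the "still exponential" part concrete: junction-tree inference costs roughly exponential time in the size of the largest clique of the moralized, triangulated graph (treewidth plus one). Here's a rough sketch of that preprocessing step for a toy Bayes net, using a greedy min-fill elimination order; the names and the heuristic are just for illustration, and real implementations are more careful.

```python
# Moralize a toy Bayes net and eliminate vertices greedily; the largest
# clique produced bounds the cost of junction-tree inference.

from itertools import combinations

def moralize(parents):
    """parents: {node: set of parents}; every node must appear as a key.
       Returns an undirected adjacency dict."""
    adj = {v: set() for v in parents}
    for v, ps in parents.items():
        for p in ps:                              # keep the original edges
            adj[v].add(p); adj[p].add(v)
        for a, b in combinations(ps, 2):          # "marry" co-parents
            adj[a].add(b); adj[b].add(a)
    return adj

def min_fill_cliques(adj):
    """Greedy min-fill elimination; returns the cliques induced along the way."""
    adj = {v: set(ns) for v, ns in adj.items()}
    cliques = []
    while adj:
        def fill(v):                              # fill-in edges needed to eliminate v
            ns = adj[v]
            return sum(1 for a, b in combinations(ns, 2) if b not in adj[a])
        v = min(adj, key=fill)
        ns = adj[v]
        cliques.append(ns | {v})
        for a, b in combinations(ns, 2):          # connect v's neighbors
            adj[a].add(b); adj[b].add(a)
        for n in ns:
            adj[n].discard(v)
        del adj[v]
    return cliques

# The classic sprinkler network: a largest clique of size k means working
# with tables of size O(2**k) for binary variables.
parents = {"cloudy": set(), "sprinkler": {"cloudy"},
           "rain": {"cloudy"}, "wet": {"sprinkler", "rain"}}
cliques = min_fill_cliques(moralize(parents))
print(cliques, "max clique size:", max(len(c) for c in cliques))
```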
The number of "iterations" in biological cognition is quite small. You don't get many synaptic transmissions in the time it takes for us to have some act of cognition.
Everyone's talking about graph databases lately. Who hasn't designed some relational data structure, refined it a bit, tweaked it a bit, and ended up with some sort of adjacency-list monstrosity, along with the realization that the relational underpinnings for it are rather awkward?
I'm really looking forward to reading more about Pregel. In the meantime I just found Greg Malewicz's PhD thesis. He seems very interested in scheduling, which must be crucial to Bulk Synchronous Parallel: http://www.cs.ua.edu/~greg/publications/Malewicz_PhD.pdf
Representing graphs in relational databases seems straightforward enough (not that I have done it, but I have an opinion nevertheless). Accessing it efficiently when you want to traverse the graph seems to be the hard problem. I wonder if there are ANY good solutions at all, short of loading the whole graph into memory. Otherwise I suppose one would need a good heuristic for caching the edges and vertices that are most likely to be accessed?
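As a toy illustration of both halves of that, here's a single edge table in SQLite and a breadth-first traversal over it (the schema and names are made up). Every hop is another round trip to the database; batching the whole frontier into one IN (...) query per level helps, but a k-hop traversal is still k queries, which is roughly why in-memory or cache-friendly layouts matter so much.

```python
# A graph stored as one relational edge table, traversed breadth-first
# with one SQL query per hop rather than one per vertex.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
conn.execute("CREATE INDEX idx_edges_src ON edges (src)")   # needed for lookups by source
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)])

def bfs_from(conn, start):
    """Breadth-first reachability over the edges table."""
    seen = {start}
    frontier = [start]
    while frontier:
        placeholders = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT dst FROM edges WHERE src IN ({placeholders})", frontier)
        nxt = {dst for (dst,) in rows} - seen     # only vertices not yet visited
        seen |= nxt
        frontier = list(nxt)
    return seen

print(bfs_from(conn, 1))   # {1, 2, 3, 4, 5}
```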
With only a hint as to what this Pregel thing is about, my guess would be that Neo4J and CODASYL are focused on persistence/storage whereas Pregel is meant for high-performance OLAP stuff. Storage issues like replication are probably less of a concern.
Notice that they're submitting the paper to a distributed computing conference and not a database conference.
Ultimately I doubt anyone but the Googles of the world has a need for this kind of technology.
I work for a company with a 30 million node social graph. That type of data is becoming more and more common, including things like Twitter, where the social graph is effectively open to anyone.