Hybrid computing using a neural network with dynamic external memory (nature.com)
153 points by idunning on Oct 12, 2016 | 38 comments



Some interesting ideas, sadly locked behind a paywalled journal, all for the purpose of boosting the researchers' prestige because they now hold a "Nature" publication. Thankfully, the article is easily accessible via Sci-Hub. http://www.nature.com.sci-hub.cc/nature/journal/vaop/ncurren...


Here's an official, publicly accessible, link to the article: http://rdcu.be/kXhV


Not downloadable though. Provided as a distraction from the paywall.


Did you paste the correct link? When I follow that one, I end up on a page with a few sentences in Cyrillic. I clicked a button that was probably the download button and landed on a captcha that I couldn't pass after three tries.


This is probably the most important research direction in modern neural network research.

Neural networks are great at pattern recognition. Things like LSTMs allow pattern recognition through time, so they can develop "memories". This is useful in things like understanding text (the meaning of one word often depends on the previous few words).

But how can a neural network know "facts"?

Humans have things like books, or the ability to ask others for things they don't know. How would we build something analogous to that for neural network-powered "AIs"?

There's been a strand of research on this, mostly coming out of Jason Weston's Memory Networks work[1]. This paper extends that by using a new form of memory, and shows that it can perform some pretty difficult tasks, including graph tasks like London Underground traversal.
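As a rough illustration of the core mechanism this family shares (reads from memory are soft and differentiable, so the whole thing trains by gradient descent), here's a toy numpy sketch of content-based addressing; it's entirely made up, not code from either paper:

  import numpy as np

  def content_read(memory, key, beta=1.0):
      # memory: (N, W) array of N slots of width W; key: (W,) query.
      # Cosine similarity between the key and every slot:
      scores = memory @ key / (
          np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
      weights = np.exp(beta * scores)
      weights /= weights.sum()     # softmax over slots: a soft "address"
      return weights @ memory      # blended read vector, differentiable

  M = np.random.randn(16, 8)       # 16 slots, width 8
  r = content_read(M, M[3], beta=10.0)   # reads back (mostly) slot 3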

One good quote showing how well it works:

> In this case, the best LSTM network we found in an extensive hyper-parameter search failed to complete the first level of its training curriculum of even the easiest task (traversal), reaching an average of only 37% accuracy after almost two million training examples; DNCs reached an average of 98.8% accuracy on the final lesson of the same curriculum after around one million training examples.

[1] https://arxiv.org/pdf/1410.3916v11.pdf


Thank you, I think I understand this now. So now we can train a model that doesn't have to learn everything from its weights alone.

Would this be an apt metaphor: LSTMs were like a student who had to memorize how to do the problems in order to take a test - a DNC learns how to take the test, but it can also look at its notes.


If it succeeds and scales, it seems very close to AGI, right?


Nowhere near it. It's so far away that it is almost completely nonsensical to talk about it.

I guess it is unlikely that one could have an AGI without some kind of memory, so there is that.


What further key skills will an AGI need?


In general, an AGI would be based on a reinforcement learning framework. Its main skill would be to observe the world, judge the situation and perform actions, with these three processes running in a continuous loop. It would receive a reward signal by which it would learn behavior, and it would have to be embedded in a world that it can move about in and act upon. If it has all these ingredients, it can become a general intelligence, as long as the reward signal leads it there.
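A toy, entirely made-up sketch of that loop (tabular Q-learning on a trivial "walk three steps" world stands in for anything an AGI would actually use):

  import random

  class World:
      def reset(self):
          self.pos = 0
          return self.pos

      def step(self, action):            # action: 0 = stay, 1 = move right
          self.pos += action
          done = self.pos >= 3
          return self.pos, (1.0 if done else 0.0), done

  world = World()
  q, state = {}, world.reset()           # q maps (state, action) -> value
  for _ in range(1000):
      if random.random() < 0.1:          # observe, then explore sometimes
          action = random.choice([0, 1])
      else:                              # ...or exploit what reward taught us
          action = max([0, 1], key=lambda a: q.get((state, a), 0.0))
      nxt, reward, done = world.step(action)
      best = max(q.get((nxt, a), 0.0) for a in [0, 1])
      old = q.get((state, action), 0.0)
      q[(state, action)] = old + 0.1 * (reward + 0.9 * best - old)
      state = world.reset() if done else nxt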

Memorizing is just one of the actions such an agent is able to perform. Another mental action besides memory would be attention. It would also need to be able to simulate the world, people and systems it is interacting with (to know how they behave) in order to be able to do reasoning and planning.

In short, an AGI would need: sensing (deep neural nets for vision, audio and other modalities), attention, memory, estimating the desirability and effects of various actions (a kind of imagination), an extensive database of common known facts, and the ability to act (for example by speech and movement).

Many of these systems have been demonstrated. Sensing, attention and memory are commonplace in ML papers. Creativity has been demonstrated in generative models that can write text, compose music and paint. The ability to predict the future and reason about it was demonstrated in AlphaGo. Speech and motor control are under development. We have most of the necessary blocks, but nobody has put them together into a functioning general AI yet.


That depends on a functional definition of AGI.

My preferred one is "An AGI is one which knows which are sensible questions to ask".

That's because it seems to me that most "AI-lite"-type goals are procedural. AGI needs to have agency.



Very exciting extension of Neural Turing Machines. As a side note: Gated Graph Sequence Neural Networks (https://arxiv.org/abs/1511.05493) perform similarly or better on the bAbI tasks mentioned in the paper. The comparison to existing graph neural network models apparently didn't make it into the paper (sadly).


Can someone explain what the full implications of this are? This seems really cool, but I can't really wrap my head around it.

From what I can tell, you can give the DNC simple inputs and it can derive complex answers.


It separates the concern of memorization from those of training and processing. In most current neural architectures, patterns in the training data are implicitly represented in the trained weights, and the net is forced to develop recall of past events by transmitting them from one time step to the next via its own outputs.

The framework in this paper trains a neural net which interacts with a memory bank in a manner similar to a CPU. That means it can save and recall data on request, which could lead to more flexible architectures (you can give a trained net different data to recall) and easier training (since a memory-based architecture means the neural weights no longer have to learn the data along with the processing algorithm.)
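A made-up toy of that CPU-like interaction (not the paper's actual equations): the controller emits keys and a write vector each step, and because both the read and the write are soft, gradients flow through memory during training:

  import numpy as np

  def softmax(x):
      e = np.exp(x - x.max())
      return e / e.sum()

  def memory_step(M, ctrl_out):
      # ctrl_out is whatever the trained controller emits this time step.
      read_key, write_key, write_vec = np.split(ctrl_out, 3)
      r = softmax(M @ read_key) @ M    # soft, differentiable "load"
      w = softmax(M @ write_key)       # soft address for the write
      M = M + np.outer(w, write_vec)   # soft, differentiable "store"
      return M, r

  M = np.zeros((8, 4))                 # 8 slots of width 4
  M, r = memory_step(M, np.random.randn(12))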


If you're interested in this, check out the "Reasoning, Attention, Memory (RAM)" NIPS Workshop 2015 organized by Jason Weston (Facebook Research): http://www.thespermwhale.com/jaseweston/ram/


I have a couple of questions that I'm not getting answers to from this: does the memory persist between each "instance" of a task, or does it get wiped after each one? Is this something where you might, say, present the model with some input data (which it might learn to store in memory) and then ask a question of it?

E.g., the blog post discusses using the network to find the shortest path between two stations; would the steps to do that look like this?

1. Train the NN to navigate any network, presenting the graph data each time you ask it a problem.

2. Take the trained NN and feed it the London Underground graph, then ask it how to get from one station to another?


Instead of just saving the data, you could think of reading from a memory address as applying the identity function to what was stored there: you get back exactly what you put in.

Could it learn to use addresses that perform more interesting functions than f(x)=x?


I'm probably totally off base here (neural networks/AI is not my wheelhouse), but is having "memory" in neural networks a new thing? Isn't this just a different application of a more typical 'feedback loop' in the network?


You're correct in a way: you can think of neural nets as "remembering" the data set they're trained on. Recurrent neural nets even explicitly have a "feedback loop" like the one you're referring to, which allows them to "remember" previous samples. An example is natural language processing, where you want to remember the previous words in a sentence in order to interpret the current word.
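As a made-up toy of that feedback loop (untrained weights, stand-in embeddings): the hidden state h is the only thing carrying earlier words forward:

  import numpy as np

  rng = np.random.default_rng(0)
  W = rng.normal(size=(16, 8))      # input -> hidden weights (untrained)
  U = rng.normal(size=(16, 16))     # hidden -> hidden: the feedback loop
  words = rng.normal(size=(5, 8))   # stand-in embeddings, one per word

  h = np.zeros(16)
  for x in words:
      h = np.tanh(W @ x + U @ h)    # new state mixes this word with the past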

Remembering the previous words of a sentence you're currently reading is more like short-term memory, though, and this paper is talking about long-term memories stored as data structures outside the neural net itself. This graphic from the DeepMind blog post might be helpful: https://i.imgur.com/KwXXCge.png.

The blog post from DeepMind is a bit more accessible than the Nature paper: https://deepmind.com/blog/differentiable-neural-computers/


The "memory" in a typical recurrent neural network is akin to a human's short term working memory. It only holds a few things and forgets old things quickly as new things come in. This new memory can hold a large number of things and stores them for an unlimited amount of time, more like a human's long term memory or a computer's RAM.


From the paper's "System Overview":

  An earlier form of DNC, the neural Turing machine [16], had a
  similar structure, but more limited memory access methods
  (see Methods for further discussion).


Does this mean we could get way better versions of char-rnn?


This could hopefully replace the current char-rnn with something very different. Char-rnn is a long short-term memory (LSTM) system, where recurrence in the structure of the neural network allows short-term information to persist and inform future actions. This paper's model more closely mimics the brain's separate long- and short-term memory structures, and could store long-term memories separately from its main activity until needed.


One of the examples given is a block puzzle (reordering 8 pieces in a 3x3 grid back into order).

Has this been a problem for AI and CNNs?


That problem was solved by non-learning AI systems decades ago. Current theorem provers (a related field of AI) solve problems like this in fractions of a second.

The progress in the article is getting a learning system to do it, which could eventually lead us toward handling unsolved problems.


This seems like a bit of an unfair comparison. The 'decades ago' solution was a system, built by humans, that can solve this problem (and very closely related ones), whereas this solution is a system, built by humans, that can design a system to solve the problem.


I agree with the point taniq makes: those earlier systems did in fact require a lot of hand-crafting, even if parts were automated. I find it interesting where the usefulness of these approaches plateaus.

I am very interested in general game playing, and there is a common pattern: while the general systems tend to make interesting progress, it is the systems finely crafted for a particular game that win competitions.

What I am really, REALLY interested in is what commercial applications exist for these types of technologies. Solving a puzzle slightly better than a different tool is fun, but solving a valuable business problem is where the money is.


No, it is not an unfair comparison. Those systems were not hand-crafted. They were implemented via first-order logic (FOL) theorem provers, and FOL theorem proving is Turing-complete and extremely expressive.

This is not a me-vs-you thing, just a scientific statement.


Interesting. I have an AI agent that can solve these types of problems.

What kinds of other unsolved problems are out there? I'm always looking for something interesting.


Look up past proceedings of the IJCAI or AAAI conferences, and look at the logic-based papers.


> a DNC can complete a moving blocks puzzle in which changing goals are specified by sequences of symbols

A neural network without memory can't do that, or perhaps just can't do it as well?


In Fig. 5a they compare its performance to that of an LSTM trained on the same problem, and it does seem to do much better.


I am guessing that for an LSTM-based neural network to learn memory sequencing for the purpose of solving problems, it would need to be much deeper and wider; a separate memory block provides that capability ready-made, so a much simpler network doesn't have to learn those operations.


Would love to see whether these networks learn concepts for fast retrieval, e.g. indexing.


But why use an ANN for tasks involving symbolic logic? I don't get it. It's like ANNs are jumping the shark.


This is remarkable!


^ This is called proof by construction.



