Why not use game emulators? With popular NES emulators you can advance the game frame by frame. You can read the raw memory addresses that correspond to the score. You can dump the memory at any time and reload the game to a specific game state. You can even manipulate the games in many fun ways by messing around with the game memory. Or you can give an AI algorithm access to memory addresses as additional information, instead of relying on pure machine vision, if you want to do that.
Here's an example of a guy who made a general game playing algorithm that brute forces its way through any NES game: https://www.youtube.com/watch?v=xOCurBYI_gY This isn't necessarily interesting from an AI perspective - the playing algorithm is just brute force. But it shows what can be done with the platform: easily reloading to previous states and exploring counterfactual futures (which is exactly the sort of thing RL algorithms do). He also has a cool algorithm for finding the objective function of an arbitrary game, by watching a human play and seeing which memory addresses increment, which is a lot easier than writing OCR code to read the score and game-over states from the screen.
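Here's roughly what that memory-watching trick looks like, as a minimal sketch; it assumes you can grab a RAM snapshot from the emulator every frame while a human plays (the `ram_frames` input and that hook are hypothetical, not any particular emulator's API):

```python
# Sketch of "find the score by watching which memory addresses increment".
# Assumes `ram_frames` is a list of per-frame NES RAM dumps (2048 bytes each)
# captured from an emulator while a human plays; that capture hook is hypothetical.

def find_score_candidates(ram_frames):
    """Return addresses whose values change during play but never decrease."""
    n_bytes = len(ram_frames[0])
    candidates = set(range(n_bytes))
    for prev, curr in zip(ram_frames, ram_frames[1:]):
        for addr in list(candidates):
            if curr[addr] < prev[addr]:   # a score counter shouldn't go down
                candidates.discard(addr)
    # Drop addresses that never changed at all; they carry no reward signal.
    return sorted(a for a in candidates
                  if any(f[a] != ram_frames[0][a] for f in ram_frames))
```

(Real NES games often store the score as one decimal digit per byte, so a real implementation would also need to stitch multi-byte counters back together.)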
Great project. We've found that the VNC Universe environments are hard for today's RL algorithms primarily due to their async nature. We're currently working on a new set of Universe environments without VNC; I'm very happy to see others inspired by the core ideas of Universe as well.
(Author here). Hi Greg! I am excited to hear about the new Universe environments. I want as many RL environments as possible for my upcoming project, so I will probably draw from Universe and ALE as well as µniverse.
I took a lot of inspiration from Universe and am grateful for OpenAI's work on RL in general :). I probably wouldn't have started on this project if a company like OpenAI hadn't already decided it was a worthy goal.
Honest question: how interested are academia/industry in integrations between deep learning libraries and game engines? I've worked with Unreal and TensorFlow over the last semester, and I found that there aren't any existing integrations. I will probably work on a plugin, but I wanted to know if there is any interest.
The way I see it, having hooks into the engines themselves helps with what the article talks about - not needing to go through VNC or other _glue_ to get realtime data. It could potentially send the framebuffers themselves directly from the game/simulation and tie the actions back into the game/simulation. And using framebuffers is just one direction; we could instead stream the coordinates/current payoff/etc.
Also, having such plugins would help with the adoption in both directions - games now have an always updating/learning AI (might need a network connection + cloud backend), and researchers can have training/testing environments.
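To make that concrete, here's a rough sketch of what an engine-side hook could expose, assuming the engine lets a plugin inject input and advance the simulation one tick at a time; every name below is hypothetical, not an existing Unreal or TensorFlow API:

```python
# Hypothetical synchronous bridge between a game engine tick and an RL agent.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class StepResult:
    framebuffer: np.ndarray                      # raw pixels straight from the engine, no VNC glue
    state: dict = field(default_factory=dict)    # optional structured data: coordinates, payoff, etc.
    reward: float = 0.0
    done: bool = False

class EngineBridge:
    """The engine only advances when the agent asks it to, so data stays in sync."""
    def __init__(self, engine):
        self.engine = engine                     # hypothetical handle exposed by an engine plugin

    def step(self, action) -> StepResult:
        self.engine.apply_input(action)          # feed the agent's action in
        self.engine.advance_one_tick()           # simulate exactly one frame
        return StepResult(
            framebuffer=self.engine.read_framebuffer(),
            state=self.engine.read_game_state(),
            reward=self.engine.read_payoff(),
            done=self.engine.is_episode_over(),
        )
```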
This is great. Using HTML5 games in a headless browser makes a lot of sense because the need for VNC is circumvented. However, I think that while OpenAI's implementation is certainly not the best, having access to just the information on the screen is not a bad idea in itself as a (maybe optional) constraint. With access to the game's internal state we don't even need RL for solving a large number of games - algorithms like NEAT are sufficient.
This project doesn't change that. The agents still only get screenshots of the game as far as I understand.
However I think this approach is bad. Machine vision is a separate problem from reinforcement learning; you shouldn't need to be able to do both well. Machine vision consumes a ton of processing power and researcher time spent tuning hyperparameters, and all it's doing is recovering information that's already in memory, like the location of various objects and the score. It really limits what can be done. E.g. the famous Atari-playing AIs by DeepMind were limited to no memory and only the last few frames, because backpropagating through thousands of frames was too expensive.
Because of the way NNs work, it's trivial to separate out the machine vision into a separate module. So if you have a good RNN reinforcement learning system, you can easily add a machine vision learning system to it later if you need.
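A minimal sketch of that modular split, assuming PyTorch (the thread doesn't name a framework): the policy only ever sees a feature vector, so the vision encoder can be trained separately, swapped out later, or replaced entirely with ground-truth state read from memory.

```python
import torch
import torch.nn as nn

class VisionEncoder(nn.Module):
    """Turns raw frames into feature vectors; swappable for ground-truth state."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.Linear(64 * 9 * 9, feat_dim)    # sized for 84x84 RGB inputs

    def forward(self, frames):                       # frames: (batch, 3, 84, 84)
        return torch.relu(self.fc(self.conv(frames)))

class RecurrentPolicy(nn.Module):
    """Only ever sees feature vectors, so it doesn't care where they came from."""
    def __init__(self, feat_dim=256, n_actions=6):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, 256, batch_first=True)
        self.head = nn.Linear(256, n_actions)

    def forward(self, features, hidden=None):        # features: (batch, time, feat_dim)
        out, hidden = self.rnn(features, hidden)
        return self.head(out), hidden                # action logits per time step
```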
In terms of "backpropagating through thousands of frames", it's not as expensive as you might think. I've used TRPO to train RNNs on games like Atari pong with thousands of frames per episode. This can be done via an algorithm that reduces the memory complexity of RNN backpropagation (these algorithms didn't exist in 2013). See for example https://arxiv.org/abs/1606.03401.
Edit: that said the project seems to have some interesting and needed improvements (esp time adjustment). Glad to see dialog between muniverse and openai here.
Despite the flaws, the nice thing about VNC is its universality: it can support any app on a computer. Using HTML5 in a browser limits the scope of things we could encapsulate as environments, and makes it less of a "universe".
However, there is a difference between the universality of the tech stack and the exposed interface. In my opinion, the future universe would be rich clusters of RL environments with a unified API, each implemented using a different underlying technology to meet the desired synchronicity and frame performance.
I'm pretty sure that was the goal of OpenAI Gym. Gym tries to provide a generic interface for RL environments, and imho it does a nice job. I am working on Python bindings for µniverse now, which should allow µniverse to integrate with Gym.
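For anyone who hasn't used it, the Gym interface boils down to this loop (exact return values vary slightly between Gym versions, and `CartPole-v1` below is just a stand-in; the µniverse bindings would expose their own environment IDs):

```python
import gym

env = gym.make("CartPole-v1")           # any Gym-compatible env plugs in the same way
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # stand-in for a real agent's policy
    obs, reward, done, info = env.step(action)
env.close()
```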
I'm a little surprised, but this seems like a good idea. HTML5 certainly has a brighter present and future than Flash, and skipping the OCR step should save quite a few CPU cycles.
I am also working on a related project. Flash and HTML5 games in Chrome are great, but they are very far away from the initially promised full-blown GTA5, StarCraft, and other complex envs. I am in the process of remaking the Universe framework for the host machine, since running those computation-intensive games at a reasonable frame rate is nearly impossible inside Docker or virtual machines.
Yeah, really interested in hearing their take on this. It's not often you see a Musk-sponsored enterprise cast a major project aside without public comment.
The main reason people in the AI community believe Universe has been abandoned is that the engineers who worked on it have been laid off, and that none of the promised updates actually materialized. This doesn't preclude the possibility of a fresh non-VNC take on Universe with a smaller team, of course, perhaps also with more focus on benchmarking (like Atari, Labyrinth) than universality.
It's because the people actually working on AI, including OpenAI, finally knocked some sense into Elon Musk. He finally realized how far behind AI is (it is a glorified linear regression) and we won't be seeing general AI for at least another 40 years.
I got my PhD in machine learning and NLP and did a 3-yr postdoc on deep learning.
My advisor shared the following wisdom with me: "When the experts in your field say that something can be done, they are probably right. When the experts in your field say that something cannot be done, they are not necessarily right."
> When the experts in your field say that something can be done, they are probably right.
Generally yes, but they may be significantly off on the timeframe. One famous example is that once alpha-beta search was invented (in the late 1950s), Herb Simon predicted that "within ten years a digital computer will be the world's chess champion". That did eventually happen, using techniques not even all that different from alpha-beta search, but it took 40 years rather than 10. Many of the 1980s neural-net claims turned out to be eventually vindicated too, but it took 30 years, which was quite a bit longer than the optimistic portion of 1980s "connectionists" expected.
That's the type of skepticism I usually have with claims today too. When people say "there will be fully autonomous self-driving cars on the road by 2020", I don't doubt it'll happen, but whether it'll happen in less than 3 years I have more doubts about. You could argue AI researchers have gotten better at accurately predicting the timeframes of advances than they were in the early days of AI, but I'm not sure there is solid evidence of that (would be interesting if someone has studied it).
It can happen the other way around too though. Few people predicted the massive jump in AI ability over the last few years. Notable AI researchers said it would take decades to get to human accuracy on ImageNet, and they were proven wrong within a few years. I recall reading the first deep learning Go papers around 2015 and thinking that superhuman Go AI was inevitable in a few years. And when I discussed it with other people they were very skeptical and thought it was unlikely. And then AlphaGo came out...
So when you say "It's because...", are you in touch with people working there, or are you just guessing that this transpired because it seems like a reasonable assumption to you?
Would be interested to know how you reached that 40 years number. I don't think we are even remotely close to AGI, 40 years to me seems extremely optimistic. That's within my lifetime.
Probably the same way everyone does, by pulling it out of thin air as a guess. When nobody even knows what theoretical breakthroughs are necessary, you'll always end up with a scattershot all over the place, even amongst experts. Try asking working mathematicians how long until the Riemann hypothesis is resolved one way or another, or look at what people were saying about Fermat's Last Theorem up until it was solved.
What we do know is that current techniques won't get us close to AGI, so something new is needed (or perhaps like backprop, something old will work once we have enough compute power). Personally I'm bullish on AGI because I have strikingly low faith in the ability of evolution to operate very effectively as a tool for algorithm discovery, so I suspect that once we've hit the compute threshold we'll find that many different algorithms can do the trick, and 40 years is probably not out of the question for us to hit that point (or 10, or 100), depending who you talk to about what the compute threshold might be.
I'd caution against putting too much weight on what experts say, though, since with a tiny set of exceptions anyone working on "AI" today is actually just working on narrow AI, which is, as someone put it, just glorified linear regression. Those tools will almost certainly be part of the solution, but only in the sense that the classical theory of Diophantine equations was part of Wiles's proof of Fermat's Last Theorem - they are not the core of the theoretical approach.
Evolution is a slow algorithm, but it had access to an absurd amount of compute (all neuronal organic matter on Earth) and environment simulation (all of physical reality on Earth) when discovering us; so the discovery of the algorithms/architectures/principles in our heads shouldn't be viewed as trivial.
The massive compute/time advantage evolution has makes me bearish about AGI. We really need to fix our compute capabilities before we can start overtaking evolution. The math dictates it'll happen, but exponentially slowly if we don't innovate in compute.
There's more to the story, too: advances on top of CRISPR may give us better tools to self-improve the species, accelerating evolution.
Personally, I'm bearish about AGI because I believe we will eventually realize that the brain is a glorified linear regression too, with a custom wiring to help learn language and vision.
>What we do know is that current techniques won't get us close to AGI, so something new is needed (or perhaps like backprop, something old will work once we have enough compute power).
With backprop we didn't just need bigger machines, we needed better algorithms, palliatives for the exploding-gradient problem that made values exceed our numerical representations, and then hardware specifically designed for doing the matrix-ops involved.
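For a concrete example of one such palliative, here's gradient clipping by global norm as a framework-agnostic numpy sketch; it's one standard way to keep BPTT updates inside numerical range:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale a list of gradient arrays if their combined norm is too large."""
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-8)
        grads = [g * scale for g in grads]
    return grads
```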
If I saw something capable of speeding up probabilistic program inference the way GPUs sped up backprop, I'd start saying we should expect to see powerful AI applications quite soon.
Better algorithms were invented because of bigger machines. Once computers got fast enough, researchers could experiment with different algorithms on realistically sized models and datasets, without waiting 2 years for an experiment to finish training.
Probabilistic programming isn't going to help general AI much. Things like dropout seem to work well enough, and for the most part AI is severely underfitting rather than overfitting. Our models are far too simple and small to really learn language and do complicated reasoning. Making them bayesian doesn't fix that.
>Probabilistic programming isn't going to help general AI much.
Excuse me while I laugh.[1,2,3,4]
>Things like dropout seem to work well enough, and for the most part AI is severely underfitting rather than overfitting.
For the most part, neural networks can't reason at all. They just induce deterministic functions over high-dimensional Euclidean spaces.
>Our models are far too simple and small to really learn language and do complicated reasoning.
They're also not compositional (new concepts as functions of old concepts), productive (able to draw an unbounded number of inferences from each representation), or unbounded in size of representation (unboundedly many concepts). Neural networks don't even represent causal structure, let alone model how an intervention will affect outcomes!
It is, however, really nice to hear an AI booster admit just how incredibly limited connectionist models actually are.
>Making them bayesian doesn't fix that.
No, changing to a causal, compositional representation that allows for productive and nonparametric (unboundedly large) learning does that. The Bayesian part just makes it extra nice by letting us "put information in" anywhere in the model (at any variable) by conditioning.
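To make "putting information in by conditioning" concrete, here's a toy generative model conditioned via rejection sampling; real probabilistic programming systems do this far more efficiently, this is just to show the shape of the idea:

```python
import random

def model():
    """Generative story: draw a coin's bias, then ten flips of that coin."""
    bias = random.random()                        # prior over the latent variable
    flips = [random.random() < bias for _ in range(10)]
    return bias, flips

def condition_on(observed_heads, n_samples=100_000):
    """Posterior samples of the bias, given an observed number of heads."""
    accepted = []
    for _ in range(n_samples):
        bias, flips = model()
        if sum(flips) == observed_heads:          # condition: keep only matching worlds
            accepted.append(bias)
    return accepted

posterior = condition_on(observed_heads=8)
print(sum(posterior) / len(posterior))            # posterior mean of the bias
```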
I'd be interested in hearing more background here. Last time I heard Musk say anything about AI, he was still on the hype train to crazy-town, talking about the world-conquering things it would do in the coming decades that have nothing to do with what anyone's researching right now.
The idea that OpenAI could talk him down is pretty impressive, and if true I would significantly positively update my impression of OpenAI. (I thought OpenAI was funded by people on this hype train.)
Universe (a specific training framework) is purportedly being abandoned, not OpenAI...
But thank you for your valuable insight, we all know being an AI research scientist gives you a direct connection to Elon's brain.
Edit: And seems like you are wrong anyway, see top comment.
Congratulations on the initiative, it looks very cool! Indeed, we found that running asynchronous environments, while possible, proved to be too cumbersome for research. We're now working on a synchronous set of environments for universe that are easier to use.