Why not use game emulators? With popular NES emulators you can advance the game frame by frame. You can read the raw memory addresses that correspond to the score. You can dump the memory at any time and reload the game to a specific game state. You can even manipulate the games in many fun ways by messing around with the game memory. Or you can give an AI algorithm access to memory addresses as additional information, instead of relying on pure machine vision, if you want to do that.
Here's an example of a guy who made a general game playing algorithm that brute forces its way through any NES game: https://www.youtube.com/watch?v=xOCurBYI_gY This isn't necessarily interesting from an AI perspective - the playing algorithm is just brute force. But it shows what can be done with the platform: easily reloading to previous states and exploring counterfactual futures (which is exactly the sort of thing RL algorithms do). He also has a cool algorithm for finding the objective function of an arbitrary game, by watching a human play and seeing which memory addresses increment, which is a lot easier than writing OCR code to read the score and game-over states from the screen.
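Here's roughly what that memory-watching trick looks like, as a minimal sketch; it assumes you can grab a RAM snapshot from the emulator every frame while a human plays (the `ram_frames` input and that hook are hypothetical, not any particular emulator's API):

```python
# Sketch of "find the score by watching which memory addresses increment".
# Assumes `ram_frames` is a list of per-frame NES RAM dumps (2048 bytes each)
# captured from an emulator while a human plays; that capture hook is hypothetical.

def find_score_candidates(ram_frames):
    """Return addresses whose values change during play but never decrease."""
    n_bytes = len(ram_frames[0])
    candidates = set(range(n_bytes))
    for prev, curr in zip(ram_frames, ram_frames[1:]):
        for addr in list(candidates):
            if curr[addr] < prev[addr]:   # a score counter shouldn't go down
                candidates.discard(addr)
    # Drop addresses that never changed at all; they carry no reward signal.
    return sorted(a for a in candidates
                  if any(f[a] != ram_frames[0][a] for f in ram_frames))
```

(Real NES games often store the score as one decimal digit per byte, so a real implementation would also need to stitch multi-byte counters back together.)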
Great project. We've found that the VNC Universe environments are hard for today's RL algorithms primarily due to their async nature. We're currently working on a new set of Universe environments without VNC; I'm very happy to see others inspired by the core ideas of Universe as well.
(Author here). Hi Greg! I am excited to hear about the new Universe environments. I want as many RL environments as possible for my upcoming project, so I will probably draw from Universe and ALE as well as µniverse.
I took a lot of inspiration from Universe and am grateful for OpenAI's work on RL in general :). I probably wouldn't have started on this project if a company like OpenAI hadn't already decided it was a worthy goal.
Honest question: how interested are academia/industry in integrations between deep learning libraries and game engines? I've worked with Unreal and TensorFlow over the last semester, and I found that there aren't any existing integrations. I will probably work on a plugin, but I wanted to know if there is any interest.
The way I see it, having hooks into the engines themselves helps with what the article talks about - not needing to go through VNC or other _glue_ to get realtime data. It could potentially send the framebuffers themselves directly from the game/simulation and tie the actions back into the game/simulation. And using framebuffers is just one direction; we could instead stream the coordinates/current payoff/etc.
Also, having such plugins would help with the adoption in both directions - games now have an always updating/learning AI (might need a network connection + cloud backend), and researchers can have training/testing environments.
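To make that concrete, here's a rough sketch of what an engine-side hook could expose, assuming the engine lets a plugin inject input and advance the simulation one tick at a time; every name below is hypothetical, not an existing Unreal or TensorFlow API:

```python
# Hypothetical synchronous bridge between a game engine tick and an RL agent.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class StepResult:
    framebuffer: np.ndarray                      # raw pixels straight from the engine, no VNC glue
    state: dict = field(default_factory=dict)    # optional structured data: coordinates, payoff, etc.
    reward: float = 0.0
    done: bool = False

class EngineBridge:
    """The engine only advances when the agent asks it to, so data stays in sync."""
    def __init__(self, engine):
        self.engine = engine                     # hypothetical handle exposed by an engine plugin

    def step(self, action) -> StepResult:
        self.engine.apply_input(action)          # feed the agent's action in
        self.engine.advance_one_tick()           # simulate exactly one frame
        return StepResult(
            framebuffer=self.engine.read_framebuffer(),
            state=self.engine.read_game_state(),
            reward=self.engine.read_payoff(),
            done=self.engine.is_episode_over(),
        )
```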
This is great. Using HTML5 games in a headless browser makes a lot of sense because the need for VNC is circumvented. However, I think that while OpenAI's implementation is certainly not the best, having access to just the information on the screen is not a bad idea in itself as a (maybe optional) constraint. With access to the game's internal state we don't even need RL for solving a large number of games - algorithms like NEAT are sufficient.
This project doesn't change that. The agents still only get screenshots of the game as far as I understand.
However I think this approach is bad. Machine vision is a separate problem from reinforcement learning; you shouldn't need to be able to do both well. Machine vision consumes a ton of processing power and researcher time spent tuning hyperparameters, and all it's doing is recovering information that's already in memory, like the location of various objects and the score. It really limits what can be done. E.g. the famous Atari-playing AIs by DeepMind were limited to no memory and only the last few frames, because backpropagating through thousands of frames was too expensive.
Because of the way NNs work, it's trivial to separate out the machine vision into a separate module. So if you have a good RNN reinforcement learning system, you can easily add a machine vision learning system to it later if you need.
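A minimal sketch of that modular split, assuming PyTorch (the thread doesn't name a framework): the policy only ever sees a feature vector, so the vision encoder can be trained separately, swapped out later, or replaced entirely with ground-truth state read from memory.

```python
import torch
import torch.nn as nn

class VisionEncoder(nn.Module):
    """Turns raw frames into feature vectors; swappable for ground-truth state."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.Linear(64 * 9 * 9, feat_dim)    # sized for 84x84 RGB inputs

    def forward(self, frames):                       # frames: (batch, 3, 84, 84)
        return torch.relu(self.fc(self.conv(frames)))

class RecurrentPolicy(nn.Module):
    """Only ever sees feature vectors, so it doesn't care where they came from."""
    def __init__(self, feat_dim=256, n_actions=6):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, 256, batch_first=True)
        self.head = nn.Linear(256, n_actions)

    def forward(self, features, hidden=None):        # features: (batch, time, feat_dim)
        out, hidden = self.rnn(features, hidden)
        return self.head(out), hidden                # action logits per time step
```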
In terms of "backpropagating through thousands of frames", it's not as expensive as you might think. I've used TRPO to train RNNs on games like Atari pong with thousands of frames per episode. This can be done via an algorithm that reduces the memory complexity of RNN backpropagation (these algorithms didn't exist in 2013). See for example https://arxiv.org/abs/1606.03401.
Edit: that said the project seems to have some interesting and needed improvements (esp time adjustment). Glad to see dialog between muniverse and openai here.
Despite the flaws, the nice thing about VNC is its universality: it can support any app on a computer. Using HTML5 in a browser limits the scope of things we could encapsulate as environments, and makes it less of a "universe".
However, there is a difference between the universality of the tech stack and the exposed interface. In my opinion, the future universe would be rich clusters of RL environments with a unified API, each implemented using a different underlying technology to meet the desired synchronicity and frame performance.
I'm pretty sure that was the goal of OpenAI Gym. Gym tries to provide a generic interface for RL environments, and imho it does a nice job. I am working on Python bindings for µniverse now, which should allow µniverse to integrate with Gym.
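For anyone who hasn't used it, the Gym interface boils down to this loop (exact return values vary slightly between Gym versions, and `CartPole-v1` below is just a stand-in; the µniverse bindings would expose their own environment IDs):

```python
import gym

env = gym.make("CartPole-v1")           # any Gym-compatible env plugs in the same way
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # stand-in for a real agent's policy
    obs, reward, done, info = env.step(action)
env.close()
```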
I'm a little surprised, but this seems like a good idea. HTML5 certainly has a brighter present and future than Flash, and skipping the OCR step should save quite a few CPU cycles.
I am also working on a related project. Flash and HTML5 games in Chrome are great, but they are very far away from the initially promised full-blown GTA5, StarCraft, and other complex envs. I am in the process of remaking the Universe framework for the host machine, since running those computation-intensive games at a reasonable frame rate is nearly impossible inside Docker or virtual machines.
Yeah, really interested in hearing their take on this. It's not often you see a Musk-sponsored enterprise cast a major project aside without public comment.
The main reason people in the AI community believe Universe has been abandoned is that the engineers who worked on it have been laid off, and that none of the promised updates actually materialized. This doesn't preclude the possibility of a fresh non-VNC take on Universe with a smaller team, of course, perhaps also with more focus on benchmarking (like Atari, Labyrinth) than universality.
It's because the people actually working on AI, including OpenAI, finally knocked some sense into Elon Musk. He finally realized how far behind AI is (it is a glorified linear regression) and we won't be seeing general AI for at least another 40 years.
I got my PhD in machine learning and NLP and did a 3-yr postdoc on deep learning.
My advisor shared the following wisdom with me: "When the experts in your field say that something can be done, they are probably right. When the experts in your field say that something cannot be done, they are not necessarily right."
> When the experts in your field say that something can be done, they are probably right.
Generally yes, but they may be significantly off on the timeframe. One famous example is that once alpha-beta search was invented (in the late 1950s), Herb Simon predicted that "within ten years a digital computer will be the world's chess champion". That did eventually happen, using techniques not even all that different from alpha-beta search, but it took 40 years rather than 10. Many of the 1980s neural-net claims turned out to be eventually vindicated too, but it took 30 years, which was quite a bit longer than the optimistic portion of 1980s "connectionists" expected.
That's the type of skepticism I usually have with claims today too. When people say "there will be fully autonomous self-driving cars on the road by 2020", I don't doubt it'll happen, but whether it'll happen in less than 3 years I have more doubts about. You could argue AI researchers have gotten better at accurately predicting the timeframes of advances than they were in the early days of AI, but I'm not sure there is solid evidence of that (would be interesting if someone has studied it).
It can happen the other way around too though. Few people predicted the massive jump in AI ability over the last few years. Notable AI researchers said it would take decades to get to human accuracy on ImageNet, and they were proven wrong within a few years. I recall reading the first deep learning Go papers around 2015 and thinking that superhuman Go AI was inevitable in a few years. And when I discussed it with other people they were very skeptical and thought it was unlikely. And then AlphaGo came out...
So when you say "It's because...", are you in touch with people working there, or are you just guessing that this transpired because it seems like a reasonable assumption to you?
Would be interested to know how you reached that 40 years number. I don't think we are even remotely close to AGI, 40 years to me seems extremely optimistic. That's within my lifetime.
Probably the same way everyone does, by pulling it out of thin air as a guess. When nobody even knows what theoretical breakthroughs are necessary, you'll always end up with a scattershot all over the place, even amongst experts. Try asking working mathematicians how long until the Riemann hypothesis is resolved one way or another, or look at what people were saying about Fermat's Last Theorem up until it was solved.
What we do know is that current techniques won't get us close to AGI, so something new is needed (or perhaps like backprop, something old will work once we have enough compute power). Personally I'm bullish on AGI because I have strikingly low faith in the ability of evolution to operate very effectively as a tool for algorithm discovery, so I suspect that once we've hit the compute threshold we'll find that many different algorithms can do the trick, and 40 years is probably not out of the question for us to hit that point (or 10, or 100), depending who you talk to about what the compute threshold might be.
I'd caution against putting too much weight on what experts say, though, since with a tiny set of exceptions anyone working on "AI" today is actually just working on narrow AI, which is, as someone put it, just glorified linear regression. Those tools will almost certainly be part of the solution, but only in the sense that the classical theory of Diophantine equations was part of Wiles's proof of Fermat's Last Theorem - they are not the core of the theoretical approach.
Evolution is a slow algorithm, but it had access to an absurd amount of compute (all neuronal organic matter on Earth) and environment simulation (all of physical reality on Earth) when discovering us; so the discovery of the algorithms/architectures/principles in our heads shouldn't be viewed as trivial.
The massive compute/time advantage evolution has makes me bearish about AGI. We really need to fix our compute capabilities before we can start overtaking evolution. The math dictates it'll happen, but exponentially slowly if we don't innovate in compute.
There's more to the story, too: advances on top of CRISPR may give us better tools to self-improve the species, accelerating evolution.
Personally, I'm bearish about AGI because I believe we will eventually realize that the brain is a glorified linear regression too, with a custom wiring to help learn language and vision.
>What we do know is that current techniques won't get us close to AGI, so something new is needed (or perhaps like backprop, something old will work once we have enough compute power).
With backprop we didn't just need bigger machines, we needed better algorithms, palliatives for the exploding-gradient problem that made values exceed our numerical representations, and then hardware specifically designed for doing the matrix-ops involved.
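For a concrete example of one such palliative, here's gradient clipping by global norm as a framework-agnostic numpy sketch; it's one standard way to keep BPTT updates inside numerical range:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale a list of gradient arrays if their combined norm is too large."""
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-8)
        grads = [g * scale for g in grads]
    return grads
```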
If I saw something capable of speeding up probabilistic program inference the way GPUs sped up backprop, I'd start saying we should expect to see powerful AI applications quite soon.
Better algorithms were invented because of bigger machines. Once computers got fast enough, researchers could experiment with different algorithms on realistically sized models and datasets, without waiting 2 years for an experiment to finish training.
Probabilistic programming isn't going to help general AI much. Things like dropout seem to work well enough, and for the most part AI is severely underfitting rather than overfitting. Our models are far too simple and small to really learn language and do complicated reasoning. Making them bayesian doesn't fix that.
>Probabilistic programming isn't going to help general AI much.
Excuse me while I laugh.[1,2,3,4]
>Things like dropout seem to work well enough, and for the most part AI is severely underfitting rather than overfitting.
For the most part, neural networks can't reason at all. They just induce deterministic functions over high-dimensional Euclidean spaces.
>Our models are far too simple and small to really learn language and do complicated reasoning.
They're also not compositional (new concepts as functions of old concepts), productive (able to draw an unbounded number of inferences from each representation), or unbounded in size of representation (unboundedly many concepts). Neural networks don't even represent causal structure, let alone model how an intervention will affect outcomes!
It is, however, really nice to hear an AI booster admit just how incredibly limited connectionist models actually are.
>Making them bayesian doesn't fix that.
No, changing to a causal, compositional representation that allows for productive and nonparametric (unboundedly large) learning does that. The Bayesian part just makes it extra nice by letting us "put information in" anywhere in the model (at any variable) by conditioning.
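To make "putting information in by conditioning" concrete, here's a toy generative model conditioned via rejection sampling; real probabilistic programming systems do this far more efficiently, this is just to show the shape of the idea:

```python
import random

def model():
    """Generative story: draw a coin's bias, then ten flips of that coin."""
    bias = random.random()                        # prior over the latent variable
    flips = [random.random() < bias for _ in range(10)]
    return bias, flips

def condition_on(observed_heads, n_samples=100_000):
    """Posterior samples of the bias, given an observed number of heads."""
    accepted = []
    for _ in range(n_samples):
        bias, flips = model()
        if sum(flips) == observed_heads:          # condition: keep only matching worlds
            accepted.append(bias)
    return accepted

posterior = condition_on(observed_heads=8)
print(sum(posterior) / len(posterior))            # posterior mean of the bias
```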
I'd be interested in hearing more background here. Last time I heard Musk say anything about AI, he was still on the hype train to crazy-town, talking about the world-conquering things it would do in the coming decades that have nothing to do with what anyone's researching right now.
The idea that OpenAI could talk him down is pretty impressive, and if true I would significantly positively update my impression of OpenAI. (I thought OpenAI was funded by people on this hype train.)
Universe (a specific training framework) is purportedly being abandoned, not OpenAI...
But thank you for your valuable insight, we all know being an AI research scientist gives you a direct connection to Elon's brain.
Edit: And seems like you are wrong anyway, see top comment.
Congratulations on the initiative, it looks very cool! Indeed, we found that running asynchronous environments, while possible, proved to be too cumbersome for research. We're now working on a synchronous set of environments for universe that are easier to use.