gdb's comments | Hacker News

You can use the API live in multiple products, such as AI Dungeon (https://play.aidungeon.io/)!


Thanks Greg. Will check it out. Would love to see AI move from the "it's fun" zone to "make some money" zone soon though. We are all invested in the success of AI :)


OpenAI LP, our "capped-profit" entity which has taken investment, has a fiduciary duty to the OpenAI Charter: https://openai.com/blog/openai-lp/


Mind changed. Keep leading the way!


Sign up for the beta if you'd like to be the one to flesh them out :)!


I definitely did immediately after seeing this. Being neither an academic nor representing a recognizable name brand company, I don’t know if I should have my hopes up too high for getting access soon, but I certainly hope so. I’d love to play around with this and push its limits for some creative hackathon-style side projects!

Just wanted to add: It’s amazing all the negativity in this discussion. Whatever happened to the creative tech community who loves to push boundaries? Isn’t that still part of the hacker ethos, isn’t this still hacker news? Just because a tool has the potential to be used for bad doesn’t mean we shouldn’t be excited to find new ways to use it for good.


(I work at OpenAI. Before that, I worked at Stripe. I've spent most of my software career thinking about how to build effective engineering cultures.)

I think this code is actually well-written and maintainable. This is proven in practice because we've adopted it many places in OpenAI, and I've personally found it very easy to adapt to other use-cases (certainly much more so than the from-scratch Transformer implementations I've written!).

As https://news.ycombinator.com/item?id=21456605 points out, the complexity of the code arises from the complexity of the underlying algorithm. Complexity due to software engineering concerns, like TensorFlow scopes, is elegantly handled. [edited for clarity:] Writing a Transformer in 174 lines of code requires a lot of deep thinking about the right underlying abstractions.

> but essentially as soon as you enter the 'ml engineer/research engineer/research scientist' layer, it's doomed.

We actually don't do this! Our only official technical title is "member of technical staff". (People sometimes choose to self-identify as an engineer or researcher, so you might see that on LinkedIn, but we don't have a distinction internally.) Everyone is responsible for their own code, and people care quite a bit about writing code that others can build on.


Ok, since you took the time to respond, I just want to be constructive as well:

So I don't have a big problem with some of the function definitions being compact, as the other comment points out.

The reason I don't like this code is that it has no comments on the critical bits. I don't necessarily care whether you call the input to your matmul 'x' or 'tensor' or 'input' (although consistency is nice).

The thing that would stop me from absorbing and modifying this code is that it does not comment on the bits that are non-obvious to someone who hasn't written a Transformer before. For example:

'Same as tf.matrix_band_part(tf.ones([nd, ns]), -1, ns-nd), but doesn't produce garbage on TPUs.' - I will have to ask a colleague what that means. Why not write out what the actual issue is instead of mysteriously hinting at some potential problem?
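For illustration, the kind of spelled-out comment I have in mind might look something like this (a sketch of my own for a causal mask, not the actual repo code, and note that I still cannot explain the TPU caveat, which is exactly the problem):

    import tensorflow as tf

    def attention_mask(nd, ns, dtype=tf.float32):
        # Causal mask of shape [nd, ns]: destination position i may attend to
        # source position j only if j <= i + (ns - nd), i.e. only to tokens at
        # or before it. With nd == ns this is simply a lower-triangular matrix
        # of ones, aligned to the bottom-right corner.
        i = tf.range(nd)[:, None]
        j = tf.range(ns)
        return tf.cast(i >= j - ns + nd, dtype)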

Code like this "q, k, v = map(split_heads, tf.split(c, 3, axis=2))" will require me to re-read the paper section, then print out all the tensors to figure out which tensor has which shape at which point. Instead of writing relatively useless line comments like '#Transformer', I would annotate all non-trivial shape modifications with the current layout and with what we are trying to achieve by modifying it.
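As a sketch of the shape commentary I mean (my own illustration with assumed names like n_head and n_embd, not the repo's code):

    import tensorflow as tf

    def split_heads(x, n_head):
        # [batch, sequence, n_embd] -> [batch, n_head, sequence, n_embd // n_head],
        # so that attention can be computed per head with a single batched matmul.
        batch, seq = tf.shape(x)[0], tf.shape(x)[1]
        x = tf.reshape(x, [batch, seq, n_head, -1])
        return tf.transpose(x, [0, 2, 1, 3])

    def qkv(c, n_head):
        # c: [batch, sequence, 3 * n_embd], the fused query/key/value projection.
        # tf.split gives three [batch, sequence, n_embd] tensors; split_heads then
        # reshapes each to [batch, n_head, sequence, n_embd // n_head].
        q, k, v = map(lambda t: split_heads(t, n_head), tf.split(c, 3, axis=2))
        return q, k, v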

The other issue of my original comment was not specifically on that codebase, but I am sure you would admit that the baselines code was pretty much exactly what I was writing about re: ml scripts. That's not to denigrate its incredible usefulness to the community.

Since you mentioned spinning up, I thought I would add a few comments on that as well:

I think the Spinning Up code base is good at making the code compact, and terrible at helping beginners make sense of the data flow. There are a lot of line comments, but they do not actually explain what is conceptually going on; they often just repeat short-hands.

For example, look at the PPO implementation: https://github.com/openai/spinningup/blob/master/spinup/algo...

Here, the function is returning pi, logp, and logp_pi (and v). Do you know how incredibly confusing the distinction between these is for beginners? In particular, there is no explanation of why logp_pi even needs to be stored in the buffer.

We could recompute it from the states and stop the gradient when computing the likelihood ratio. A sensible tutorial-level comment here might be something along the lines of: we compute the likelihood in the same forward pass as the action so that we can later use it to compute the likelihood ratio; we could also re-compute it later from the buffered states.

I will stop here, but I hope my point comes across: whenever I read code from your repos, there are some good parts (conciseness, cute numerical tricks), but there is a general lack of thoughtfulness about what the code is really trying to convey to a reader. It shows in the comments and it shows in the code organisation.

As a final note, I have seen this in many organisations and I do not mean to call you out. There is just this quality degradation that inevitably happens when nobody is incentivised (read: promoted, rewarded) to think about these things for an organisation.

Managers at all levels typically don't, because they don't get close enough to the subtle issues on a day-to-day level. If you are lucky, you get senior individual contributors who still look at code and raise the bar for the entire org. My genuine recommendation to you is to look for that, because a manager won't do it, and most fresh grads can't.


Hello! Spinning Up author here.

Very reasonable point that it is not clearly explained why you need to store logp_pi in the buffer. But the reason is that it would require additional code complexity to calculate it on the fly later. The likelihood ratio requires the denominator to be on the _old_ policy, so if you wanted to compute it on the fly, you would need to have a second policy in the computation graph to preserve the old policy while you change the current policy. You could not simply do a stop_gradient on the current policy and get the same results.
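To make that concrete, here is a minimal sketch of how the stored log-prob enters the PPO-clip objective (illustrative TF1-style code with a toy discrete policy; names like obs_ph and logp_old_ph are my own, not the actual Spinning Up implementation):

    import tensorflow as tf

    obs_ph = tf.placeholder(tf.float32, shape=(None, 4))
    act_ph = tf.placeholder(tf.int32, shape=(None,))
    adv_ph = tf.placeholder(tf.float32, shape=(None,))
    # Log-probs recorded in the buffer at the time the actions were sampled.
    logp_old_ph = tf.placeholder(tf.float32, shape=(None,))

    # Current policy's log-prob of the stored actions.
    logits = tf.layers.dense(obs_ph, units=2)
    logp = tf.reduce_sum(tf.one_hot(act_ph, 2) * tf.nn.log_softmax(logits), axis=1)

    # logp_old_ph enters as data, so it is constant w.r.t. the policy parameters;
    # only logp carries gradients, which is why the ratio is not identically 1.
    clip_ratio = 0.2
    ratio = tf.exp(logp - logp_old_ph)
    min_adv = tf.where(adv_ph > 0, (1 + clip_ratio) * adv_ph, (1 - clip_ratio) * adv_ph)
    pi_loss = -tf.reduce_mean(tf.minimum(ratio * adv_ph, min_adv))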

My personal feeling is that tutorial-style explanations like this don't fit nicely into code comment flow. As a result, most tutorial-style descriptions went into material on the Spinning Up website rather than into the code. It isn't 100% comprehensive, certainly, but RL has an enormous surface area (there are tons and tons of little details that teaching material could dive into) and I feel pretty good about what we were able to cover. :)


Thank you for responding. Well, my point is that the gradient on the likelihood ratio in particular is what trips people up. They ask questions like 'why is this ratio not always 1' or similar. This is why I would say explaining what goes where here is critical, i.e. that we save the prior logp_pi (even though we could recompute it) in order to treat it as a constant when computing the ratio/the gradient. That would be, from my perspective, the key pedagogical moment of a PPO tutorial. However, this is purely subjective, and I agree that one can feel differently about where to put explanations.


I’m very sorry to see someone who obviously cares so much defending this code. This does not follow best practices, and pointing to the complexity of the underlying algorithm is just an excuse. Complex code can be beautiful and well documented.

Writing a complex method in 174 lines is neither elegant nor beautiful. Writing a well-documented file that can take an engineer in a different specialty and bring them up to speed in 1,000 lines is.


No matter how much you comment your code, a few code comments are not going to bring people up to speed on an algorithm whose background spans dozens of scientific papers and hundreds of pages. This code is aimed at researchers who are familiar with the techniques and have the necessary background knowledge. For such people, the code is very readable.

Think about it like this: If you write a game engine, are you going to document each function with proofs that explain how the underlying Physics works for people who don't have such knowledge? No, you assume that people who read game engine code have read some basic physics books.


We also have code like that. For example, that's the explicit goal of the Spinning Up repo: https://github.com/openai/spinningup/blob/master/spinup/algo...

In practice, it's much harder to use that code, and we tend not to consume code like that internally. There's a real tradeoff!


ddpg() takes 17 parameters and is over 200 lines long. I'm very far from being a domain expert, but having worked in other complex domains, I'm pretty confident this can be redesigned such that it's both more maintainable and more pleasant to use.


Hello! Spinning Up author here. I would love to hear your thoughts on this! So far I have had a lot of success teaching people about DDPG using this code example, but I'm grateful for every fresh perspective. :)

Feel free to reach out by email, jachiam at openai.


There is no function in the world that should ever take 17 parameters. If the algorithm permits such configuration, as I am sure it does, then it should take a configuration object which holds all these values. The object could then be constructed using special-purpose factories that take fewer parameters, and then customized from there as needed.

It may be an indication that the whole thing needs refactoring though.
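For example, a minimal sketch of what I mean, with hypothetical names (DDPGConfig, smoke_test_config), not an actual Spinning Up API:

    from dataclasses import dataclass, replace

    @dataclass
    class DDPGConfig:
        hidden_sizes: tuple = (256, 256)
        gamma: float = 0.99
        polyak: float = 0.995
        pi_lr: float = 1e-3
        q_lr: float = 1e-3
        batch_size: int = 100
        replay_size: int = 1_000_000
        start_steps: int = 10_000
        act_noise: float = 0.1

    def smoke_test_config() -> DDPGConfig:
        # Special-purpose factory: a tiny config for quick local runs.
        return DDPGConfig(hidden_sizes=(32, 32), replay_size=10_000, start_steps=100)

    # Build from a factory, then customize as needed.
    cfg = replace(smoke_test_config(), gamma=0.98)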


You can refactor it that way, but then you make it unnecessarily complicated.


Well then I’m sorry for that. It’s a good indicator of a broken culture.


> I've just lost faith in the organization to be honest

:( sorry to hear that.

Note that we have a publicly-available Charter which spells out how we operate: http://openai.com/charter. We all use that to guide our actions, and we have not changed a single word in it since publication. I hope that as time goes on, you'll increasingly see the consistency between our actions and the words in the Charter, and that we'll be able to win back your support.


FWIW, just watching you interact with your critics on here has done a lot for you to earn my respect. I hope I've been too quick to judge and that OpenAI is able to make me reconsider my stance in the future.


(I work at OpenAI.)

Per https://news.ycombinator.com/item?id=21306452, we have results for both instrumented and uninstrumented cubes!

Our cube variants are listed in the blog post ("Behind the scenes: Rubik’s Cube prototypes"), and results are in the paper in Table 6.


According to Table 6, the performance with the uninstrumented cube is 20% when applying half of a fair scramble and 0% for a full scramble. Right?


Yes — but note that "success" is a not-very-granular metric (which we set for ourselves!), as it means fully unscrambling the cube without dropping it once.

To be clear, that means executing up to 100 moves without a drop. If you put the cube back in the robot's hand, without any additional effort it'll continue solving unfazed.


There were no uninstrumented cubes. What OpenAI claims to be a "regular Rubik's cube" is in fact not regular, but has its color stickers cut. OpenAI couldn't get a regular Rubik's cube working.


(I work at OpenAI.)

We also have results with an uninstrumented cube (as described in section 7 in the paper, or "Behind the scenes: Rubik's Cube prototypes" in the blog post), which are slightly weaker (see Table 6 in the paper). The 20% number is an example of critics cherry-picking their facts — the success rate is 60% under normal conditions, but 20% with a maximally-scrambled cube (which would happen randomly with probability less than 10^-20).

Also note: success here means that the robot was able to perfectly unscramble the cube — which requires perfectly performing up to 100 moves — without dropping it once. What it means, in practical terms, is that you need to wait for a long time in order to witness even a single failure. If you pick up the cube and place it back in the hand, it'll get right back to solving.

Note that like with OpenAI Five, the success rate is more of a function of how far we've had time to push the system than something fundamental. We're not building commercial robots; we're in the business of making new AI breakthroughs. So the next step for our robotics team will be finding a new task that feels impossible today, and seeing what it takes to make it no longer feel impossible.


From the blog post:

> Our method currently solves the Rubik’s Cube 20% of the time when applying a maximally difficult scramble that requires 26 face rotations. For simpler scrambles that require 15 rotations to undo, the success rate is 60%.

And looking at the data in http://cube20.org/qtm/ : for a random cube, the probability of getting a maximally-scrambled position that needs 26 quarter-turns is about 10^-20, but most (~75%) of random cubes need 20 or 21 quarter-turns. Most algorithms don't use the most efficient path to solve the cube, so if the best path has 20 steps, the actual path will have a hundred or more steps.

For the cube to be solvable in 15 steps, it must start out almost solved. That's not what people usually call "normal conditions".


Where do you get these numbers from?

Under "normal conditions", my reading is the success is 20%, 0% for a maximally-scrambled cube. I don't feel the Giiker Cube can't be considered "normal conditions".


Good catch! Will update the post to be explicit that there are many pre-existing awesome results in this vein.


Any possibility of releasing the simulation environment? Looks quite cool!


At the top of the article are two open source repo links:

Environment Generation: https://github.com/openai/multi-agent-emergence-environments

Worldgen: https://github.com/openai/mujoco-worldgen

Is this what you were looking for?


> I really wish I had more of an opportunity to play with AI/ML in my day to day work.

Anyone who feels this way — we're hiring :)! https://openai.com/jobs/.

(Also if I can answer any questions about OpenAI, feel free to ping me at gdb@openai.com.)


Anything remote-friendly? Unfortunately, all the jobs listed are SF-only.


(I wrote the post.)

If it's helpful, I dropped out of both schools — the vast majority of my knowledge is self-taught!


>I learn best when I have something specific in mind to build.

This is so incredibly important for me and, based on my conversations, many others as well.

The other thing I struggle with is the feeling that many of the problems I wish to solve are likely also solvable with simpler statistical methods and that I'm just being a poser by trying to pound them home with the ML hammer.


Question: will the OpenAI fellows curriculum ever be released? I need a nice, structured intro to deep learning research and feel like the curriculum modules your company has developed would have the highest quality.

(For reference, I’m an undergrad looking to get into this field)


How often do you go "back to basics" (read introductory book chapters, or classic papers, etc)?

