
> huggingface/transformers has a language-modeling example. It is full-featured but as a result also somewhat challenging to trace. E.g. some large functions have as much as 90% unused code behind various branching statements that is unused in the default setting of simple language modeling.

I don't understand this criticism of Transformers. Doesn't tracing (in both TorchScript and ONNX forms, which Transformers supports for exporting) just take the relevant model graph and freeze it? I don't think either contains the somewhat-weighty performance code.
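
E.g., a minimal sketch of the kind of export I have in mind (model name purely illustrative; this just uses the stock torch/transformers tracing path):

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    # torchscript=True configures the model to return tuples so it can be traced
    model = GPT2LMHeadModel.from_pretrained("gpt2", torchscript=True)
    model.eval()

    inputs = tokenizer("Hello, world", return_tensors="pt")
    # Tracing records only the ops actually executed for this example input,
    # so branches that are dead in the default language-modeling setting
    # don't end up in the exported graph.
    traced = torch.jit.trace(model, (inputs["input_ids"],))
    torch.jit.save(traced, "gpt2_traced.pt")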




FWIW, I took that to mean "code path is challenging [for a human being] to trace."


Try to read the code and understand how it works and you will find it very challenging to interpret. And not just that: even the documentation is very sparse and hard to read. Compare that to the OpenAI code, which is so concise and easy to read. There is mastery in doing that, deep mastery. Few repositories in the TensorFlow or PyTorch organizations get to that level.


Agreed re: OpenAI's GPT implementation. It took roughly a year to appreciate how simple it is. https://github.com/openai/gpt-2/blob/0574c5708b094bfa0b0f6df...

Especially compared to StyleGAN, BERT, or pretty much anything else.

I used to hate the OpenAI GPT codebase: zero comments? No classes? What does "mlp" even mean? But over time, I've found myself reverting to their style.
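
(For anyone wondering: "mlp" is just the position-wise feed-forward block inside each transformer layer. A rough PyTorch-flavored sketch of the idea, not OpenAI's actual TF code from the file linked above:)

    import torch.nn as nn

    class MLP(nn.Module):
        # expand to 4x the embedding width, apply GELU, project back
        def __init__(self, n_embd):
            super().__init__()
            self.c_fc = nn.Linear(n_embd, 4 * n_embd)
            self.act = nn.GELU()
            self.c_proj = nn.Linear(4 * n_embd, n_embd)

        def forward(self, x):
            return self.c_proj(self.act(self.c_fc(x)))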


Honestly the library doesn't seem that hard to understand, although it can be under-documented at times - I found looking through the source very helpful.


I think the argument here is about pedagogy not performance.


minGPT is actually quite performant too; the "min" refers to the breadth of supported functionality (e.g. the absence of support for various additional conditioning, exotic masking, masked LMs, finetuning, pruning, etc).
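
(To illustrate the masking point: the only masking kept is the standard causal mask. A generic PyTorch sketch of that idea, not minGPT's exact code:)

    import torch

    T = 8                                    # sequence length, illustrative
    mask = torch.tril(torch.ones(T, T))      # 1s on and below the diagonal
    scores = torch.randn(T, T)               # stand-in for attention scores
    scores = scores.masked_fill(mask == 0, float('-inf'))
    weights = torch.softmax(scores, dim=-1)  # each position attends only to the past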


GPT training performance on the CPU is funny. The vocab size and context window size have a massive effect on both speed and accuracy.
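
Back-of-the-envelope sketch of why those two knobs matter so much (standard rough transformer cost formulas; the per-block estimate ignores biases, layer norms, etc., and all numbers are illustrative):

    # parameter count: embedding table scales with vocab size,
    # each transformer block contributes roughly 12 * n_embd^2 weights
    def gpt_params(vocab_size, n_layer, n_embd):
        return vocab_size * n_embd + n_layer * 12 * n_embd ** 2

    # self-attention does work proportional to the context length
    # for every token, per layer
    def attn_flops_per_token(context_len, n_layer, n_embd):
        return 2 * n_layer * context_len * n_embd

    print(gpt_params(50257, 12, 768))           # BPE vocab: ~124M params
    print(gpt_params(256, 12, 768))             # tiny char-level vocab: ~85M params
    print(attn_flops_per_token(1024, 12, 768))  # long context window
    print(attn_flops_per_token(128, 12, 768))   # short context window: 8x cheaper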


Sure thing! I only meant to imply the relative ordering of considerations.


I agree with this. Deep learning models change so fast that there's no point in maintaining super-reusable code.



