
> huggingface/transformers has a language-modeling example. It is full-featured but as a result also somewhat challenging to trace. E.g. some large functions have as much as 90% unused code behind various branching statements that is unused in the default setting of simple language modeling.

I don't understand this criticism of Transformers. Doesn't tracing (in both TorchScript and ONNX forms, which Transformers supports for exporting) just take the relevant model graph and freeze it? I don't think either contains the somewhat-weighty performance code.
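
E.g., a minimal sketch of the kind of export I have in mind (model name purely illustrative; this just uses the stock torch/transformers tracing path):

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    # torchscript=True configures the model to return tuples so it can be traced
    model = GPT2LMHeadModel.from_pretrained("gpt2", torchscript=True)
    model.eval()

    inputs = tokenizer("Hello, world", return_tensors="pt")
    # Tracing records only the ops actually executed for this example input,
    # so branches that are dead in the default language-modeling setting
    # don't end up in the exported graph.
    traced = torch.jit.trace(model, (inputs["input_ids"],))
    torch.jit.save(traced, "gpt2_traced.pt")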




FWIW, I took that to mean "code path is challenging [for a human being] to trace."


Try to read the code and understand how it works and you will find it very challenging to interpret. And not just that: even the documentation is very sparse and hard to read. Compare that to the OpenAI code, which is so concise and easy to read. There is mastery in doing that, deep mastery. Few repositories in the TensorFlow or PyTorch organizations get to that level.


Agreed re: OpenAI's GPT implementation. It took roughly a year to appreciate how simple it is. https://github.com/openai/gpt-2/blob/0574c5708b094bfa0b0f6df...

Especially compared to StyleGAN, BERT, or pretty much anything else.

I used to hate the OpenAI GPT codebase: zero comments? No classes? What does "mlp" even mean? But over time, I've found myself reverting to their style.
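
(For anyone wondering: "mlp" is just the position-wise feed-forward block inside each transformer layer. A rough PyTorch-flavored sketch of the idea, not OpenAI's actual TF code from the file linked above:)

    import torch.nn as nn

    class MLP(nn.Module):
        # expand to 4x the embedding width, apply GELU, project back
        def __init__(self, n_embd):
            super().__init__()
            self.c_fc = nn.Linear(n_embd, 4 * n_embd)
            self.act = nn.GELU()
            self.c_proj = nn.Linear(4 * n_embd, n_embd)

        def forward(self, x):
            return self.c_proj(self.act(self.c_fc(x)))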


Honestly the library doesn't seem that hard to understand, although it can be under-documented at times - I found looking through the source very helpful.


I think the argument here is about pedagogy not performance.


minGPT is actually quite performant too; the "min" refers to the breadth of supported functionality (e.g. the absence of support for various additional conditioning, exotic masking, masked LMs, finetuning, pruning, etc).
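
(To illustrate the masking point: the only masking kept is the standard causal mask. A generic PyTorch sketch of that idea, not minGPT's exact code:)

    import torch

    T = 8                                    # sequence length, illustrative
    mask = torch.tril(torch.ones(T, T))      # 1s on and below the diagonal
    scores = torch.randn(T, T)               # stand-in for attention scores
    scores = scores.masked_fill(mask == 0, float('-inf'))
    weights = torch.softmax(scores, dim=-1)  # each position attends only to the past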


GPT training performance on the CPU is funny. The vocab size and context window size have a massive effect on both speed and accuracy.
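
Back-of-the-envelope sketch of why those two knobs matter so much (standard rough transformer cost formulas; the per-block estimate ignores biases, layer norms, etc., and all numbers are illustrative):

    # parameter count: embedding table scales with vocab size,
    # each transformer block contributes roughly 12 * n_embd^2 weights
    def gpt_params(vocab_size, n_layer, n_embd):
        return vocab_size * n_embd + n_layer * 12 * n_embd ** 2

    # self-attention does work proportional to the context length
    # for every token, per layer
    def attn_flops_per_token(context_len, n_layer, n_embd):
        return 2 * n_layer * context_len * n_embd

    print(gpt_params(50257, 12, 768))           # BPE vocab: ~124M params
    print(gpt_params(256, 12, 768))             # tiny char-level vocab: ~85M params
    print(attn_flops_per_token(1024, 12, 768))  # long context window
    print(attn_flops_per_token(128, 12, 768))   # short context window: 8x cheaper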


Sure thing! I only meant to imply the relative ordering of considerations.


I agree with this. Deep learning models change so fast that there's no point in maintaining super-reusable code.



