> huggingface/transformers has a language-modeling example. It is full-featured but as a result also somewhat challenging to trace. E.g. some large functions have as much as 90% unused code behind various branching statements, code that is unused in the default setting of simple language modeling.
I don't understand this criticism of Transformers. Doesn't tracing (in both TorchScript and ONNX forms, which Transformers supports for exporting) just take the relevant model graph and freeze it? I don't think either contains the somewhat-weighty performance code.
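For what it's worth, here is a minimal sketch of the kind of export being described (the model name, example input, and output path are illustrative). Tracing runs the model once on a sample input and freezes only the ops that actually executed into a static graph, so untaken branches never enter the exported artifact:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# torchscript=True makes the model return tuples instead of dicts,
# which is required for torch.jit.trace compatibility.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2", torchscript=True)
model.eval()

# Tracing records the operations executed for this example input;
# code paths not taken during this forward pass are excluded.
example_input = tokenizer("Hello, world", return_tensors="pt").input_ids
traced = torch.jit.trace(model, example_input)
torch.jit.save(traced, "gpt2_traced.pt")
```

Note this only addresses the exported graph; the readability complaint in the quote is about reading ("tracing") the source itself, not machine tracing.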
Try to read the code and understand how it works, and you will find it very challenging to interpret. And not just that: the documentation is also very sparse and hard to read. Compare that to the OpenAI code, which is so concise and easy to read. There is mastery in doing that, deep mastery. Few repositories in the tensorflow or pytorch organizations get to that level.
Honestly, the library doesn't seem that hard to understand, although it can be under-documented at times - I found looking through the source very helpful.
minGPT is actually quite performant too; the "min" refers to the breadth of supported functionality (e.g. the absence of support for various additional conditioning, exotic masking, masked LMs, finetuning, pruning, etc.).