
> what they did here is take the core pre-trained GPT model, did Supervised Fine Tuning with Othello moves

They didn't start with an existing model. They trained a small GPT from scratch, so the resulting model had never seen any inputs except Othello moves.
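Roughly, the setup looks like this. A minimal sketch assuming the Karpathy-style minGPT API that the repo vendors; the MoveDataset helper, the toy data, and the hyperparameters are my guesses for illustration, not the paper's exact code:

    import torch
    from torch.utils.data import Dataset
    from mingpt.model import GPT, GPTConfig
    from mingpt.trainer import Trainer, TrainerConfig

    class MoveDataset(Dataset):
        # Hypothetical stand-in: each game is a list of square indices
        # 0..59; token 60 is padding. There are no text tokens at all.
        def __init__(self, games, block_size=59):
            self.games, self.block_size = games, block_size
        def __len__(self):
            return len(self.games)
        def __getitem__(self, i):
            g = list(self.games[i])[: self.block_size + 1]
            g += [60] * (self.block_size + 1 - len(g))           # pad to fixed length
            x = torch.tensor(g[:-1], dtype=torch.long)           # moves so far
            y = torch.tensor(g[1:], dtype=torch.long)            # next move at each step
            return x, y

    # Vocabulary = 60 playable squares + 1 pad token; no natural language
    mconf = GPTConfig(vocab_size=61, block_size=59,
                      n_layer=8, n_head=8, n_embd=512)
    model = GPT(mconf)  # parameters are freshly (randomly) initialized here

    games = [[37, 29, 18], [19, 26, 20]]  # toy stand-in for generated game transcripts
    tconf = TrainerConfig(max_epochs=250, batch_size=512, learning_rate=5e-4)
    Trainer(model, MoveDataset(games), None, tconf).train()

The only supervision signal is next-move prediction over those square indices, which is why the trained model can't have picked up anything from text corpora.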



Generative "Pre-Trained" Transformer - GPT

They did not start with a transformer that had arbitrary parameters; they started with one that had been pre-trained.


Pre-training refers to the unsupervised training done before a model is fine-tuned. The model still starts out with randomly initialized weights before it's pre-trained.

Here's where the Othello paper's weights are (randomly) initialized:

https://github.com/likenneth/othello_world/blob/master/mingp...
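For reference, minGPT-style initialization is the standard GPT-2 recipe. Paraphrased from memory rather than copied verbatim from that file:

    import torch.nn as nn

    def _init_weights(module):
        # GPT-2-style init: small Gaussians for linear/embedding weights,
        # zeroed biases, identity-ish LayerNorm. Nothing pre-trained.
        if isinstance(module, (nn.Linear, nn.Embedding)):
            module.weight.data.normal_(mean=0.0, std=0.02)
            if isinstance(module, nn.Linear) and module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, nn.LayerNorm):
            module.bias.data.zero_()
            module.weight.data.fill_(1.0)

    # model.apply(_init_weights) runs in the GPT constructor, so a freshly
    # constructed model is pure noise until you train it yourself.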



