> what they did here is take the core pre-trained GPT model, did Supervised Fine Tuning with Othello moves
They didn't start with an existing model. They trained a small GPT from scratch, so the resulting model had never seen any inputs except Othello moves.
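To make "never seen any inputs except Othello moves" concrete, here is a rough sketch of what a moves-only vocabulary could look like. The tokenization scheme, the names (`VOCAB`, `encode`, `next_move_pairs`), and the sample opening are my own assumptions for illustration, not details confirmed by the comment above.

```python
# Hypothetical tokenizer: every token is an Othello square, nothing else.
COLS = "abcdefgh"
ROWS = "12345678"

# The four centre squares hold discs before the first move, so they are never played.
CENTER = {"d4", "d5", "e4", "e5"}

# Assumed vocabulary: one token per playable square (64 - 4 = 60 tokens).
VOCAB = [c + r for c in COLS for r in ROWS if c + r not in CENTER]
TOKEN_ID = {square: i for i, square in enumerate(VOCAB)}


def encode(moves):
    """Map a list of square names (e.g. "f5") to integer token ids."""
    return [TOKEN_ID[m] for m in moves]


def next_move_pairs(ids):
    """Build (context, target) pairs for next-token training on one game."""
    return [(ids[:i], ids[i]) for i in range(1, len(ids))]


# Illustrative opening sequence; a real training set would hold many full games.
game = ["f5", "f6", "e6", "f4"]
ids = encode(game)
pairs = next_move_pairs(ids)
```

With a setup like this there is no English text, no board diagram, and no rule description anywhere in the training data. The model only ever sees sequences of move tokens and is asked to predict the next one.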