Good point, we should provide more detailed metrics. Since we are very early, we have focused on what we see as the main metric: higher accuracy of changes, which makes the model more usable in practice. We will do more testing on overfitting and on how the model performs on different types of tasks. At a high level, we believe in the idea that "a well fine-tuned model should be much better than a large general model". But we need more metrics, I agree.