How do you handle the different dataset splits (train/validation/test)? How do you measure degradation of the model? Is there any way to select the metric you target (accuracy, F1-score, or any other)?
The data split technique is one of the optional parameters for the call to 'train'. Model degradation is a really interesting topic, one that hopefully becomes less painful once retraining is trivial, but we also want to add deeper analytics into individual model predictions, as well as better model explanations with tools like shap. We haven't exposed custom performance metrics in the API yet, but we're computing a few already and can add more. The next thing we may build is a configuration wizard to help make these decisions easy based on some guided data analysis.
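For a concrete picture of the underlying idea, here is a minimal sketch using scikit-learn; the 60/20/20 split, the model, and the metric choices below are illustrative assumptions, not our API (our 'train' call takes the split strategy as an optional parameter instead).

```python
# Illustrative sketch only: a typical train/validation/test split plus
# accuracy and F1 scoring, using scikit-learn rather than our own API.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set first, then carve a validation set out of the rest
# (roughly 60/20/20 overall).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0
)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# The validation set guides model and metric selection; the test set is
# scored only once at the end as an unbiased estimate of performance.
val_preds = model.predict(X_val)
print("validation accuracy:", accuracy_score(y_val, val_preds))
print("validation F1:", f1_score(y_val, val_preds))
```

Re-running the same scoring step periodically on freshly collected data is also the simplest way to watch for degradation over time, which is why making retraining trivial matters so much.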