One common approach is to look for the elbow in the curve of <metric> vs. K (number of clusters). This is essentially finding the number of clusters after which the rate of information gained/variance explained/<metric> slows. I believe it's possible to binary search for this point if you can assume the curve is convex.
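A minimal sketch of that idea, assuming the metric curve is convex and decreasing: one common way to define the elbow is the point farthest below the chord joining the curve's endpoints, and for a convex curve that gap is concave, so a binary/ternary search finds it in O(log n) evaluations. (The `find_elbow` helper and the example curve here are illustrative, not from any particular library.)

```python
def find_elbow(metric):
    """Return the index of the elbow of a convex, decreasing curve:
    the point farthest below the chord joining the endpoints.
    Since the curve is convex, gap(i) = chord(i) - metric(i) is
    concave, so ternary search converges to its maximum."""
    n = len(metric)

    def gap(i):
        # chord value at index i (linear interpolation between endpoints)
        chord = metric[0] + (metric[n - 1] - metric[0]) * i / (n - 1)
        return chord - metric[i]

    lo, hi = 0, n - 1
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if gap(m1) < gap(m2):
            lo = m1 + 1   # maximum lies to the right of m1
        else:
            hi = m2 - 1   # maximum lies to the left of m2
    return max(range(lo, hi + 1), key=gap)

# Example: within-cluster variance for K = 1..8, elbow at K = 3
curve = [100, 40, 15, 12, 10, 9, 8.5, 8]
print(find_elbow(curve) + 1)  # prints 3 (1-indexed K)
```

In practice the same trick is often applied without the search (just scan all K, since K is small); the search only pays off when each evaluation means re-running the clustering.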
I think it is more accurate to say that data science isn't Kaggle. The process of taking a data set and fitting the most robust model to it is certainly machine learning.
Yeah, trg2 would need to put all the edge case rules towards the beginning. That, or just put the basic rules at the beginning and have separate conditionals at the end to handle the edge cases.