And his visualization of constrained optimization is astonishing: https://explained.ai/regularization/index.html (I struggled for a long time to get the right intuition for the Lagrangian).
If you're interested, in my thesis I induced L1-regularized decision trees through a boosting-style approach. Adding an L1 term and maximizing the gradient led to sparse trees.
Very nice work. Glad to see both classification and regression treated very well, with careful attention to design to make something that is easy to understand.
Now the question is: can we build on this (or do something analogous) for tree ensembles, such as Random Forests, Gradient Boosted Trees, etc.? It's quite common to use those to gain predictive accuracy, though interpretability/explainability tends to suffer considerably.
Great explanation; I now understand the different visual tree orientations. I agree with the lessons learned section; it's not about programming alone but also about determining the ecosystem's capabilities.