It would be interesting if they compared training speed with CatBoost [0].

I remember seeing a paper where they managed to avoid getting stuck in a local optimum with respect to the number of learners, so the more trees you add, the better the result.

The logloss results seem to confirm there's a superior tree-building algorithm at work in CatBoost.

[0]: https://catboost.yandex/
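A rough sketch of the kind of head-to-head fit-time comparison I mean (my own code, not from the article: it assumes the catboost and lightgbm Python packages, uses synthetic data, and leaves hyperparameters at arbitrary values, so it's not a fair benchmark on its own):

    import time
    import numpy as np
    from catboost import CatBoostClassifier
    from lightgbm import LGBMClassifier

    # synthetic binary-classification data, purely for illustration
    rng = np.random.RandomState(0)
    X = rng.rand(100_000, 50)
    y = (X[:, 0] + rng.rand(100_000) > 1.0).astype(int)

    def time_fit(model):
        start = time.time()
        model.fit(X, y)
        return time.time() - start

    # same number of trees for both, everything else at library defaults
    print("CatBoost:", time_fit(CatBoostClassifier(iterations=200, verbose=False)), "s")
    print("LightGBM:", time_fit(LGBMClassifier(n_estimators=200)), "s")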



CatBoost benefits from its categorical feature transform. LightGBM is also working on better categorical feature support (https://github.com/Microsoft/LightGBM/issues/699). I think the accuracy of LightGBM will be comparable to CatBoost's once that work is finished.
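For what it's worth, both Python APIs already let you declare categorical columns; a minimal sketch of the difference (my own example with made-up data, assuming reasonably recent catboost and lightgbm releases):

    import numpy as np
    import pandas as pd
    from catboost import CatBoostClassifier
    import lightgbm as lgb

    # toy data: "color" is categorical, "x" is numeric
    df = pd.DataFrame({
        "color": ["red", "blue", "green", "red"] * 25,
        "x": np.random.rand(100),
    })
    y = np.random.randint(0, 2, size=100)

    # CatBoost: list categorical columns via cat_features;
    # the library applies its own target-statistics transform internally
    cb = CatBoostClassifier(iterations=50, verbose=False)
    cb.fit(df, y, cat_features=["color"])

    # LightGBM: declare categorical columns on the Dataset;
    # values need to be integer-encoded or a pandas "category" dtype
    df_lgb = df.copy()
    df_lgb["color"] = df_lgb["color"].astype("category")
    train = lgb.Dataset(df_lgb, label=y, categorical_feature=["color"])
    booster = lgb.train({"objective": "binary", "verbose": -1},
                        train, num_boost_round=50)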


From the CatBoost page there's also a link to this:

Fighting biases with dynamic boosting - Dorogush, Gulin, Gusev, Kazeev, Prokhorenkova, Vorobev

https://arxiv.org/pdf/1706.09516.pdf

> While gradient boosting algorithms are the workhorse of modern industrial machine learning and data science, all current implementations are susceptible to a non-trivial but damaging form of label leakage. It results in a systematic bias in pointwise gradient estimates that lead to reduced accuracy
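If I read the claim correctly, the leakage is that the gradient for example i is estimated by a model that was itself fit on example i's label. A toy sketch of the difference (my own illustration, not the paper's algorithm): estimate each example's target only from labels seen earlier along a random permutation, so an example's own label never feeds its estimate.

    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.normal(size=1000)

    # "leaky" estimate: every example's prediction uses all labels, its own included
    leaky = np.full_like(y, y.mean())

    # "ordered" estimate: example i only sees labels earlier in a random permutation
    perm = rng.permutation(len(y))
    ordered = np.zeros_like(y)
    running_sum, seen = 0.0, 0
    for i in perm:
        ordered[i] = running_sum / seen if seen else 0.0
        running_sum += y[i]
        seen += 1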


I was thinking of this one: https://arxiv.org/pdf/1706.01109.pdf

I see a GitHub link in there, https://github.com/arogozhnikov/infiniteboost, but it does not seem to be used in CatBoost (as someone here pointed out, the better logloss has to do with CatBoost's handling of categorical features).


Here is a benchmark for CatBoost:

https://github.com/szilard/GBM-perf/issues/4


Oh, so around 15 times slower than LightGBM.



