
And yet, anybody who actually works in machine learning has repeatedly said: "Figure out what problem you're solving first, then determine what data you might need; don't just throw a classifier at a garbage heap of data."

And experience doesn't show that more data is better; it shows that an excess of features leads to overfitting.




Sure, having more features gives a model more opportunities to overfit. But having more data points has the opposite effect, since they reflect the underlying distribution better and therefore provide a better estimate of actual model performance.
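The distinction between more features and more data points is easy to check with a toy simulation. This is only a sketch under my own assumptions: a linear target y = 2*x0 + Gaussian noise, extra features that are pure noise, and plain least squares with no regularization. The helper names (`make_data`, `fit_ols`, `generalization_gap`) are hypothetical. It measures overfitting as the average gap between test MSE and training MSE.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def make_data(rng, n, p, noise=1.0):
    """Assumed toy setup: only feature 0 matters; the other p-1 are pure noise."""
    xs = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    ys = [2.0 * x[0] + rng.gauss(0, noise) for x in xs]
    return xs, ys


def fit_ols(xs, ys):
    """Ordinary least squares via the normal equations X^T X w = X^T y."""
    p = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(p)]
    return solve(xtx, xty)


def mse(w, xs, ys):
    """Mean squared error of the linear predictor w on (xs, ys)."""
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2
               for x, y in zip(xs, ys)) / len(ys)


def generalization_gap(n_train, p, trials=10, n_test=1000, seed=0):
    """Average (test MSE - train MSE) over several random draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x_tr, y_tr = make_data(rng, n_train, p)
        x_te, y_te = make_data(rng, n_test, p)
        w = fit_ols(x_tr, y_tr)
        total += mse(w, x_te, y_te) - mse(w, x_tr, y_tr)
    return total / trials


if __name__ == "__main__":
    print("n=30,  p=1 :", round(generalization_gap(30, 1), 3))
    print("n=30,  p=25:", round(generalization_gap(30, 25), 3))
    print("n=500, p=25:", round(generalization_gap(500, 25), 3))
```

With 30 training points, adding 24 useless features should blow up the gap (the fit memorizes noise), while keeping those 25 features and raising the sample count to 500 should shrink it again. Both comments are right about their own half of the trade-off.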



