
Agreed about features and Bayesian filters. Words alone are just not very good features for filtering spam, but he could easily feed all the numeric data to the Bayesian filter as well, by dividing each value range into compartments (bins) that act as tokens of their own (like very-many-links-per-word, or over-100-links-per-word).
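A minimal sketch of the idea, assuming a multinomial naive Bayes classifier with Laplace smoothing (not necessarily the filter being discussed). The helper names `bin_token`, `features` and `NaiveBayes`, the `links-per-word` token name, the bin edges and the crude "starts with http" link check are all hypothetical, chosen only for illustration. Each numeric value is mapped to a pseudo-word naming its range, so the filter treats it exactly like any other word:

```python
import math
from collections import Counter


def bin_token(name: str, value: float, edges: list[float]) -> str:
    """Map a numeric value to a pseudo-word naming its bin.

    edges must be sorted ascending; bin i covers [edges[i-1], edges[i]).
    """
    for i, edge in enumerate(edges):
        if value < edge:
            return f"{name}:bin{i}"
    return f"{name}:bin{len(edges)}"


def features(message: str) -> list[str]:
    words = message.lower().split()
    # Crude, hypothetical link detection: any token starting with "http".
    links = sum(1 for w in words if w.startswith("http"))
    ratio = links / max(len(words), 1)
    # Word tokens plus one extra token for the binned links-per-word ratio.
    return words + [bin_token("links-per-word", ratio, [0.01, 0.1, 0.3])]


class NaiveBayes:
    """Multinomial naive Bayes over tokens, with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, tokens: list[str], label: str) -> None:
        self.token_counts[label].update(tokens)
        self.doc_counts[label] += 1

    def log_score(self, tokens: list[str], label: str) -> float:
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total = sum(self.token_counts[label].values())
        prior = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        # Counter returns 0 for unseen tokens, so smoothing keeps logs finite.
        return prior + sum(
            math.log((self.token_counts[label][t] + 1) / (total + len(vocab)))
            for t in tokens
        )

    def classify(self, tokens: list[str]) -> str:
        return max(("spam", "ham"), key=lambda label: self.log_score(tokens, label))


clf = NaiveBayes()
clf.train(features("cheap pills http://x http://y"), "spam")
clf.train(features("win http://z now"), "spam")
clf.train(features("meeting notes attached for tomorrow"), "ham")
clf.train(features("lunch at noon tomorrow"), "ham")

# Neither word here was seen in training; the link-ratio bin decides.
print(clf.classify(features("http://q http://r")))
```

The point of binning rather than feeding raw numbers is that the filter needs no changes at all: `links-per-word:bin3` gets a spam probability learned from the training data just like "pills" does. The bin edges are a tuning choice and would need to be fitted to real mail.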


