I tried something similar with AI the other day. My approach was to look at money flow instead, since in theory spammers only spam to make money. I basically downloaded an ad-blocker list and ran it against a page's source. That, along with a couple of other factors, was fed into many attempts at machine learning fun. In the end, it all failed. I learned that it's just impossible without a data set like Google's, so I went and built their data into the process, and voilà, it worked.
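For anyone curious what "running an ad-blocker list against a page's source" might look like, here is a minimal sketch. It assumes the list has been reduced to plain substrings (real filter lists use a richer rule syntax that this ignores), and the function name `count_ad_hits` and the sample entries are made up for illustration, not taken from the actual script.

```python
def count_ad_hits(page_source, blocklist_lines):
    """Count blocklist entries that occur as substrings of the page source.

    Assumption: each entry is a plain substring such as a domain name.
    Real ad-blocker rule syntax (wildcards, options, etc.) is not handled.
    """
    source = page_source.lower()
    hits = 0
    for line in blocklist_lines:
        entry = line.strip().lower()
        # Skip blank lines and comment/header lines ("!" and "[" prefixes).
        if not entry or entry.startswith(("!", "[")):
            continue
        if entry in source:
            hits += 1
    return hits


# Hypothetical sample data, for illustration only.
blocklist = [
    "! this line is a comment",
    "doubleclick.net",
    "googlesyndication.com",
    "adserver.example",
]
page = '<script src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>'
hits = count_ad_hits(page, blocklist)
```

The resulting count would then be just one number among the factors fed to the learner.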
My script basically looked up how high a link ranked on Google for various related keywords. Alexa page rankings and similar services were included as well. The AI then weighed the factors and tried to come up with an educated guess. Once I included external numbers from people with big databases, it actually became very successful.
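As a rough illustration of "weighing the factors", one common option is a logistic-style weighted sum. The comment doesn't say which model was actually used, so treat this as a sketch: the feature names, weights, and bias below are all hypothetical, and in practice the weights would be learned from labeled examples rather than typed in by hand.

```python
import math


def spam_score(features, weights, bias=0.0):
    """Combine named factors into a score between 0 and 1.

    Assumption: a logistic weighted sum stands in for whatever model
    the original script used.
    """
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights: more ad hits push the score up,
# a good search ranking pushes it down.
weights = {"ad_hits": 0.8, "ranks_in_top_10": -2.0}

clean_page = {"ad_hits": 0, "ranks_in_top_10": 1}
spammy_page = {"ad_hits": 6, "ranks_in_top_10": 0}

clean_score = spam_score(clean_page, weights)
spammy_score = spam_score(spammy_page, weights)
```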
I tried doing this with Reddit links. I downloaded a dataset of 100K submissions and their vote rank. I split them in two: those with fewer than 5 votes and those with more. It seems 5 votes falls right at the median.
Then I collected the HTML from those pages, turned it into a feature vector, and tried to learn whether a page would get fewer or more than 5 votes. My prediction rate was no better than random. Fail.
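A minimal sketch of that setup, for anyone who wants to retry it. The features actually used aren't stated, so this assumes simple tag counts over a fixed vocabulary; it also assumes submissions with exactly 5 votes go in the "more" group. The names `html_features`, `label`, and `majority_baseline` are made up for this example. The baseline is the number any classifier has to beat: with a split at the median it sits near 0.5, which is why "as good as random" means the features carried no signal.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each opening tag appears in a document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def html_features(html, vocabulary):
    """Turn HTML into a fixed-length vector of tag counts (assumed features)."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return [parser.counts[tag] for tag in vocabulary]


def label(votes, threshold=5):
    """1 if the submission reached the threshold, else 0 (tie rule assumed)."""
    return 1 if votes >= threshold else 0


def majority_baseline(labels):
    """Accuracy of always guessing the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Tiny made-up example, for illustration only.
vocabulary = ["a", "img", "script"]
vector = html_features('<p><a href="#">x</a><img src="a.png"></p>', vocabulary)
labels = [label(v) for v in [1, 3, 7, 12]]
baseline = majority_baseline(labels)
```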
Google has done this since Panda. They machine-predict whether a page will succeed with people and use that as a ranking factor. It turns SEO into a holistic art - you need to think of everything.