1. Contact a company that has a search engine, and therefore access to all your links (http://samuru.com springs to mind).
2. Run keyword extraction on each linking page. Assume that any page that shares none of the keywords of the page it links to is a bad link.
3. For the ones that remain, Google the keywords you extracted (say, 10 of them). If the linking page doesn't appear in the top 50 results, it is probably a bad neighbor according to Google.
This method doesn't require NLTK or grammar checking. You can do it yourself, and you are using Google to tell you whether the site is on the bad-neighbor list, so you don't have to guess.
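The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production checker: the frequency-based extractor and stopword list are naive stand-ins for real keyword extraction, and `search_top50` is a hypothetical hook you would wire to whatever search API you use for step 3.

```python
import re
from collections import Counter

# Tiny stopword list for illustration; a real one would be much larger.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "for", "on", "with", "as", "this", "are", "be"}

def extract_keywords(text, top_n=10):
    """Step 2 (first half): the top_n most frequent non-stopword terms."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]

def shares_keywords(linking_page_text, target_keywords):
    """Step 2 (second half): a linking page containing none of the target
    page's keywords is flagged as a bad link."""
    page_words = set(re.findall(r"[a-z]+", linking_page_text.lower()))
    return any(kw in page_words for kw in target_keywords)

def is_bad_neighbor(linking_url, keywords, search_top50):
    """Step 3: search_top50(query) should return the top-50 result URLs
    for the query (hypothetical hook; supply your own search client)."""
    results = search_top50(" ".join(keywords))
    return linking_url not in results

# Example: a relevant linking page passes, an unrelated one is flagged.
kws = extract_keywords("python tutorial learn python programming examples")
print(shares_keywords("read our python programming tutorial", kws))
print(shares_keywords("cheap pills buy now", kws))
```

The point of the sketch is that the filter needs no NLP library at all: step 2 is a set-intersection test, and step 3 is a membership test against search results.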
One of the most linked-to pages on the internet is the download page for Adobe Reader. It is definitely not spam, but millions of those links won't have "the keyword" on the page, so by your logic they are bad links. This is an extreme example, but it is not an uncommon scenario.
Furthermore, if you have millions of backlinks, it becomes quite difficult to scrape Google (but you can use services like Authority Labs).
You think Adobe never built any bad links? I know many large companies that spend hundreds of thousands of dollars a year or more buying links.
And it is going to fail A LOT.