There is so much bias in that article against Russia; it falsely gives the impression that Russia was the major actor pushing artificial tweets.
Hillary herself has for years been sponsoring a small army of full-time geeks who blast out biased tweets and comments, posing as normal posters instead of being identified as paid commenters: http://www.motherjones.com/politics/2014/09/david-brock-hill...
This is a perversion of democracy, not even taking into account the proven federal crimes to which this former presidential candidate seems immune.
The means to bring back balance to democracy are certainly in our hands. This also includes exposing what is wrong with the upper echelons of western society.
Of course everyone can read the linked article if they choose, but that Mother Jones article does not even mention Twitter or any other social media site (where I guess the "biased tweets and comments" are supposed to have appeared). It seems to be about a campaign operation focused on delivering pro-HRC information to news reporters.
From this CFR discussion a couple years ago[0]: "… What about Twitter? I have no idea what Twitter is good for. But if it flips out every tyrant in the Middle East, I'm interested." - Michael Rogers, Founder, Practical Futurist; Futurist-in-Residence, New York Times Company
It seems as though some people are starting to realize the weapon they thought was their own was never theirs at all… inserts something about chickens coming home to roost
All in all, every step Twitter takes to "protect" its platform (a euphemism for acting on whichever voices it is most likely to heed, toward some arbitrary censorship ends) will be a step forward for federated and decentralized social networks.
Though, who are we kidding: let's hope that in the next election even more money is wasted by the entrenched and ossified political orders, and even more by the side that loses and needs some PR to cover its ass from the carnage that comes after. After all, it's just good business.
> Hillary herself has for years been sponsoring a small army of full-time geeks who blast out biased tweets and comments, posing as normal posters instead of being identified as paid commenters
You do understand how campaigning works, right?
Not how you want it to work, but how it actually works.
Really? I'm asking because I don't know the answer; don't you have to identify yourself (like TV ads that say "I'm Obama, and I approve this message")?
Really? Please supply proof of this statement. And again, why hasn't the Justice Department opened an investigation into this? The heads of both the DOJ and the FBI are Trump appointees, are they not?
To get their training data, they identified 100 accounts that appeared to be bots, and then assumed that all their followers were also bots. They used verified accounts as their sample of non-bot accounts.
They claim they "identify bots 93.5 percent of the time." No data is given about false positives.
While your comment certainly points out a flaw in their approach, it is not something that cannot be addressed with a little bit of manpower/insider knowledge/more data.
They have demonstrated a proof of concept that hunting bots on Twitter can be achieved, and that Twitter could do it easily if it wanted.
They started by identifying "left- or right-leaning" tweets. Then they decided that everything in between was bots. Who decides what is left and what is right, and who decided that anything other than those extremes is "fake news"? There are ethical questions here.
It scares me that we now find it acceptable to classify accounts as "bots" or "Russian-backed" because of any content in the tweets, even if it's #LockHerUp or #BuildTheWall. If people really believe these things, shouldn't they be allowed to say them? What if tomorrow, something that you strongly believe in becomes labelled "fake news" or "propaganda", and your own accounts are shut down?
The methodology is flawed in the most fundamental way.
There are other technologies that can be used for detecting bots[0]. Twitter should use them.
Hey, I'm a creator of the tool written about in the article!
In our model, we look at hundreds of features per account. Polarizing tweet content is not the only thing the model looks at. If an account is tweeting original content, that's a clear indication the account is human.
Twitter's policy allows bots on their platform, and most are harmless! There are bots that tweet out the upcoming songs on a radio station. Unfortunately, a CAPTCHA would eliminate all bots, not just the ones that spread fake news or run on compromised accounts.
It seems like there's an obvious solution to the problem of how to crack down on the types of bot your AI tries to detect, without removing legitimate bots:
Give users the ability to designate their account as a self-reported bot. Accounts designated this way would be identified as a bot in the Twitter UI, and would be exempt from captchas.
False positives are quite serious though, especially when it comes to shutting accounts down at scale. Sure, it works on their small sample test runs, but anyone who's applied these algorithms in real life in an automated fashion understands that the risk of false positives is usually far higher in practice.
You will basically be put at risk of having your account accidentally deleted for following 'controversial' accounts on Twitter. And businesses will have to jump through bureaucratic hoops to use legitimate automated scripts to manage accounts.
It also seriously deters anonymous online speech if they force a big percentage of users to provide a unique phone number to prove they aren't a bot (the current Twitter strategy for countering spam). Not to mention the liberal sharing of information with nation states, surreptitiously or otherwise, even for people not trying to be anonymous. A phone number is a critical identifier in the surveillance industry.
Getting this accurate is very important and I feel like that is being heavily downplayed in this article.
Deletion isn't the only option for escalation; Twitter could assemble a risk factor for a given account and opt to present captchas, for example, rather than automatically deleting. Scale captcha frequency with the risk factor.
Captchas are certainly not foolproof, but they up the difficulty of running a huge number of phony accounts while having a minimal impact on normal Twitter usage.
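A toy illustration of scaling captcha frequency with a risk factor might look like this (made-up thresholds and a hypothetical upstream risk score; not anything Twitter actually does):

    import random

    def should_show_captcha(risk_score: float) -> bool:
        """Probabilistically challenge an account based on a bot-risk score.

        risk_score is assumed to be a 0.0-1.0 output of some upstream model.
        Low-risk accounts are almost never challenged; high-risk accounts are
        challenged on most actions, which throttles bot farms without
        automatically deleting anyone.
        """
        if risk_score < 0.3:      # looks human: challenge ~1% of actions
            rate = 0.01
        elif risk_score < 0.7:    # ambiguous: occasional challenge
            rate = 0.10
        else:                     # likely bot: challenge most actions
            rate = 0.80
        return random.random() < rate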
Twitter already uses automation that produces false positives. For example, get enough followers in a short enough amount of time and your account can be taken offline. That's one of the popular ways of using bots as an attack weapon these days.
The "no false positives" ship has already sailed. The best you can hope for now is a timely response to mistakes (and as few of them as possible).
> and that Twitter could do it easily if it wanted
There's the rub, Twitter doesn't want to do that. Why? Good question, but the WSJ adage applies: Follow the money.
I think it's because bots make up most of their users and that getting rid of them is a net negative to them. Not just because they can make stockholders and advertisers think they have more people online than they do. But because trying to continually cull the bots would cost too much as the bot makers up the ante every time. It's not profitable to chase them and it seems that the people that pay Twitter don't care about them.
Also, by bot, I do not mean the hydration sensor in your tomato garden or the code that pings your followers to watch you on Twitch. I mean bots that are trying to masquerade as real humans to get you to buy stuff or vote some way or another. Essentially, things making an attempt at the Turing test.
When people start flooding off the service and the only people left are nazis and bots, what will their revenue look like then? It'll become the reddit of social media, with small, reasonable groups overshadowed by overwhelmingly toxic, hostile, and destructive forces.
How long until their blind eye towards harassment, racism, abuse, and botting collapses any value they have? How long until the average person perceives Twitter as a place where racists congregate and no one else can bring themselves to visit?
And when it does, will they pull a reddit and suddenly grow a conscience, miraculously saying "Oh there's a problem on our service and we're going to charge to the rescue!", or are they going to go 4chan and hide behind "freedom of speech" to excuse the filth their platform breeds?
> ... going to go 4chan and hide behind "freedom of speech" to excuse the filth their platform breeds?
I mean, I don't think 4chan is doing that at all. You give the sons-of-moot too much brainpower. It's a bathroom stall's wall, policing it is silly.
That said, when will Twitter crash over the bot stuff? Wall Street doesn't think it will crash, so it won't. When Wall Street thinks it will crash, then it will. However, Twitter is at the foundation of the ad-scam that most of the Big 4 are running. When that goes belly up, Twitter will be along for the ride, but it's more of a Haunted House than a Log-ride. Scary.
I can't help but think of a story (and a company) that helps people out when they can't escape a bad Google search result. For example, you search Bill Smith and only bad things come up.
For a long time, the playbook to improve your online reputation was to get rid of those bad links. However, a company (and I can't remember the name) decided it might be easier to just create better links/stories about the person and let Google bury the bad stuff. Therefore when someone searches Bill Smith, they see the good stuff.
Anyways - back to the topic at hand here. Perhaps the solution to these bad bots isn't to try and stop them, but to build better bots that spread #realnews and aren't as toxic? Maybe it wouldn't work, clickbait and catchy headlines are more "share worthy" than the latest WSJ front page article, but maybe we need a new strategy?
There are entire journalism courses on the topic of objectivity. AIUI, the takeaway is that humans are inherently biased and a part of good journalism is recognizing that and minimizing its effects to provide the best coverage one can. The idea that there can be something completely objective here is in some ways naïve. You can get a good overview in the Wikipedia articles on Journalism[0] and Media Bias[1].
As for non-toxic, I definitely think that's possible, but involves a bigger conversation as to how people communicate with each other: the same self-awareness that's necessary for good journalism is necessary when people talk with each other day-to-day. It's not clear to me how we rebuild our ability to engage each other in a non-toxic way about important and divisive topics, but I think it's necessary and well-worth trying to figure out.
I don’t think it’s toxic to apply the utmost scrutiny to outlets like NYT. There have been many events that have demonstrated their decided lack of journalistic integrity (silently editing articles after publication, overt bias, playing with release timing...).
The internet has broken mainstream media’s monopoly on “the truth” so it seems a natural next step that they are viewed objectively based on their actions instead of their perceived and heavily marketed position of authority.
Postmodernism itself is the byproduct of the Information Age giving people access to multiple viewpoints. I’m glad to see the playing field leveled.
Unfortunately, it's been leveled so much, we're below sea level.
I despise propaganda as much as... Well, I despise it.
Unfortunately, the winners of 'breaking the MSM's monopoly on Truth' aren't any better than what they replaced (and are in many ways much worse). Their output is, by and large, the worst kind of yellow journalism.
Unsurprisingly, nobody actually wants the truth. The people shouting from the rooftops about corruption and bias, and collusion in the NYT turn around and uncritically read about how Hillary is in cahoots with gay alien pedophiles operating out of the back room of a pizza joint. Or, alternatively, how this is totally the week that Trump's finished. (And did you see the great burn some celebrity gave him?)
There is no "objective" news; people will always have different ideas of truth; however, i think in this case it's easy to discern a difference by the objective of the actors.
Some actors are purposely attempting to distort colloquial understanding via persuasion and misinformation, others are attempting to describe that colloquial understanding to the best of their abilities.
Our modern shortcoming is that we've just lumped all of these things under the shorthand "News" and thereby implicitly gave them all the same credibility and importance. It used to be that the Editorials and Opinions were clearly denoted and segregated in the newspaper... not anymore.
Though we may disagree on the semantics of what constitutes #realnews, determining whether someone is attempting to describe the world through the prism of their own perspective or proactively trying to persuade others to think the same way is pretty easy to see.
We can get mired in the idea that nothing is perfect, so why bother, or we can make the best of what we have & understand that it will never be perfect.
(apologies for the offtopic nature of this diatribe, this is something I've been spending a lot of time thinking about lately)
I don't know - a guy on InfoWars was talking about how Hitler is still alive (sorry can't find the link). Getting rid of conspiracy type stuff like that would be a good start.
Who decides what's a conspiracy and what's an uncomfortable truth that powerful lobbies are effectively covering up? e.g. the broadest NSA SIG INT collection efforts, sub-concussive hits from football causing CTE, and sugar-rich "low fat" diets causing rampant diabetes were all at one point in time called baseless conspiracies by experts in these fields.
Actual evidence decides that. My general rule of thumb is, how easily supporters/detractors of an 'idea' let a third-party audit/research their findings and how they react to the results.
There is a big difference between "possibly biased" and "spreading verifiable falsehoods". There is a big difference between "reporting on corruption" and "inciting violence".
This kind of false equivalence is almost as toxic as the verifiable lies being promulgated by various "media" sources.
It won't work because people want to read stories that confirm their biases.
I haven't had a lot of disagreements with liberals, but I've had a fair number of discussions with republicans/conservatives who discount anything that opposes their worldview no matter how grounded in facts or science it is, and accept anything that reinforces it regardless of how unreliable or artificial the source is.
In fact, studies have shown that being shown facts that contradict our biases reinforces those biases rather than weakening them[1].
So bots that tweet facts that contradict the fake news people want to believe makes things worse and helps no one. It feels as though the only way to help someone overcome those biases is to prevent them from experiencing reinforcing information and to expose them to situations that contradict them.
Perhaps. I heard about it in a podcast awhile ago. The story was about a woman who tweeted something, got on a plane, and during the flight her tweet went viral. I can't remember the details for the life of me, but she can't escape that incident since it's the first thing that appears when people Google her name. So the idea was to create better news about her and push those results down to page 2 (where no one goes).
But the very existence of their project raises an important question: If two volunteer data science students who are barely out of their teens can figure out how to hunt down Twitter’s bad-actor bots, why doesn’t Twitter do the same?
Twitter has every incentive to lie, to minimize, to shove this under the rug. As a fairly recent IPO with virtually flat/negative user growth, and lots of fed up people (like me) abandoning the platform altogether, it is desperate to squelch any negative info that Wall Street might use against it.
Unfortunately, there's no favorable outcome for Twitter shareholders in either case. Twitter fesses up about its actual percent of bots (reality is likely closer to 50 percent than the 5 percent it claims) and its numbers go down even more. Twitter continues to lie and folks like the ones in this article expose them ... not good either because every advertising dollar it's getting is "truthfully" reaching fewer actual humans.
This is a repost. What these kids are doing is what Twitter is doing.
Do these kids not think that Twitter has the ability to do this?
This article comes off as if Twitter does not know what they are doing, which I think is hard to believe.
What I think these kids are going to find out is that it's not as easy as it sounds.
Couldn't anyone do what these people are doing with a couple hundred bucks, using Google Cloud services and their Natural Language API to label positive and negative tweets (roughly the sketch below)?
Seeing news like this, which is not news, makes me realize how gullible people are about what goes into 'models'.
People have no idea how hard NLP is. That is all lol.
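For what it's worth, the Cloud Natural Language route is only a few lines; a minimal sketch, assuming the google-cloud-language v2 Python client (and, as noted, sentiment labels alone say nothing about whether the author is a bot):

    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()

    def label_tweet_sentiment(text: str) -> str:
        """Label a tweet positive/negative/neutral via Cloud Natural Language."""
        document = language_v1.Document(
            content=text, type_=language_v1.Document.Type.PLAIN_TEXT
        )
        score = client.analyze_sentiment(
            request={"document": document}
        ).document_sentiment.score  # roughly -1.0 (negative) to +1.0 (positive)
        if score > 0.25:
            return "positive"
        if score < -0.25:
            return "negative"
        return "neutral"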
First, Twitter can't do what the students are doing - an informal, unreliable service going "hey, bot!" is fine for some random people to offer, but the site owner can't chance high error rates on public statements. Twitter keeps suggesting they're doing something like this internally, which would make sense.
Second, the article offers no real evidence that this service works. It's a machine learning problem with no proven classifications in the dataset, so the learner will at best reproduce the developers' opinions of what a Russian bot looks like.
I'll play around with the tool, but my initial guess is that it just learned "patriotic, inflammatory language, retweets only" as "bot", which isn't a breakthrough anyone can actually act on.
Hey, I'm a creator of the tool written about in the article.
The model looks at hundreds of features of each profile, not just a few such as patriotic or inflammatory language and retweets, as you suggested. As a result, the model has flagged profiles with very sketchy behavior, such as accounts of normal people that were compromised and became political propaganda accounts a few years later. We wanted to bring up these accounts and the analysis behind them so we could show the techniques behind the organizations that run these bot-like accounts. We offer our own analysis of these accounts here: https://medium.com/@robhat/an-analysis-of-propaganda-bots-on...
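To make the "hundreds of features" idea concrete for other readers, here is a hypothetical sketch of a few account-level signals a model like this might consume, built from public Twitter API fields (my own illustration, not the actual botcheck.me feature set):

    from datetime import datetime, timezone

    def account_features(user: dict, tweets: list) -> dict:
        """A handful of per-account features from raw Twitter REST API objects.

        `user` and `tweets` are assumed to be dicts shaped like the v1.1 user
        and status objects; a real model would use far more signals than this.
        """
        created = datetime.strptime(user["created_at"], "%a %b %d %H:%M:%S %z %Y")
        age_days = max((datetime.now(timezone.utc) - created).days, 1)
        retweets = sum(1 for t in tweets if "retweeted_status" in t)
        return {
            "tweets_per_day": user["statuses_count"] / age_days,
            "follower_following_ratio":
                user["followers_count"] / max(user["friends_count"], 1),
            "retweet_fraction": retweets / max(len(tweets), 1),
            "default_profile_image": user.get("default_profile_image", False),
            "digits_in_screen_name": sum(c.isdigit() for c in user["screen_name"]),
        }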
The Wired take definitely left me cynical; it's all too easy to write a human-interest piece about ML approaches that don't actually work. But this is a much more concrete explanation of results, and I'm intrigued.
If you don't mind, could you offer any more clarification on your test/training set? I see the Medium piece talks about wanting to avoid selection bias from hand-classification, but the Wired summary just described hand-selecting 100 "ground truth" bots and then adding their followers. How did you know the followers were also bots? And how did you try to ensure the bots you selected were a reasonable sample?
I think the "won't" may be accurate as if Twitter did it and chose not to act that would look terrible. And the scope of the bot problem may make their user base look a lot less attractive if they admitted it.
Granted everyone suspects, but there may be a valid reason they don't want to "know" at Twitter.
We agree - being independent of Twitter, we have a considerable amount of freedom in how we build this. However, Twitter can and should be doing much more. For example, with a model they can start placing captchas before tweets.
In addition to building the model, we went about trying to build our own bots. To do this we went on forums and contacted individuals selling "aged" accounts. It turns out it's as simple as sending $4 over PayPal to get a compromised account. These are accounts with histories, real followers, and real people behind them.
We bought 11 of these and were able to automate them within an hour of purchasing them. They also started receiving replies to their retweets and content almost immediately, from all over Twitter.
The ease of setting up these compromised accounts as bots was also incredibly worrisome. We've found high confidence heuristics to determine that an account has been compromised. If we can - Twitter should be able to as well.
We're ultimately a bit confused over Twitter's inactivity here. We also haven't heard anything from the company.
You're right - Twitter could do (and probably is doing) everything we're doing. They have billions of dollars and hundreds of engineers.
The value of building a model is that we can do a wide analysis of bot-like activity. Separately, launching botcheck.me as something that users can use is incredibly valuable from the ML side. Users essentially hand-classify a bunch of false positives for us (to further train on) and also give us an idea of how our model is doing.
We aren't just doing sentiment analysis and you're right - NLP is hard. Fortunately at UC Berkeley we have some amazing CS professors that have been incredibly helpful in advising us while building this.
We're using LSTMs to learn the weights of various words. We've been using high confidence heuristics to generate our training data that aren't based primarily on tweet content.
One such example is looking at compromised accounts that have had their usernames changed.
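For readers curious what an LSTM over tweet text looks like in general, here's a minimal Keras sketch (a generic illustration of the technique, not the authors' actual architecture, features, or training data):

    import tensorflow as tf

    VOCAB_SIZE = 20000   # assumed vocabulary size
    MAX_LEN = 50         # tweets are short; pad/truncate to 50 tokens

    # Binary classifier: sequences of word ids in, P(bot-like) out.
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(VOCAB_SIZE, 128, input_length=MAX_LEN),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])

    # x_train: padded integer sequences from a tokenizer; y_train: labels from
    # heuristics like "account changed its username after being compromised".
    # model.fit(x_train, y_train, epochs=3, validation_split=0.1)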
I would very much like an API that provides this service for Reddit accounts as well. I suppose since the data is freely available I need to get off my butt and write it myself though...
I'm not at all convinced; the students in the article are getting by because they have no particular fear of false positives. They're just some guys saying "yeah, looks like a bot!"
Twitter has a much harder post-discovery choice. Do they disable or publicly flag accounts, and catch flack for hitting real ones? Or do they monitor internally and wait for high confidence, then get in trouble for not doing enough?
I don't see how Twitter could offer a comparable service even with a comparable tool; being the official overseer of the question leaves them with too much responsibility.
It's also reminiscent of tracking payment fraud. Algorithmic approaches are potent, but they're not a stable fix when your problem is adversarial and open-ended - scammers have nigh-unlimited time to hunt for new angles.
As I remember it, that's a major piece of what got Palantir started - PayPal's fraud-fighting system tracked known attacks algorithmically, but they needed visualization tools to locate new attacks.
That said, there's definitely an argument for attacking the ease of creation and longevity for bot accounts. Payment scammers are an endless problem because they profit on individual wins; bot networks need some sort of protracted presence to influence people.
Hey! One of said creators here! We haven't figured out how to monetize / properly fund this yet. It's more a problem we saw and attempted to solve. The cost right now is a couple hundred dollars a month in server costs.
Please don't take my comment(s) as criticism: you're doing a good job. Even if you don't make any money doing this, it's a fun thing to do. In the worst case Twitter will (acqui)hire you.
It seems that the people who are most likely to be able to tell human from bot are the ones who would use a Chrome extension. I am glad people are working on solutions, but I'd love to see an approach that works for all users.
For example, there could be a Twitter account that replies to every bot tweet and to the accounts mentioned in the bot's tweets. Accounts that have a high likelihood of being a bot would get a reply stating that the account is likely a bot.
I'm just spitballing; certainly, this would push up against Twitter's API limitations. But it seems like there must be a much better way to identify fake accounts than through a Chrome extension.
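A reply bot like that is straightforward to prototype; a minimal sketch, assuming the tweepy 3.x client and a hypothetical is_likely_bot() call wrapping whatever model produces the score (and yes, it would hit Twitter's rate limits quickly):

    import tweepy

    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
    api = tweepy.API(auth, wait_on_rate_limit=True)

    def warn_if_bot(tweet, is_likely_bot):
        """Reply to a tweet when its author scores as a likely bot.

        `is_likely_bot` is a hypothetical callable wrapping whatever model or
        API produces the bot probability; it is not part of tweepy.
        """
        if is_likely_bot(tweet.user.screen_name):
            api.update_status(
                status="@{} This account exhibits patterns consistent with "
                       "automated political activity.".format(tweet.user.screen_name),
                in_reply_to_status_id=tweet.id,
            )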
One of the creators here: Yup! Our current users are more engaged and active on Twitter than the average user.
The Chrome extension and website release also help improve our model significantly. We get feedback on how it works in the wild, plus false positives that we can use to further train our model.
I never used Twitter before the 2016 election. I don't use Twitter to communicate with people I know in real life, but I know my father and I both started using it as a place where we can put on a pseudonym and talk about politics openly, without having to worry about bothering friends & family like on Facebook.
Both my own and my father's accounts were classified as "exhibit patterns conducive to a political bot or highly moderated account tweeting political propaganda.", which is in a sense kinda accurate because they are solely political outlets for us. I think this is a feature, not a bug, and probably the only use case the service has provided for me.
I feel like twitter, tumblr, and other semi-anonymous networks have always been pretty political. I think the only thing that has changed is that people see twitter as news - stations actively report about what's going on on twitter - and that online political discussions are no longer dominated by liberal / social justice voices.
> If two volunteer data science students who are barely out of their teens can figure out how to hunt down Twitter’s bad-actor bots, why doesn’t Twitter do the same?
Because the cost to the kids for a false positive is 0. The cost to Twitter could be company destroying lawsuits.
It's impossible to determine the error rate - there is no way of knowing for sure if an account is a bot or not. If there was, this model would be useless. I'd be interested to know how they trained it, though, given they didn't have any true data to check against.
That's only valid if the known bots are a random sample of all bots.
They can't test against a large array of public bots because they're only detecting political bots, not everything automated. So they'd have to train/test on "accounts which are definitely known to be bots, but trying to hide it". Meaning, presumably, the least-convincing bots or bots specific to a previously-exposed network.
Also, there are APIs for acquiring real-world mailing addresses inside of "subdivided" warehouses. The only way to verify that a person isn't a bot is to make babies with them.
Hot take: false positives don't matter. If somebody is using the service in a way that makes them indistinguishable from a propaganda bot, the platform and the discourse that it's supposed to further are better off without them.
Why only mention pro-Trump bots though? There were (and _still are_) tons of pro-HRC bots as well, synchronously posting identical text and passing it off as their own tweets. They aren’t even trying to hide it much.
What else would you expect would happen? In a town without law, it's only a matter of time until the vigilantes show up. Either Twitter is going to police its platform, or users are going to start policing it for them.
In a way, I agree with you. I despise Twitter and what people use it for. However, if being cut off from Twitter is not a big deal, why is this bot hunt worth doing?
If we assume that it's important to get rid of fake users, it must be because accessing a platform like Twitter is important.