The data governance concerns are not that Microsoft has access to the data (that is no doubt covered by contract clauses).
The concerns are that most corporate implementations of network roles and permissions are not up to date or accurate, so CoPilot will show data to an employee that they should not be allowed to see. Salary info is an example.
Basically, CoPilot is following “the rules” (technical settings) but corporate IT teams have not kept the technical rules up to date with the business rules. So they need to pause CoPilot until they get their own rules straight.
Edit to add: if your employer has CoPilot turned on, maybe try asking for sensitive stuff and see what you get. ;-)
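For anyone who would rather audit than poke: below is a rough sketch of walking a drive with Microsoft Graph and printing who has access to each item. It assumes you already have an access token with the right read scopes and a drive ID to point it at; paging and error handling are left out.

    # Rough sketch: walk a OneDrive/SharePoint drive via Microsoft Graph and
    # print the permissions on each item. TOKEN and DRIVE_ID are placeholders.
    import requests

    GRAPH = "https://graph.microsoft.com/v1.0"
    TOKEN = "eyJ..."        # Graph access token with e.g. Files.Read.All
    DRIVE_ID = "b!abc123"   # the drive you want to audit
    HEADERS = {"Authorization": f"Bearer {TOKEN}"}

    def walk(item_id="root"):
        url = f"{GRAPH}/drives/{DRIVE_ID}/items/{item_id}/children"
        for child in requests.get(url, headers=HEADERS).json().get("value", []):
            perms_url = f"{GRAPH}/drives/{DRIVE_ID}/items/{child['id']}/permissions"
            perms = requests.get(perms_url, headers=HEADERS).json().get("value", [])
            # roles is e.g. ["read"] or ["write"]; grantedToV2/link says who (or what
            # sharing link) the grant applies to
            print(child.get("name"),
                  [(p.get("roles"), p.get("grantedToV2") or p.get("link")) for p in perms])
            if "folder" in child:
                walk(child["id"])

    walk()

If the output surprises you, that is exactly the "technical rules vs business rules" gap described above.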
I don't think this is the core of the concern, other aspects are much worse IMHO.
For instance, the article says "Microsoft positions its Copilot tool as a way to make users more creative and productive by capturing all the human labor latent in the data used to train its AI models and reselling it." The Copilot data could make it a lot easier to steal sensitive business procedures and intellectual property: it would let any third party inspect a company's procedures and sensitive data at a scale we've never seen. It would be next to impossible to manage, categorize and protect this data. It's an intellectual property nightmare.
A good version of the technology, which we don't have yet, would allow competitors to create a copy of every employee (in the business sense) and perhaps much more efficiently compete with that company.
Are you talking here about the idea that Microsoft are training their models on private customer data in a way that could later expose details of it to people outside the company?
Microsoft of course having an excellent reputation for respecting the law and their customers whenever there's significant money to be made by doing the exact opposite
Your company’s data isn’t nearly as valuable for training models as you might think.
Certainly not as valuable as the revenue you can make from companies that would instantly cancel their Copilot 365 subscriptions if they heard any hint of data being used for training without permission.
> Convincing people that you don’t train on their data remains one of the hardest problems:
we attempted to protect our valuable data with copyright
they disregarded these terms, trained on it anyway and claim wholesale reproduction of our work is "fair use"
why wouldn't they do the same with Teams/Sharepoint/Word/everything on Azure
because the contract with a company 10000x our size says they won't? HAHAHAHAHA
the only way to protect your data from entities that have previously disregarded terms in this way is to not let them get their dirty hands on it in the first place
Did you read https://simonwillison.net/2023/Dec/14/ai-trust-crisis/ ? Because your comment here is a text-book example of what I was talking about there, right up to the bit where you say "you can't trust them because they've already shown they'll train on unlicensed scraped copyrighted data" (a very reasonable point to argue).
This exactly. If Microsoft had created its own fine-tuned-MSFT-data LLM and seen vastly better results on internal tasks, then they’d be publishing papers about it, and also packaging that up & selling to customers.
There’s significant money to be lost (lawsuits, massive reputational damage) if it was discovered they were deliberately training on data they had guaranteed they were not training on.
It would help if people stopped calling it "AI" and called it what it is, which is enterprise document search: a thing that has existed forever, and of which these new tools are just a poorly controllable version.
Would you enable a search indexer on all your corporate data that doesn't have any way to control which documents are returned to which users? Probably not.
It's a known issue with SharePoint going back years and has various solutions[0] such as document level access controls or disabling indexing of content.
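Conceptually the fix is plain old security trimming: filter hits against the caller's ACLs before returning anything. A toy sketch, nothing Microsoft-specific, with all names made up:

    # Toy security-trimmed search: every document carries an ACL, and results
    # are filtered against the caller's group memberships before being returned.
    from dataclasses import dataclass, field

    @dataclass
    class Doc:
        title: str
        text: str
        allowed_groups: set = field(default_factory=set)

    INDEX = [
        Doc("Travel policy", "per diem rates ...", {"all-staff"}),
        Doc("2024 salary bands", "L5: ...", {"hr", "execs"}),
    ]

    def search(query: str, user_groups: set) -> list[Doc]:
        q = query.lower()
        hits = [d for d in INDEX if q in d.text.lower() or q in d.title.lower()]
        # security trimming: drop anything the caller is not entitled to see
        return [d for d in hits if d.allowed_groups & user_groups]

    print([d.title for d in search("salary", {"all-staff"})])  # -> []
    print([d.title for d in search("salary", {"hr"})])         # -> ['2024 salary bands']

Copilot is supposed to inherit exactly this kind of trimming from SharePoint/OneDrive permissions; the trouble described upthread is that those permissions are often wrong.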
If we called it what it is though the C-levels probably wouldn't even care about it. They never cared about enterprise document search before and certainly didn't "pivot" to enterprise document search or report on the progress of enterprise document search implementation to the board.
Good, this was a poorly conceived idea to build into an OS and turn on by default.
It is the same problem with a lot of the AI tools right now: using them for your code, letting them look at your documents, etc. Unless you self-host it or use a 'private' service from Azure or AWS (which they say is safe...) who knows where this information is ending up.
This is a major leak waiting to happen. It scares me to think what kind of data has been fed into ChatGPT or some code tool that is just sitting somewhere in a log or something plaintext that could be found later.
That's not what this is about. This is about the M365 tools that you can add to outlook/teams etc. It needs separate licensing and isn't enabled by default, you have to pay for it and assign to users/groups.
Microsoft's branding is seriously all over the place with all of this... They have at least 3 "Copilot" things now I think? Github copilot, the one built into Windows, and now this apparently. sigh.
Regardless, my other points still stand. All of these tools remain a leak waiting to happen.
It's a separate GPT instance specifically designed for corporate data per company; it's not the same model the public uses. You could say Gmail is a leak waiting to happen because you're not hosting your own email server. This is a product specifically designed to work with enterprise data. The risk isn't any greater than Azure getting hacked or something.
Recently, one of my friends asked ChatGPT about an internal tool with a funny name and ChatGPT seemed to know the meaning behind the name & options to run the tool properly for his use case.
We googled around to see if there was any information on the web about the tool & there’s nothing on Google which makes sense since it’s a boring internal tool for a financial services company.
Of course it could be a lucky guess, or it could be that an intern had uploaded the manual to GPT :D
It would not be impossible for an LLM to hallucinate a correct answer based on the name of a well-named tool that had sane default choices in its architecture.
Hey guys, this isn't the built-in Copilot that was scrapped. This is about corporate Copilot in M365. The problem is that some companies have bad data hygiene and don't control who can access what. Copilot is working as intended but some teams have done a poor job of controlling permissions for data.
Now imagine how much Google (Gmail) knows about us. It gets scary very quickly. Even if you are as private as you can, you will communicate with people whose email is hosted by Google.
I think Google Search is much worse than Gmail. You tell it things you wouldn't tell your family or even your lawyer or therapist. And it can infer things you didn't even know about yourself (e.g. you are pregnant or suffering a health condition).
I had a recent exchange with Microsoft and a group of CISOs, and how it was explained to us by MS is that Copilot relies on existing file sharing security (OneDrive, SharePoint) to determine what a user can receive as feedback from Copilot. While it seems like a reasonable approach to rely on existing controls, it honestly sent shivers down my spine. Anyone with experience securing data sharing on MS platforms knows those controls become a total mess over time in large organizations.
I just have to look at my ancient MSN account being effectively turned unusable for several MS services (notably Skype and Xbox), stuck in some limbo between MS auth service migrations (?), to gauge my confidence in their control of user data... And no, several hours with their support agents spread over several weeks did not resolve it.
Their point is that that is wishful thinking. If someone violates the law, which does happen, your data can get out. "No longer yours" in this context means "others may disseminate it without your approval". They may be penalized for doing so, but it is absolutely in their power to do so.
Latest Visual Studio update has made CoPilot much more prominent. I can't tell if it's actually running or not. I can't figure out how to turn it off completely.
LLM-based AI is technically banned at my work. For somewhat good reason: most of our work involves confidential, controlled, or classified data. Though I've seen a lot of people, especially the juniors, still using ChatGPT every day.
Also noticed the UI has gotten a lot slower. I'm guessing the two things are related.
If my company wasn't locked into "Microsoft everything" this would push me the last inch to ditch VS completely. I already did at home.
The VS one is an entirely different “Copilot” to this one. This article is talking about the Copilot that’s a RAG ChatGPT that integrates with Sharepoint etc.
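For anyone not following the jargon, the RAG part just means: fetch documents the user can already read, paste them into the prompt, and ask the model to answer from them. A minimal sketch, with placeholder retrieve/complete functions standing in for whatever search backend and model API are actually used:

    # Minimal RAG sketch: retrieve the top-k documents this user may read,
    # then prepend them as context to the question. retrieve() and complete()
    # are placeholders, not any real Microsoft API.
    def retrieve(question: str, user: str, k: int = 3) -> list[str]:
        # placeholder: in reality a security-trimmed search over
        # SharePoint/OneDrive/Teams content the user can already access
        return ["<doc snippet 1>", "<doc snippet 2>"]

    def complete(prompt: str) -> str:
        # placeholder for the hosted model call
        return "<model answer>"

    def answer(question: str, user: str) -> str:
        context = "\n\n".join(retrieve(question, user))
        prompt = ("Answer using only the context below.\n\n"
                  f"Context:\n{context}\n\nQuestion: {question}")
        return complete(prompt)

    print(answer("What is the travel per diem?", "alice"))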
May I inquire about your private details, tiahura? Perhaps you're posting from a private beach (oh la la), or is it just a place you'd like to snorkel? Come come, spill! Give us all the linkedin/spokeo/lexisnexis tea. Let us live vicariously through you.
Life must be good feeling unburdened by the reality of data brokers buying/selling your data! How free you must be!
The snorkeling there is terrible. Never seen a fish or any coral.
Just trying to make the point that the doomsday privacy predictions haven’t materialized, while every day Google and MS and Apple repel millions of hacking attempts - many of which would succeed against individuals trying to secure their data themselves.
Not to mention the increased productivity and utility these services provide.
The doomsday already happened, it just wasn't televised. We live in a world where our TVs spy on us, our cars tattle on us to our insurance overlords, and our internet has more ads than a cyberpunk city skyline. All of our data has been breached, it's just a roll of the dice if our identity gets stolen and the all-important 'credit score' gets tanked into oblivion.
But that does not mean I should abandon my desire for privacy, tomorrow is a brand new day.
It only takes a company messing up exactly once, and the damage is catastrophic.
Everyone gets their social security number leaked...identity thieves have a field day.
Everyone gets their medical history leaked...insurance companies suddenly find another edge against the consumers.
Everyone gets their texts leaked...scammers now have blackmail against anyone who ever got spicy with their significant other.
Huge companies have been exploited before, and they will be again and again. The only long-term winning strategy is to not let them have your data in the first place.
Nearly all SSNs have leaked by this point. The US needs a cryptography-based ID system. That way each identification event is distinct, and each company gets a different (irreversible) derived ID for a person; a rough sketch of that derivation is below.
On that larger point I'd agree that companies should not have PII data they don't need.
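The per-company derived ID part is just a keyed one-way derivation. A minimal sketch with HMAC-SHA-256, where the issuer-held key is hypothetical:

    # Minimal sketch of per-relying-party derived IDs: the issuer HMACs the
    # person's stable ID with the relying party's name, so each company gets a
    # different, non-reversible identifier and leaks can't be joined across firms.
    import hmac, hashlib

    NATIONAL_SECRET = b"issuer-held key, never shared"  # hypothetical issuer key

    def derived_id(person_id: str, relying_party: str) -> str:
        msg = f"{person_id}|{relying_party}".encode()
        return hmac.new(NATIONAL_SECRET, msg, hashlib.sha256).hexdigest()

    print(derived_id("123-45-6789", "acme-insurance"))
    print(derived_id("123-45-6789", "bigbox-retail"))   # different ID, same person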
This reeks of an answer given by someone who simply hasn't been impacted by any of this yet - I'm sure for those who HAVE been impacted the world didn't simply "go on"; it caused real stress, problems, and issues for them.
That’s the precious bodily fluids paranoia I’m talking about.
Nobody is coming for anyone. [Again doesn’t apply to spies and dissidents]
Systems can fail and that can mean ruined lives. However, that’s only part of the equation. There were actually anti-automobile societies in the US and Europe who opposed cars for safety reasons.
If there's an edge, people will use it. Car manufacturers share data with insurance companies[1], which can impact drivers' insurance rates or lead to coverage denial.
Do you believe the same thing will never happen in healthcare?
Do you believe that sophisticated criminals won't engage in large-scale fraud attempts? In 2021, about 23.9 million people (9% of U.S. residents age 16 or older) had been victims of identity theft during the prior 12 months.[2]
You haven't been hurt by this sort of thing, which is great for you. But millions of other people aren't so lucky.
At this point we should keep the idea of companies being legal individuals, but introduce a social-credit-style system that feeds into tax deductions, based on whether your products actually improve people's lives rather than stealing from them.
Want to build AI tooling that leverages user data? Great!
* Does it gather their data for targeted ads? - neutral.
* Does it gather their data to then be resold to others? - -100 points, pay more tax, you're rent seeking.
* Does it help the user not get phished? - +100 points, you're actually offering something of value.
I don't believe having humanoid robots in factories helps or is nearly as profitable as humanoid robots that will do my laundry for me.