Companies ground Microsoft Copilot over data governance concerns (theregister.com)
87 points by belter 5 months ago | 53 comments



The data governance concerns are not that Microsoft has access to the data (that is no doubt covered by contract clauses).

The concerns are that most corporate implementations of network roles and permissions are not up to date or accurate, so Copilot will show an employee data that they should not be allowed to see. Salary info is an example.

Basically, Copilot is following “the rules” (technical settings), but corporate IT teams have not kept the technical rules up to date with the business rules. So they need to pause Copilot until they get their own rules straight.
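
A rough sketch of how an IT team might spot this kind of over-sharing via the Microsoft Graph API (my illustration, not from the thread; TOKEN and DRIVE_ID are placeholders, and only top-level items are checked):

    # Sketch only: audit SharePoint/OneDrive sharing links via Microsoft
    # Graph. Assumes an access token with Files.Read.All has already been
    # acquired elsewhere (e.g. an MSAL client-credentials flow).
    import requests

    GRAPH = "https://graph.microsoft.com/v1.0"
    TOKEN = "..."      # placeholder: acquired out of band
    DRIVE_ID = "..."   # placeholder: the document library to audit
    headers = {"Authorization": f"Bearer {TOKEN}"}

    items = requests.get(f"{GRAPH}/drives/{DRIVE_ID}/root/children",
                         headers=headers).json().get("value", [])
    for item in items:
        url = f"{GRAPH}/drives/{DRIVE_ID}/items/{item['id']}/permissions"
        for perm in requests.get(url, headers=headers).json().get("value", []):
            scope = perm.get("link", {}).get("scope")  # "anonymous"/"organization"/"users"
            if scope in ("anonymous", "organization"):
                # Anything visible through such a link, Copilot can
                # surface to any employee who asks for it.
                print(f"{item['name']}: shared with scope={scope}")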

Edit to add: if your employer has Copilot turned on, maybe try asking for sensitive stuff and see what you get. ;-)



I don't think this is the core of the concern; other aspects are much worse, IMHO.

For instance, the article says "Microsoft positions its Copilot tool as a way to make users more creative and productive by capturing all the human labor latent in the data used to train its AI models and reselling it." The Copilot data could make it a lot easier to steal sensitive business procedures and intellectual property: it would allow third parties to inspect a company's procedures and sensitive data at a scale we've never seen. It would be next to impossible to manage, categorize, and protect this data. It's an intellectual property nightmare.

A good version of the technology, which we don't have yet, would allow competitors to create a copy of every employee (in the business sense) and perhaps compete with that company much more efficiently.


Are you talking here about the idea that Microsoft are training their models on private customer data in a way that could later expose details of it to people outside the company?

That’s not happening. Microsoft Copilot doesn’t train on data it has access to. https://learn.microsoft.com/en-us/power-platform/faqs-copilo...


Microsoft, of course, having an excellent reputation for respecting the law and their customers whenever there's significant money to be made by doing the exact opposite.


Your company’s data isn’t nearly as valuable for training models as you might think.

Certainly not as valuable as the revenue you can make from companies that would instantly cancel their Copilot 365 subscriptions if they heard any hint of data being used for training without permission.

Convincing people that you don’t train on their data remains one of the hardest problems: https://simonwillison.net/2023/Dec/14/ai-trust-crisis/


> Certainly not as valuable as the revenue you can make from companies that would instantly cancel their Copilot 365 subscriptions

companies are already cancelling their Copilot subscriptions as it's "high cost and low value"

https://www.businessinsider.com/pharma-cio-cancelled-microso...

> Convincing people that you don’t train on their data remains one of the hardest problems:

we attempted to protect our valuable data with copyright

they disregarded these terms, trained on it anyway and claim wholesale reproduction of our work is "fair use"

why wouldn't they do the same with Teams/Sharepoint/Word/everything on Azure?

because the contract with a company 10000x our size says they won't? HAHAHAHAHA

the only way to protect your data from entities that have previously disregarded terms in this way is to not let them get their dirty hands on it in the first place


Did you read https://simonwillison.net/2023/Dec/14/ai-trust-crisis/ ? Because your comment here is a textbook example of what I was talking about there, right up to the bit where you say "you can't trust them because they've already shown they'll train on unlicensed scraped copyrighted data" (a very reasonable point to argue).

(Update: actually I didn't make that point in the original post, it's from the talk version of this I gave https://simonwillison.net/2024/Jun/27/ai-worlds-fair/#slide.... )


This exactly. If Microsoft had created its own fine-tuned-MSFT-data LLM and seen vastly better results on internal tasks, then they’d be publishing papers about it, and also packaging that up & selling to customers.


There’s significant money to be lost (lawsuits, massive reputational damage) if it were discovered they were deliberately training on data they had guaranteed they were not training on.


when has the threat of lawsuits or massive reputational damage ever previously stopped Microsoft?


You're confusing corporate enterprise plans with home users. Completely different ballgames.


It would help if people stopped calling it "AI" and called it what it is, which is enterprise document search: a thing that has existed forever, and of which these new tools are just a poorly controllable version.

Would you enable a search indexer on all your corporate data that doesn't have any way to control which documents are returned to which users? Probably not.

It's a known issue with SharePoint going back years, and it has various solutions[0] such as document-level access controls or disabling indexing of content.

If we called it what it is though the C-levels probably wouldn't even care about it. They never cared about enterprise document search before and certainly didn't "pivot" to enterprise document search or report on the progress of enterprise document search implementation to the board.

0: https://sharepointmaven.com/3-ways-prevent-documents-appeari...
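
To make the trimming mechanism concrete, here is a toy sketch (mine, not from the linked post) of an index that filters hits against per-document ACLs before returning them. If an ACL wrongly says "everyone", the search behaves "correctly" and the leak still happens:

    # Toy "security trimming": results filtered by per-document ACLs.
    from dataclasses import dataclass, field

    @dataclass
    class Doc:
        name: str
        text: str
        allowed: set = field(default_factory=set)  # principals who may read

    INDEX = [
        Doc("q3-roadmap.docx", "roadmap for the q3 launch", {"everyone"}),
        Doc("salaries.xlsx", "salary bands by level", {"hr-team"}),
    ]

    def search(query: str, user_groups: set) -> list[str]:
        hits = [d for d in INDEX if query.lower() in d.text]
        # Drop anything the caller's groups can't read. If the ACL is
        # wrong, the trimming is still "working as intended".
        return [d.name for d in hits if d.allowed & (user_groups | {"everyone"})]

    print(search("salary", {"engineering"}))  # [] unless salaries.xlsx is over-shared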


This helps explain why Sharepoint search is so bad!


Good, this was a poorly conceived idea to build into an OS and turn on by default.

It is the same problem with a lot of the AI tools right now: using them for your code, letting them look at your documents, etc. Unless you self-host it or use a 'private' service from Azure or AWS (which they say is safe...), who knows where this information is ending up.

This is a major leak waiting to happen. It scares me to think what kind of data has been fed into ChatGPT or some code tool and is now just sitting in a log or a plaintext file somewhere, waiting to be found later.


That's not what this is about. This is about the M365 tools that you can add to Outlook/Teams etc. It needs separate licensing and isn't enabled by default; you have to pay for it and assign it to users/groups.


Microsoft's branding is seriously all over the place with all of this... They have at least 3 "Copilot" things now, I think? GitHub Copilot, the one built into Windows, and now this, apparently. sigh.

Regardless, my other points still stand. All of these tools remain a leak waiting to happen.


It's a separate GPT instance specifically designed for corporate data, per company; it's not the same model the public uses. You could say Gmail is a leak waiting to happen because you're not hosting your own email server. This is a product specifically designed to work with enterprise data. The risk isn't any greater than Azure getting hacked or something.


> You could say Gmail is a leak waiting to happen because you're not hosting your own email server.

I mean, Azure has had several tenancy breaches where attackers could move from one tenant to another

> The risk isn't any greater than Azure getting hacked or something.

example: https://www.theverge.com/2021/8/27/22644161/microsoft-azure-...

they also had their master authentication keys leaked and didn't realise for 2 years

https://www.bleepingcomputer.com/news/microsoft/microsoft-st...

this one allowed the attackers to get into Microsoft executives' email accounts


Recently, one of my friends asked ChatGPT about an internal tool with a funny name, and ChatGPT seemed to know the meaning behind the name and the options to run the tool properly for his use case.

We googled around to see if there was any information on the web about the tool, and there's nothing on Google, which makes sense since it's a boring internal tool for a financial services company.

Of course, it could be a lucky guess, or an intern could have uploaded the manual to GPT :D


What’s the name of the tool?


It would not be impossible for an LLM to hallucinate a correct answer based on the name of a well-named tool that had sane default choices in its architecture.


Hey guys, this isn't the built-in Copilot that was scrapped. This is about corporate Copilot in M365. The problem is that some companies have bad data hygiene and don't control who can access what. Copilot is working as intended, but some teams have done a poor job of controlling permissions for data.


If you upload your data to a third party without first encrypting it with a key known only to you, it is no longer yours.

Everything else is just wishful thinking. Like trying to keep a secret whilst only telling one or two friends.
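
One concrete reading of that, as a minimal sketch using the Python `cryptography` package (the upload call is a hypothetical placeholder):

    # Encrypt client-side with a key that never leaves you; the provider
    # stores only ciphertext. upload() is a placeholder, not a real API.
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()   # keep this yourself, never upload it
    f = Fernet(key)

    ciphertext = f.encrypt(b"quarterly salary data")
    # upload(ciphertext)          # the third party sees bytes it cannot read
    assert f.decrypt(ciphertext) == b"quarterly salary data"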


Now imagine how much Google (Gmail) knows about us. It gets scary very quickly. Even if you are as private as you can, you will communicate with people whose email is hosted by Google.


I think Google Search is much worse than Gmail. You tell it things you wouldn't tell your family or even your lawyer or therapist. And it can infer things you didn't even know about yourself (e.g. you are pregnant or suffering a health condition).


I'm surprised we have yet to see ChatGPT records as part of a court case, as police have been using searches as evidence for some time now.


Even worse: if you use Gmail, you get fraud spam from crooks breaking into the Google Classroom cloud platform to phish victims...


So much truth in your statement.

I had a recent exchange with Microsoft and a group of CISOs, and the way MS explained it to us is that Copilot relies on existing file-sharing security (OneDrive, SharePoint) to determine what a user can receive as feedback from Copilot. While relying on existing controls seems like a reasonable approach, it honestly sent shivers down my spine. Anyone with experience securing data sharing on MS platforms knows those permissions become a total mess over time in large organizations.


I just have to look at my ancient MSN account, effectively rendered unusable for several MS services (notably Skype and Xbox) because it's stuck in some limbo between MS auth service migrations (?), to gauge my confidence in their control of user data... And no, several hours with their support agents spread over several weeks did not resolve it.


Exactly what the article is about.


Your comment seems a bit tangential to the central point of this article, which is more about poorly governed sharing permissions in SharePoint.

For what it’s worth, Microsoft does have support for customer keys at their E5 licensing level:

https://learn.microsoft.com/en-us/purview/customer-key-set-u...


This is... not what GDPR or intellectual property law says.


Their point is that that is wishful thinking. If someone violates the law, which does happen, your data can get out. "No longer yours" in this context means "others may disseminate it without your approval". They may be penalized for doing so, but it is absolutely within their power.


You're absolutely right on a legal basis, but people (and companies) act on a spectrum between "legal", "moral", and "what I can get away with".


Sure.

The law also states that crime is illegal.

I wouldn't walk around Compton late at night with a £5K camera though. Even with insurance.


What a law says and what actually happens are not really the same. How do you GDPR a data breach? How do you DMCA a data breach?


The latest Visual Studio update has made Copilot much more prominent. I can't tell if it's actually running or not, and I can't figure out how to turn it off completely.

LLM-based AI is technically banned at my work, for somewhat good reason: most of our work involves confidential, controlled, or classified data. Though I've seen a lot of people, especially the juniors, still using ChatGPT every day.

I've also noticed the UI has gotten a lot slower. I'm guessing the two things are related.

If my company wasn't locked into "Microsoft everything" this would push me the last inch to ditch VS completely. I already did at home.


The VS one is an entirely different “Copilot” to this one. This article is talking about the Copilot that’s a RAG ChatGPT that integrates with Sharepoint etc.


[flagged]


May I inquire about your private details, tiahura? Perhaps you're posting from a private beach (oh la la), or is it just a place you'd like to snorkel? Come come, spill! Give us all the linkedin/spokeo/lexisnexis tea. Let us live vicariously through you.

Life must be good feeling unburdened by the reality of data brokers buying/selling your data! How free you must be!


The snorkeling there is terrible. Never seen a fish or any coral.

Just trying to make the point that the doomsday privacy predictions haven’t materialized, while every day Google and MS and Apple repel millions of hacking attempts, many of which would succeed against individuals trying to secure things themselves.

Not to mention the increased productivity and utility these services provide.


The doomsday already happened, it just wasn't televised. We live in a world where our TVs spy on us, our cars tattle on us to our insurance overlords, and our internet has more ads than a cyberpunk city skyline. All of our data has been breached, it's just a roll of the dice if our identity gets stolen and the all-important 'credit score' gets tanked into oblivion.

But that does not mean I should abandon my desire for privacy, tomorrow is a brand new day.


Privacy is literally a human right in many countries.


It only takes a company messing up exactly once, and the damage is catastrophic.

Everyone gets their social security number leaked...identity thieves have a field day.

Everyone gets their medical history leaked...insurance companies suddenly find another edge against the consumers.

Everyone gets their texts leaked...scammers now have blackmail against anyone who ever got spicy with their significant other.

Huge companies have been exploited before, and they will be again and again. The only long-term winning strategy is to not let them have your data in the first place.


Nearly all SSNs have leaked by this point. The US needs a cryptography-based ID system, so that each identification event is distinct and each company gets a different (irreversible) derived ID for each person.
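
A minimal sketch of that derivation (my interpretation, not an existing standard): hand each relying company an HMAC of a person-bound secret and the company's ID, so the ID is stable per company but can't be reversed or correlated across companies:

    # Hypothetical derived-ID scheme, not a real system.
    import hmac, hashlib

    def derived_id(person_secret: bytes, company_id: str) -> str:
        # Stable for (person, company); infeasible to reverse or to link
        # across companies without the secret.
        return hmac.new(person_secret, company_id.encode(), hashlib.sha256).hexdigest()

    secret = b"held by the ID system, never by relying companies"
    print(derived_id(secret, "acme-insurance"))
    print(derived_id(secret, "big-bank"))  # different ID, same person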

On that larger point I'd agree that companies should not have PII data they don't need.


You prove my point. Most of those things have happened, many times.

Every day there's a breach, and yet the world goes on.


"the world goes on"

This reeks of an answer given by someone who simply hasn't been impacted by any of this yet. I'm sure that for those who HAVE been impacted, the world didn't simply "go on"; it caused real stress, problems, and issues for them.


And the lives of certain innocent individuals get ruined in the process. I guess it's fine as long as it's not you? "They didn't come for me..."


That’s the precious bodily fluids paranoia I’m talking about.

Nobody is coming for anyone. [Again doesn’t apply to spies and dissidents]

Systems can fail, and that can mean ruined lives. However, that's only part of the equation. There were actually anti-automobile societies in the US and Europe that opposed cars for safety reasons.


"Nobody is coming for anyone" is just wrong.

If there's an edge, people will use it. Car manufacturers share data with insurance companies[1], which can impact drivers' insurance rates or lead to coverage denial.

Do you believe the same thing will never happen in healthcare?

Do you believe that sophisticated criminals won't engage in large-scale fraud attempts? In 2021, about 23.9 million people (9% of U.S. residents age 16 or older) had been victims of identity theft during the prior 12 months.[2]

You haven't been hurt by this sort of thing, which is great for you. But millions of other people aren't so lucky.

[1] https://www.nytimes.com/2024/03/11/technology/carmakers-driv...

[2] https://bjs.ojp.gov/library/publications/victims-identity-th...




It's amazing how you can tell which publication this is just by looking at the title. No need even to look for the domain in brackets.


At this point, we should keep the idea of companies as legal individuals, but introduce a social credit system that feeds into tax deductions based on whether your products actually improve people's lives rather than stealing from them.

Want to build AI tooling that leverages user data? Great!

* Does it gather their data for targeted ads? Neutral.

* Does it gather their data to be resold to others? -100 points, pay more tax, you're rent-seeking.

* Does it help the user not get phished? +100 points, you're actually offering something of value.

I don't believe having humanoid robots in factories helps or is nearly as profitable as humanoid robots that will do my laundry for me.



