Companies ground Microsoft Copilot over data governance concerns (theregister.com)
87 points by belter 5 months ago | 53 comments



The data governance concerns are not that Microsoft has access to the data (that is no doubt covered by contract clauses).

The concerns are that most corporate implementations of network roles and permissions are not up to date or accurate, so Copilot will show an employee data that they should not be allowed to see. Salary info is an example.

Basically, Copilot is following “the rules” (technical settings), but corporate IT teams have not kept the technical rules up to date with the business rules. So they need to pause Copilot until they get their own rules straight.
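
A rough sketch of how an IT team might spot this kind of over-sharing via the Microsoft Graph API (my illustration, not from the thread; TOKEN and DRIVE_ID are placeholders, and only top-level items are checked):

    # Sketch only: audit SharePoint/OneDrive sharing links via Microsoft
    # Graph. Assumes an access token with Files.Read.All has already been
    # acquired elsewhere (e.g. an MSAL client-credentials flow).
    import requests

    GRAPH = "https://graph.microsoft.com/v1.0"
    TOKEN = "..."      # placeholder: acquired out of band
    DRIVE_ID = "..."   # placeholder: the document library to audit
    headers = {"Authorization": f"Bearer {TOKEN}"}

    items = requests.get(f"{GRAPH}/drives/{DRIVE_ID}/root/children",
                         headers=headers).json().get("value", [])
    for item in items:
        url = f"{GRAPH}/drives/{DRIVE_ID}/items/{item['id']}/permissions"
        for perm in requests.get(url, headers=headers).json().get("value", []):
            scope = perm.get("link", {}).get("scope")  # "anonymous"/"organization"/"users"
            if scope in ("anonymous", "organization"):
                # Anything visible through such a link, Copilot can
                # surface to any employee who asks for it.
                print(f"{item['name']}: shared with scope={scope}")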

Edit to add: if your employer has Copilot turned on, maybe try asking for sensitive stuff and see what you get. ;-)



I don't think this is the core of the concern; other aspects are much worse, IMHO.

For instance, the article says "Microsoft positions its Copilot tool as a way to make users more creative and productive by capturing all the human labor latent in the data used to train its AI models and reselling it." The Copilot data could make it a lot easier to steal sensitive business procedures and intellectual property: it would allow third parties to inspect a company's procedures and sensitive data at a scale we've never seen. It would be next to impossible to manage, categorize, and protect this data. It's an intellectual property nightmare.

A good version of the technology, which we don't have yet, would allow competitors to create a copy of every employee (in the business sense) and perhaps compete with that company much more efficiently.


Are you talking here about the idea that Microsoft are training their models on private customer data in a way that could later expose details of it to people outside the company?

That’s not happening. Microsoft Copilot doesn’t train on data it has access to. https://learn.microsoft.com/en-us/power-platform/faqs-copilo...


Microsoft, of course, having an excellent reputation for respecting the law and their customers whenever there's significant money to be made by doing the exact opposite.


Your company’s data isn’t nearly as valuable for training models as you might think.

Certainly not as valuable as the revenue you can make from companies that would instantly cancel their Copilot 365 subscriptions if they heard any hint of data being used for training without permission.

Convincing people that you don’t train on their data remains one of the hardest problems: https://simonwillison.net/2023/Dec/14/ai-trust-crisis/


> Certainly not as valuable as the revenue you can make from companies that would instantly cancel their Copilot 365 subscriptions

companies are already cancelling their Copilot subscriptions as it's "high cost and low value"

https://www.businessinsider.com/pharma-cio-cancelled-microso...

> Convincing people that you don’t train on their data remains one of the hardest problems:

we attempted to protect our valuable data with copyright

they disregarded these terms, trained on it anyway and claim wholesale reproduction of our work is "fair use"

why wouldn't they do the same with Teams/Sharepoint/Word/everything on Azure?

because the contract with a company 10000x our size says they won't? HAHAHAHAHA

the only way to protect your data from entities that have previously disregarded terms in this way is to not let them get their dirty hands on it in the first place


Did you read https://simonwillison.net/2023/Dec/14/ai-trust-crisis/ ? Because your comment here is a textbook example of what I was talking about there, right up to the bit where you say "you can't trust them because they've already shown they'll train on unlicensed scraped copyrighted data" (a very reasonable point to argue).

(Update: actually I didn't make that point in the original post, it's from the talk version of this I gave https://simonwillison.net/2024/Jun/27/ai-worlds-fair/#slide.... )


This exactly. If Microsoft had created its own fine-tuned-MSFT-data LLM and seen vastly better results on internal tasks, then they’d be publishing papers about it, and also packaging that up & selling to customers.


There’s significant money to be lost (lawsuits, massive reputational damage) if it were discovered they were deliberately training on data they had guaranteed they were not training on.


when has the threat of lawsuits or massive reputational damage ever previously stopped Microsoft?


You're confusing corporate enterprise plans with home users. Completely different ballgames.


It would help if people stopped calling it "AI" and called it what it is, which is enterprise document search: a thing that has existed forever, and of which these new tools are just a poorly controllable version.

Would you enable a search indexer on all your corporate data that doesn't have any way to control which documents are returned to which users? Probably not.

It's a known issue with SharePoint going back years, and it has various solutions[0] such as document-level access controls or disabling indexing of content.

If we called it what it is though the C-levels probably wouldn't even care about it. They never cared about enterprise document search before and certainly didn't "pivot" to enterprise document search or report on the progress of enterprise document search implementation to the board.

0: https://sharepointmaven.com/3-ways-prevent-documents-appeari...
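
To make the trimming mechanism concrete, here is a toy sketch (mine, not from the linked post) of an index that filters hits against per-document ACLs before returning them. If an ACL wrongly says "everyone", the search behaves "correctly" and the leak still happens:

    # Toy "security trimming": results filtered by per-document ACLs.
    from dataclasses import dataclass, field

    @dataclass
    class Doc:
        name: str
        text: str
        allowed: set = field(default_factory=set)  # principals who may read

    INDEX = [
        Doc("q3-roadmap.docx", "roadmap for the q3 launch", {"everyone"}),
        Doc("salaries.xlsx", "salary bands by level", {"hr-team"}),
    ]

    def search(query: str, user_groups: set) -> list[str]:
        hits = [d for d in INDEX if query.lower() in d.text]
        # Drop anything the caller's groups can't read. If the ACL is
        # wrong, the trimming is still "working as intended".
        return [d.name for d in hits if d.allowed & (user_groups | {"everyone"})]

    print(search("salary", {"engineering"}))  # [] unless salaries.xlsx is over-shared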


This helps explain why Sharepoint search is so bad!


Good, this was a poorly conceived idea to build into an OS and turn on by default.

It is the same problem with a lot of the AI tools right now: using them for your code, letting them look at your documents, etc. Unless you self-host it or use a 'private' service from Azure or AWS (which they say is safe...), who knows where this information is ending up.

This is a major leak waiting to happen. It scares me to think what kind of data has been fed into ChatGPT or some code tool and is now just sitting in a log or a plaintext file somewhere, waiting to be found later.


That's not what this is about. This is about the M365 tools that you can add to Outlook/Teams etc. It needs separate licensing and isn't enabled by default; you have to pay for it and assign it to users/groups.


Microsoft's branding is seriously all over the place with all of this... They have at least 3 "Copilot" things now, I think? GitHub Copilot, the one built into Windows, and now this, apparently. sigh.

Regardless, my other points still stand. All of these tools remain a leak waiting to happen.


It's a separate GPT instance specifically designed for corporate data, per company; it's not the same model the public uses. You could say Gmail is a leak waiting to happen because you're not hosting your own email server. This is a product specifically designed to work with enterprise data. The risk isn't any greater than Azure getting hacked or something.


> You could say Gmail is a leak waiting to happen because you're not hosting your own email server.

I mean, Azure has had several tenancy breaches where attackers could move from one tenant to another

> The risk isn't any greater than Azure getting hacked or something.

example: https://www.theverge.com/2021/8/27/22644161/microsoft-azure-...

they also had their master authentication keys leaked and didn't realise for 2 years

https://www.bleepingcomputer.com/news/microsoft/microsoft-st...

this one allowed the attackers to get into Microsoft executives' email accounts


Recently, one of my friends asked ChatGPT about an internal tool with a funny name, and ChatGPT seemed to know the meaning behind the name and the options to run the tool properly for his use case.

We googled around to see if there was any information on the web about the tool, and there's nothing on Google, which makes sense since it's a boring internal tool for a financial services company.

Of course, it could be a lucky guess, or an intern could have uploaded the manual to GPT :D


What’s the name of the tool?


It would not be impossible for an LLM to hallucinate a correct answer based on the name of a well-named tool that had sane default choices in its architecture.


Hey guys, this isn't the built-in Copilot that was scrapped. This is about corporate Copilot in M365. The problem is that some companies have bad data hygiene and don't control who can access what. Copilot is working as intended, but some teams have done a poor job of controlling permissions for data.


If you upload your data to a third party without first encrypting it with a key known only to you, it is no longer yours.

Everything else is just wishful thinking. Like trying to keep a secret whilst only telling one or two friends.
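
One concrete reading of that, as a minimal sketch using the Python `cryptography` package (the upload call is a hypothetical placeholder):

    # Encrypt client-side with a key that never leaves you; the provider
    # stores only ciphertext. upload() is a placeholder, not a real API.
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()   # keep this yourself, never upload it
    f = Fernet(key)

    ciphertext = f.encrypt(b"quarterly salary data")
    # upload(ciphertext)          # the third party sees bytes it cannot read
    assert f.decrypt(ciphertext) == b"quarterly salary data"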


Now imagine how much Google (Gmail) knows about us. It gets scary very quickly. Even if you are as private as you can, you will communicate with people whose email is hosted by Google.


I think Google Search is much worse than Gmail. You tell it things you wouldn't tell your family or even your lawyer or therapist. And it can infer things you didn't even know about yourself (e.g. you are pregnant or suffering a health condition).


I'm surprised we have yet to see ChatGPT records as part of a court case, as police have been using searches as evidence for some time now.


Even worse: if you use Gmail, you get fraud spam from crooks breaking into the Google Classroom cloud platform to phish victims...


So much truth in your statement.

I had a recent exchange with Microsoft and a group of CISOs, and the way MS explained it to us is that Copilot relies on existing file-sharing security (OneDrive, SharePoint) to determine what a user can receive as feedback from Copilot. While relying on existing controls seems like a reasonable approach, it honestly sent shivers down my spine. Anyone with experience securing data sharing on MS platforms knows those permissions become a total mess over time in large organizations.


I just have to look at my ancient MSN account, effectively rendered unusable for several MS services (notably Skype and Xbox) because it's stuck in some limbo between MS auth service migrations (?), to gauge my confidence in their control of user data... And no, several hours with their support agents spread over several weeks did not resolve it.


Exactly what the article is about.


Your comment seems a bit tangential to the central point of this article, which is more about poorly governed sharing permissions in SharePoint.

For what it’s worth, Microsoft does have support for customer keys at their E5 licensing level:

https://learn.microsoft.com/en-us/purview/customer-key-set-u...


This is... not what GDPR or intellectual property law says.


Their point is that that is wishful thinking. If someone violates the law, which does happen, your data can get out. "No longer yours" in this context means "others may disseminate it without your approval". They may be penalized for doing so, but it is absolutely within their power.


You're absolutely right on a legal basis, but people (and companies) act on a spectrum between "legal", "moral", and "what I can get away with".


Sure.

The law also states that crime is illegal.

I wouldn't walk around Compton late at night with a £5K camera though. Even with insurance.


What a law says and what actually happens are not really the same. How do you GDPR a data breach? How do you DMCA a data breach?


The latest Visual Studio update has made Copilot much more prominent. I can't tell if it's actually running or not, and I can't figure out how to turn it off completely.

LLM-based AI is technically banned at my work, for somewhat good reason: most of our work involves confidential, controlled, or classified data. Though I've seen a lot of people, especially the juniors, still using ChatGPT every day.

I've also noticed the UI has gotten a lot slower. I'm guessing the two things are related.

If my company wasn't locked into "Microsoft everything" this would push me the last inch to ditch VS completely. I already did at home.


The VS one is an entirely different “Copilot” to this one. This article is talking about the Copilot that’s a RAG ChatGPT that integrates with Sharepoint etc.


[flagged]


May I inquire about your private details, tiahura? Perhaps you're posting from a private beach (oh la la), or is it just a place you'd like to snorkel? Come come, spill! Give us all the linkedin/spokeo/lexisnexis tea. Let us live vicariously through you.

Life must be good feeling unburdened by the reality of data brokers buying/selling your data! How free you must be!


The snorkeling there is terrible. Never seen a fish or any coral.

Just trying to make the point that the doomsday privacy predictions haven’t materialized, while every day Google and MS and Apple repel millions of hacking attempts, many of which would succeed against individuals trying to secure things themselves.

Not to mention the increased productivity and utility these services provide.


The doomsday already happened, it just wasn't televised. We live in a world where our TVs spy on us, our cars tattle on us to our insurance overlords, and our internet has more ads than a cyberpunk city skyline. All of our data has been breached, it's just a roll of the dice if our identity gets stolen and the all-important 'credit score' gets tanked into oblivion.

But that does not mean I should abandon my desire for privacy, tomorrow is a brand new day.


Privacy is literally a human right in many countries.


It only takes a company messing up exactly once, and the damage is catastrophic.

Everyone gets their social security number leaked...identity thieves have a field day.

Everyone gets their medical history leaked...insurance companies suddenly find another edge against the consumers.

Everyone gets their texts leaked...scammers now have blackmail against anyone who ever got spicy with their significant other.

Huge companies have been exploited before, and they will be again and again. The only long-term winning strategy is to not let them have your data in the first place.


Nearly all SSNs have leaked by this point. The US needs a cryptography-based ID system, so that each identification event is distinct and each company gets a different (irreversible) derived ID for each person.
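
A minimal sketch of that derivation (my interpretation, not an existing standard): hand each relying company an HMAC of a person-bound secret and the company's ID, so the ID is stable per company but can't be reversed or correlated across companies:

    # Hypothetical derived-ID scheme, not a real system.
    import hmac, hashlib

    def derived_id(person_secret: bytes, company_id: str) -> str:
        # Stable for (person, company); infeasible to reverse or to link
        # across companies without the secret.
        return hmac.new(person_secret, company_id.encode(), hashlib.sha256).hexdigest()

    secret = b"held by the ID system, never by relying companies"
    print(derived_id(secret, "acme-insurance"))
    print(derived_id(secret, "big-bank"))  # different ID, same person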

On that larger point I'd agree that companies should not have PII data they don't need.


You prove my point. Most of those things have happened, many times.

Every day there's a breach, and yet the world goes on.


"the world goes on"

This reeks of an answer given by someone who simply hasn't been impacted by any of this yet. I'm sure that for those who HAVE been impacted, the world didn't simply "go on"; it caused real stress, problems, and issues for them.


And the lives of certain innocent individuals get ruined in the process. I guess it's fine as long as it's not you? "They didn't come for me..."


That’s the precious bodily fluids paranoia I’m talking about.

Nobody is coming for anyone. [Again doesn’t apply to spies and dissidents]

Systems can fail, and that can mean ruined lives. However, that's only part of the equation. There were actually anti-automobile societies in the US and Europe that opposed cars for safety reasons.


"Nobody is coming for anyone" is just wrong.

If there's an edge, people will use it. Car manufacturers share data with insurance companies[1], which can impact drivers' insurance rates or lead to coverage denial.

Do you believe the same thing will never happen in healthcare?

Do you believe that sophisticated criminals won't engage in large-scale fraud attempts? In 2021, about 23.9 million people (9% of U.S. residents age 16 or older) had been victims of identity theft during the prior 12 months.[2]

You haven't been hurt by this sort of thing, which is great for you. But millions of other people aren't so lucky.

[1] https://www.nytimes.com/2024/03/11/technology/carmakers-driv...

[2] https://bjs.ojp.gov/library/publications/victims-identity-th...




It's amazing how you can tell which publication this is just by looking at the title. No need even to look for the domain in brackets.


At this point, we should keep the idea of companies as legal individuals, but introduce a social credit system that feeds into tax deductions based on whether your products actually improve people's lives rather than stealing from them.

Want to build AI tooling that leverages user data? Great!

* Does it gather their data for targeted ads? Neutral.

* Does it gather their data to be resold to others? -100 points, pay more tax, you're rent-seeking.

* Does it help the user not get phished? +100 points, you're actually offering something of value.

I don't believe having humanoid robots in factories helps or is nearly as profitable as humanoid robots that will do my laundry for me.



