The issue here (which is almost always the case with prompt injection attacks) is that an LLM has access to attacker-controlled data, sensitive information, and a data exfiltration capability.
THe "cardinal rule of agent design" should be that an LLM can have access to at most two of these during one session. To avoid security issues, agents should be designed in a way that ensures this.
For example, any agent that accesses an issue created by an untrusted party should be considered "poisoned" by attacker-controlled data. If it then accesses any private information, its internet access capability should be severely restricted or disabled altogether until context is cleared.
In this model, you don't need per-repo tokens. As long as the "cardinal rule" is followed, no security issue is possible.
Sadly, it seems like MCP doesn't provide the tools needed to ensure this.
Genuine question - can we even make a convincing argument for security over convenience to two generations of programmers who grew up on corporate breach after corporate breach with just about zero tangible economic or legal consequences to the parties at fault? Presidential pardons for about a million a pop [1]?
What’s the cassus belli to this younger crop of executives that will be leading the next generation of AI startups?
As ethical hackers and for the love of technology, yes we can make a convincing argument for security over convenience. Don't look too much in to it I say; there will always be people convincing talent to do and make things and disregard security and protocol.
Those younger flocks of execs will have been mentored and answer to others. Their fiduciary duty is to share-holders and the business' bottom line.
Us, as technology enthusiasts should design, create, and launch things with security in mind.
Don't focus on the tomfoolery and corruption, focus on the love for the craft.
> The "cardinal rule of agent design" should be that an LLM can have access to at most two of these during one session. To avoid security issues, agents should be designed in a way that ensures this.
Then don't give it your API keys? Surely there's better ways to solve this (like an MCP API gateway)?
> an LLM has access to attacker-controlled data, sensitive information, and a data exfiltration capability
> THe "cardinal rule of agent design" should be that an LLM can have access to at most two of these during one session
I still don't really get it. Surely the older, simpler, and better cardinal rule is that you just don't expose any service to the Internet that you have given access to your private data, unless you directly control that service and have a very good understanding of its behavior.
> I still don't really get it. Surely the older, simpler, and better cardinal rule is that you just don't expose any service to the Internet that you have given access to your private data, unless you directly control that service and have a very good understanding of its behavior.
This scenario involves a system whose responsibility is to react to an event, analyse your private info in response to the event, and output something.
The exploit is that, much like a SQL injection, it turns out attackers can inject their own commands into the input event.
Also, it's worth keeping in mind that prompts do lead LLMs to update their context. Data ex filtration is a danger, but so is having an attacker silently manipulating the LLM's context.
Private data + data exfiltration (with no attacker-controlled data) is fine, as there's no way to jailbreak the LLM. An attacker has no way to perform an attack, as no data they control can ever flow into the LLM, so they can't order it to behave in the way they want.
Private data + attacker controlled data (with no exfiltration capability) is also fine, as even if a jailbreak is performed, the LLM is physically incapable of leaking the results to the attacker.
So is attacker controlled data + exfiltration (with no private data access), as then there's nothing to exfiltrate.
This is just for the "data leakage attack." Other classes of LLM-powered attacks are possible, like asking the LLM to perform dangerous actions on your behalf, and they need their own security models.
> Private data + attacker controlled data (with no exfiltration capability) is also fine, as even if a jailbreak is performed, the LLM is physically incapable of leaking the results to the attacker.
An attacker could modify your private data, delete it, inject prompts into it, etc.
> Private data + data exfiltration (with no attacker-controlled data) is fine
Because LLMs are not at all known for their hallucinations and misuse of tools - not like it could leak all your data to random places just because it decided that was the best course of action.
Like I get the value proposition of LLMs but we're still benchmarking these things by counting Rs in strawberry - if you're ready to give it unfeathered access to your repos and PC - good luck I guess.
> Private data + data exfiltration (with no attacker-controlled data) is fine, as there's no way to jailbreak the LLM.
This is why I said *unless you...have a very good understanding of its behavior.*
If your public-facing service is, say, a typical RBAC implementation where the end user has a role and that role has read access to some resources and not others, then by all means go for it (obviously these system can still have bugs and still need hardening, but the intended behavior is relatively easy to understand and verify).
But if your service gives read access and exfiltration capabilities to a machine learning model that is deliberately designed to have complex, open-ended, non-deterministic behavior, I don't think "it's fine" even if there's no third-party attacker-controlled prompts in the system!
I might reword:
"attacker-controlled data, sensitive information, and a data exfiltration capability"
to:
"attacker-controlled data and privileged operations (e.g. sensitive information acces+data exfiltration or ability to do operations to production system)"
... is probably a bit unfair. From what I've seen the protocol is generally neutral on the topic of security.
But the rush to AI does tend to stomp on security concerns. Can't spend a month tuning security on this MCP implementation when my competition is out now, now, now! Go go go go go! Get it out get it out get it out!
That is certainly incompatible with security.
The reason anyone cares about security though is that in general lacking it can be more expensive than taking the time and expense to secure things. There's nothing whatsoever special about MCPs in this sense. Someone's going to roll snake eyes and discover that the hard way.
Can you give me more resources to read about this? It seems like it would be very difficult to incorporate web search or anything like that in Cursor or another IDE safely.
It is. Nearly any communication with the outside world can be used to exfiltrate data. Tools that give LLMs this ability along with access to private data are basically operating on hope right now.
Web search is mostly fine, as long as you can only access pre-indexed URLs, and as long as you consider the search provider not to be in with the attacker.
It would be even better if web content was served from cache (to make side channels based on request patterns much harder to construct), but the anti-copyright-infringement crowd would probably balk at that idea.
I don't know that this is a sustainable approach. As LLMs become more capable and are able to do the functions that a real human employee is doing they will need similar access that a normal human employee would have. Clearly not all employees have access to everything, but there is clearly a need for some broader access. Maybe we should be considering human type controls. If you are going to give broader access then you need X, Y and Z to do it like it requests temporary access from a 'boss' LLM etc etc. There are clear issues with this approach but humans also have these issues too (social engineering attacks work all too well). Is there potentially a different pattern that we should be exploring now?
I feel like there needs to be a notion of "tainted" sessions that's adopted as a best practice. The moment that a tool accesses sensitive/private data, the entire chat session should be flagged, outside of the token stream, in a way that prevents all tools from being able to write any token output to any public channel - or, even, to be able to read from any public system in a way that might introduce side channel risk.
IMO companies like Palantir (setting aside for a moment the ethical quandaries of the projects they choose) get this approach right - anything with a classification level can be set to propagate that classification to any number of downstream nodes that consume its data, no matter what other inputs and LLMs might be applied along the way. Assume that every user and every input could come from quasi-adversarial sources, whether intentional or not, and plan accordingly.
GitHub should understand that the notion of a "private repo" is considered trade-secret by much of its customer base, and should build "classified data" systems by default. MCP has been such a whirlwind of hype that I feel a lot of providers with similar considerations are throwing caution to the wind, and it's something we should be aware of.
There's an extremely large number of humans, all slightly different, each vulnerable to slightly different attack patterns. All of these humans have some capability to learn from attacks they see, and avoid them in the future.
LLMs are different, as there's only a smart number of flagship models in wide use. An attack on model A at company X will usually work just as well on a completely different deployment of model A at company Y. Furthermore, each conversation with the LLM is completely separate, so hundreds of slightly different attacks can be tested until you find one that works.
If CS departments were staffed by thousands of identical human clones, each one decommissioned at the end of the workday and restored from the same checkpoint each morning, social engineering would be a lot easier. That's where we are with LLMs.
The right approach here is to adopt much more stringent security practices. Dispense with role-based access control, adopt context-based access control instead.
For example, an LLM tasked with handling a customer support request should be empowered with the permissions to handle just that request, not with all the permissions that a CS rep could ever need. It should be able to access customer details, but only for the customer that opened the case. Maybe it should even be forced to classify what kind of case it is handling, and be given a set of tools appropriate for that kind of case, permanently locking it out of other tools that would be extremely destructive in combination.
This is a pretty loaded response but I'll attempt to answer. First, it doesn't and it was never implied that generically it does. The connection I was making was that LLMs are doing more human like tasks and will likely need access similar to what people have for those tasks for the same reason people need that access. I'm making the observation that if we are going down this path, which it looks like we are, then maybe we can learn from the approaches taken with real people doing these things.
THe "cardinal rule of agent design" should be that an LLM can have access to at most two of these during one session. To avoid security issues, agents should be designed in a way that ensures this.
For example, any agent that accesses an issue created by an untrusted party should be considered "poisoned" by attacker-controlled data. If it then accesses any private information, its internet access capability should be severely restricted or disabled altogether until context is cleared.
In this model, you don't need per-repo tokens. As long as the "cardinal rule" is followed, no security issue is possible.
Sadly, it seems like MCP doesn't provide the tools needed to ensure this.