You’ll want to use something trusted like Ollama to run the model. The model its...

mewpmewp2 · 2025-01-28T11:13:40 1738062820

If used as an agent, given access to execute code, search web, use other form of tools, it could do potentially much more. And most productive usecases require access to such tools. If you want to automate things and get most of the modeel, you will have to give it ability to use tools.

E.g. it could have been trained to launch a delayed attack if context indicates it has access to execute code and given certain conditions, e.g. date, or other type of codeword that is input to it.

So if a malicious actor gets to a certain stage with an LLM where they are confident it will be able to reliably run this attack, all they have to do is open source it, wait for enough adoption and then use some of those methods to launch such attack. No one would be able to identify it since the weights are unreadable, but really somewhere in the weights this attack is just hiding and waiting to happen given correct pathway triggered.

pitched · 2025-01-28T11:25:36 1738063536

A delayed attack is a bit of a stretch because that’s a stateful thing but a reminder of the Ken Thompson hack does feel very relevant.

mewpmewp2 · 2025-01-28T11:38:54 1738064334

But if it's specifically trained to react to a date in its context, it seems very doable. Or to a combination of otherwise seemingly innocent words or even a statement or topic. E.g. a malicious actor could make some certain notion go viral and agentic LLMs integrated with news headlines might react to that.

It seems like it would be very arbitrary to train it to behave like this.

Most agentic systems would provide a date in the prompt context.

For simplicity sake imagine a scenario like:

1. China develops LLM that is by far ahead of its competitors. Decides to attribute it to a small start up, lets them open source it. The LLM is specifically designed to be very efficient as being an agent.

2. Agentic usage starts to get more and more popular. It's very standard to have current todays' date and major news headlines given to the context.

3. The LLM was trained to given a certain range of date and certain headlines being provided in its context to execute a pre-trained snippet of code. For example China imposing a certain type of tariff (maybe I lack imagination here, and there can be something much more subtle).

4. At that point the agentic system will attempt to fish all data it can from all sources it's being ran within.

Now maybe it's not very practical, and it's extremely risky with current state of the LLMs. I don't think it's happening right now. And China has a lot of other tech available to it already that they could do much more harm (phones, robot vacuums), but I think there's still at least potential attack vectors like this and especially if the LLM became very reliable.

sunshine-o · 2025-01-28T14:45:56 1738075556

> And China has a lot of other tech available to it already that they could do much more harm (phones, robot vacuums)

True. But here it is more about the computing power they would be able to access.

If only Bitcoin or Ethereum were stilled mined using GPU, that would be a great cryptojacking opportunity

sunshine-o · 2025-01-28T14:24:50 1738074290

Ok, but I am really curious about this and maybe my mental model is wrong:

- llama.cpp or ollama can be seen as runtime systems,

- there is no security model regarding the execution documented in both of those projects,

- of course the models are just data but so are most things that have been used as an attack vector on computers. For example your web browser or image viewer have a lot of countermeasures to protect the system from malicious image files.

I am surprised that security of operating systems, programming languages, VMs or web browsers have been a focus point forever but nobody seems to really care about security when executing those LLMs.