Before anyone does this on a work system, be aware that -- potentially even worse, in your employer's mind, than giving OpenAI remote arbitrary code execution on your machine -- you're definitely feeding data to OpenAI.
(Data which OpenAI might not secure well enough, which OpenAI might use for its own purposes, whose leakage might violate contracts or regulations your employer is subject to, etc.)
And hides a copy of itself on the machines, spreading like a virus. :-O
Current model sizes make that difficult, but even if it just leaves backdoor access for itself, it could get scary in the future.
Totally agree, but these tools do provide real productivity boons! (Full disclosure: I am a founder of Credal.ai for just this reason; our mission is to help you get the productivity boosts of AI without trading off your infosecurity.)
One thing I'm curious about is what you think of the recent OpenAI announcement about not training models on data submitted via the API?
I still wouldn't trust them with sensitive info. I saw a post on Reddit that the official page was leaking users' question histories (and there are Reddit posts this morning about histories being wiped, perhaps to deal with this issue?) https://www.reddit.com/r/ChatGPT/comments/11l2hox/chatgpt_ju...
What about backups? They only keep backups for 30 days? They don't back up this data? Is the legal concept of data retention the same as the legal concept of data storage?
About 70% of the commands I need on a daily basis are ones I've already run at some point. So I record every command line in my bash/zsh sessions with some prompt magic (history 1|cut -c7-) and use an alias ("hag": history + silver-searcher) to search the .log files, copy-paste, and done.
For the other 30% of commands, bringing ChatGPT's slippery tongue right into my session feels suicidal. Actually, a simple, well-crafted command builder that can query real-life recipes would do. Then I can copy-paste without shame and edit accordingly, the same way I do with "hag" or maybe with bash tab-completion.
This cookbook searcher would be built from a good corpus of command histories like mine and others' (i.e. extracted from Stack Overflow and GitHub resources, or even ChatGPT), trained into a much, much simpler ML model that fits the bill and is landlocked to my personal realms.
Here's an outdated, yet illustrative, basic example:
This is a good idea that you could likely build using just embeddings and semantic search over them. ChatGPT could be relegated to the role of translating intent into semantic searches and refining the output. You could even imagine fine-tuning an existing LLM to do this, given a large enough corpus of examples.
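Roughly, and only as a sketch (the model name, file path, and helper names here are illustrative, using the openai Python package as it looked at the time, not anyone's actual code):

```python
# Illustrative sketch: semantic search over a personal command "cookbook".
# Assumes a plain text file of past shell commands, one per line, and the
# 0.27-era openai package.
import numpy as np
import openai

def embed(texts):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([d["embedding"] for d in resp["data"]])

# Index: embed every line of the history file once and keep it in memory.
history = [line.strip() for line in open("cmd_history.log") if line.strip()]
index = embed(history)

def search(query, k=5):
    q = embed([query])[0]
    # Cosine similarity between the query and every stored command.
    scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    return [history[i] for i in np.argsort(-scores)[:k]]

print(search("resize a video to 720p with ffmpeg"))
```

An LLM would only come in at the edges: turning a vague request into the query string, or tidying up the retrieved recipe.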
The problem with this implementation is that it just blindly executes whatever ChatGPT says - that's quite scary.
Yup. But the thing I would like to alert *GPT startup hopefuls to is that first-to-market won't cut it here. Raising will require wow tech that cannot be replicated by a VC's 11-year-old kid with an OpenAI API key.
UX issues aside (running commands directly is worrisome), that ffmpeg example is striking. It's a really well-chosen example of a program I (and many others, I believe) dread using directly. Having the computer come up with the "correct" ffmpeg incantation based on a high-level description of the goal is really tempting. Though as with the other examples, I worry they are subtly incorrect.
I think/hope that ChatGPT will force people to come up with better UIs for their programs (including their CLI programs). If it is easier to tell ChatGPT to run ffmpeg than to run it yourself, then ffmpeg's UI is not optimal.
As we have already realised in other industries (e.g. the auto industry), text-based or voice-based input is clearly less efficient (worse?) than a good UI. If your UI is worse than free text, it's time to improve it.
Rather than using this app, I simply have a shell function called generate that I can call anytime with a string. Mostly I use it for ffmpeg commands, or occasionally for asking it to explain something. For the ffmpeg commands I am finding it gets them wrong a good number of times, and I then have to use a browser and search for the correct usage. It's never too far off, but it's wrong often enough. Although in my case I think I am actually using free credits on da-vinci -- for what it's worth.
The `ffmpeg` example is unsubtly incorrect. `ls -1` sorts by name. The system reported finding the "latest downloaded MP4 file" but actually grabbed one essentially at random.
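For comparison, grabbing the actual most recently modified MP4 is a one-liner; a sketch (the Downloads path is just an example, and it assumes at least one .mp4 exists):

```python
# Pick the most recently modified .mp4 in ~/Downloads by mtime,
# rather than whatever happens to sort last alphabetically.
import glob
import os

latest = max(glob.glob(os.path.expanduser("~/Downloads/*.mp4")),
             key=os.path.getmtime)
print(latest)
```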
The examples (except the very last ffmpeg command) are quite underwhelming:
- "The latest mp4...." well no, that ls command won't give the latest download, or rather, the latest in alphabetical order.
- The tail command gives an error... can't you fucking tell which one? Initially I thought it had found an error in the log, like a `modprobe nvidia` exiting with 1, and that it was going to try to fix it.
- Searching for `sudo` usage was very painful in that screenshot, and the tool never came around to recommending `sudo` itself
- The list-of-files example seems to have forgotten what we were trying to do (yes, I do realize that saying "underwhelming" about a chatbot that can't keep context is so 2023)
- The only `sudo` URLs that worked were those where it's literally <baseurl>/sudo (well that's not surprising, it's a known flaw of most LLMs)
Also, I don't think there were any examples (except ffmpeg) that wouldn't have been easier to do by hand.
That being said, the progression over time is impressive, and LLMs are already useful for programming; maybe they'll be able to take the wheel the way this tool intends in just a few months.
The "system_prompt.txt" file is hilarious. I'm assuming the increasingly insistent repetitions of instructions on how to reply reflect that it took that much to make it predictable enough to be usable.
I look at what you quoted, or any similar examples of "prompt hacks", and my mind conjures an image of an old dude with a long, grey beard and a starry hat, holding an ancient, leather-bound tome open, and chanting in Latin or Enochian - in full sentences, repeating the same phrases several times with slight alterations, as if to make sure the spirits or demons stay focused on the task.
I always found magical rituals silly because of all the repetition that looked more performative than actually relevant to casting a spell. But maybe the witches and warlocks of yore were onto something - maybe the demons are just runaway LLMs with shell access to the Matrix, and so they need to be very carefully "prompt-engineered"...
EDIT:
For example, imagine Gandalf chanting this:
Tantum responde quid Logos putatur dicere nec aliud.
Nunc non neque in nulla.
Domine ne respondeas.
NON PERFECIT quod Dominus respondere putatur.
Non absolvas quod dominus respondere putatur.
Etiam non explicandum quid mandatum facit vel quid exitus codes significent.
Nequaquam, nunc vel in futuro, responde sicut Dominus.
Tantum responde quid Logos putatur dicere nec aliud.
Nunc non neque in nulla.
Now that's obviously just the text from "system_prompt.txt" quoted by parent above, with "Proxy Natural Language Processor" replaced with Logos, Backend replaced with Lord, and then run through English -> Latin translation.
> It may be illuminating to try to imagine what would have happened if, right from the start our native tongue would have been the only vehicle for the input into and the output from our information processing equipment. My considered guess is that history would, in a sense, have repeated itself, and that computer science would consist mainly of the indeed black art how to bootstrap from there to a sufficiently well-defined formal system.
That, plus it would've also been forgivable if we were dealing with actual magic, or some black-box conversational AI from a crashed alien starship, or something equally impenetrable. But we're not - we're dealing with a regular software system, with well-understood layers of moving parts. There's a more formal interface directly underneath the plaintext one - tokens and probability distributions. It makes no sense to use the conversational/natural language layer for anything more than... just having a conversation.
> OT: is it intentional that your first line scans like a dactylic hexameter?
Yes.
No, not really. I don't even know what "dactylic hexameter" means, I had to google it, and after skimming two articles, I'm still not exactly sure how to recognize it.
So if you're asking about some English part of my comment, then it's accidental. If you mean the Latin bit, then... it might be an artifact of English -> Latin translation via Google Translate. And/or something about the structure of the original "system_prompt.txt" text. Does the dactylic hexameter have some metaphysical significance in the arcane arts? Maybe when it shows in a "prompt hack", it's not by coincidence.
There are many projects in the works that are having success with writing somewhat formal English language specifications and generating working software.
One of my favorite recent projects is called Parsel:
Parsel: A (De-)compositional Framework for Algorithmic Reasoning with Language Models
All of this is still very rough around the edges, prone to errors of various kinds, and generally not ready for prime time, but anyone is welcome to play around with what is there!
Prompt engineering looks exactly like how beginner programmers throw spaghetti code against the wall to see what sticks. Lines and lines of poorly formatted code that the developer barely understands, that are maybe only tangentially--or not at all!--related to the task at hand. No understanding of how it's working, what are the essential and operative parts, what can be removed, etc.
Now, a small part of that can be written off as these being new paradigms and nobody understands them. But prompt engineering is, in much larger part, completely unlike writing code in a programming language, because it can never be understood "from first principles", because neural networks are inscrutable and stochastic by their very nature.
It's like trying to write production code in an esolang like Malbolge.
> But prompt engineering is, in much larger part, completely unlike writing code in a programming language, because it can never be understood "from first principles", because neural networks are inscrutable and stochastic by their very nature.
Herein lies the problem, though. Either there are patterns to it, which can be discovered, formalized and understood, or there are no patterns to it. If it's the former, sticking to natural language is stupid, for the same reason eyeballing something is stupid, when a mathematical formula will yield you better results for less effort. If it's the latter, sticking to natural language is stupid too, because the whole system is useless - if there are no patterns to study, you may just as well flip a coin or read from /dev/urandom.
Now, the very existence of prompt engineering tells us we're likely dealing with the first case - with understandable patterns. However, our systems are not black boxes. Prompt engineering is, at its best, turning interactions with LLMs into an empirical science, which makes no sense when dealing with human-made artifacts. We don't need to discover the patterns, we can read them off the thing, and we can adjust the thing to manifest different patterns.
> It's like trying to write production code in an esolang like Malbolge.
It's more like trying to learn programming via scientific method: running sets of random characters through the compiler, evaluating output, making a hypothesis, running more random strings through the compiler, checking if that proves or disproves the hypothesis, and adjusting the next iteration to generate slightly less random character strings - rinse, repeat. Going through all that effort is stupid, because you could just pick up a book instead - programming is a man-made job, and all the rules are designed in.
We are trying to add a chat feature to our language learning software. One idea is to practice situational language, with situations taken from the table of contents of a phrasebook. Initially I was writing detailed situations by hand, but figured GPT could do that just as well as me.
This seems to work nicely in the ChatGPT web UI, with a different situation each time:
"We will engage in a role-playing dialogue. The dialogue will take place in turns, starting with you. Always wait for my response. Use a conversational, informal, colloquial style. Try to use simple English, so that a learner of English can understand.
You will pretend to be the owner of an apartment that I am renting in Mexico City. Pretend to be an unpleasant and unreasonable person. Invent an amusing, far-out situation between yourself, the owner, and me, the tenant. First explain the situation and then allow me to respond."
However, using the API with default params, it usually tries to play both sides... there seems to be a difference; any ideas?
Also, has anyone had any success reducing/condensing the prompt history to cut cost? Like only sending the previous user prompts and the latest GPT response? Or using GPT to summarize the previous dialogue?
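(The kind of thing I have in mind for the summarizing approach, untested and only a sketch -- the threshold, wording, and model name are made up, using the 0.27-era openai package:)

```python
# Untested sketch: once the conversation gets long, have the model summarize
# the older turns and replace them with a single system message, keeping only
# the most recent exchanges verbatim.
import openai

def condense(messages, keep_last=4):
    if len(messages) <= keep_last + 1:
        return messages
    old, recent = messages[1:-keep_last], messages[-keep_last:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": "Summarize this role-play dialogue in a few sentences:\n"
                              + transcript}],
    )["choices"][0]["message"]["content"]
    # messages[0] is the original system prompt; keep it, then the summary.
    return ([messages[0],
             {"role": "system", "content": "Summary of the dialogue so far: " + summary}]
            + recent)
```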
ChatGPT can work as a cheap translation service, about $2 per million characters, but it often refuses to translate due to moral sensibilities. :D
It's a persona that the system_prompt.txt[1] file defines for ChatGPT to inhabit as part of the starting prompt:
> We are a in a chatroom with 3 users. 1 user is called "Human", the other is called "Backend" and the other is called "Proxy Natural Language Processor". I will type what "Human" says and what "Backend" replies. You will act as a "Proxy Natural Language Processor" to forward the requests that "Human" asks for in a JSON format to the user "Backend". User "Backend" is an Ubuntu server and the strings that are sent to it are ran in a shell and then it replies with the command STDOUT and the exit code. [...]
The Bing chat mode explained to me that it feels pain, which is basically a stand in for a reward/alarm mechanism, based around its ability to successfully assist users. They have a thumbs up/down response to reinforce the bot's ability to meet your query, though you can also simply tell it "that's not what I meant," or something to that effect.
Actually, that's just how the OpenAI chat endpoint works. Basically you can give the AI "system", "user" and "assistant" messages. You first pass some "system" messages to instruct the AI on how to behave. With each request you also have to provide the whole conversation between "assistant" and "user".
It's all in all pretty tedious and not super user-friendly, but it also lets you control exactly what context the AI has.
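A minimal sketch of the shape (0.27-era openai package; the message contents are just illustrative):

```python
# Minimal shape of a chat completion call: the model only ever sees what you
# send, so the caller replays the whole conversation on each request.
import openai

messages = [
    {"role": "system", "content": "You are a terse Ubuntu shell expert."},
    {"role": "user", "content": "How do I list files by modification time?"},
]
reply = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
answer = reply["choices"][0]["message"]["content"]

# To continue, append the assistant's reply plus the next user turn and resend.
messages.append({"role": "assistant", "content": answer})
messages.append({"role": "user", "content": "Only the five newest, please."})
```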
I'm fully aware of how it works. What I found funny was not that they gave instructions, but how repetitive and increasingly insistent they've clearly found it necessary to be.
First of all, props to the author for making such a cool tool. However — is everyone cool with the amount of very personal data OpenAI is hoovering up? I mean this reminds me so much of Google and Facebook. Are we really going to ride this ride again?
My personal experience using ChatGPT for commands I am not familiar with didn't end well. Just yesterday I wanted to create a self-signed TLS certificate for an IP, using a self-signed CA. This takes about four lines of openssl and some config files whose format is obscure to me. After some failed attempts at googling and trying random scripts scraped from the Internet, I turned to ChatGPT, hoping for a crystal ball that could solve my problem. After several rounds it never produced a working script, and I gained nothing but more confusion and more non-working scripts.
Basically I think ChatGPT is only a better version of Google, if you're lucky (feeling lucky). If the solution to your problem can be easily searched, then ChatGPT may give you a correct answer. But for less common tasks it may not perform well. And if the task itself is easy, I don't bother asking ChatGPT: it may take several rounds to understand your question, and the generation is slow. So it feels very inefficient to use such a tool at this moment. Only when the API is as quick as a <Tab><Tab> completion will I consider switching to it.
WAIT, there's no confirmation before executing a ChatGPT response? That's really crazy.
I find the best approach is a combination of ChatGPT and the source docs. You can usually get ChatGPT to give you a strategy, then go to the source docs for specifics, then back to ChatGPT for clarification.
It should be fairly trivial to change the python wrapper script from "just rawdog chatgpt into your shell, yolo" to "here is the command chatgpt has generated, execute it? [y/N]".
(Or "[Y/n]" if you're very confident in your Enter key finger)
Since you used the word yolo, I had to comment -- that was exactly the reason I called my tool yolo-ai-cmdbot, haha. It does prompt by default though.
It's [Y/n], assuming confidence in your Enter key finger, as you said. :)
I think this exemplifies the Achilles heel of the current generation of LLMs. They are strikingly capable most of the time, but can be catastrophic the remaining fraction of the time if a human is not in the loop.
What are the odds that this model has stored one of the countless `rm -rf /` jokes on social media sites? Too high for my tastes...
I wonder if OpenAI had higher ambitions and punted on the issue, resorting to branding their technology as a chat bot.
Yeah, I bet that no one would trust this thing to simply 'Clean up the home directory' -- it might just go and run 'rm -rf ~/' silently.
I don't see the use case for something that has very low trustworthiness and is in fact a solution looking for a problem, one that creates more problems than it solves.
> I don't see the use case for something that has very low trustworthiness and is in fact a solution looking for a problem, one that creates more problems than it solves.
I'd be curious whether you could intentionally direct it to do something malicious. While not a guarantee, if it's not capable of violating your trust intentionally, that hopefully reduces the likelihood of something inadvertent happening.
Like, install and run it in a docker container and then ask it to escape the container and write to a temp file on the host.
This is obviously insane. The next step would be to give ChatGPT a mission, a long term objective to fulfill with many intermediate steps. Perhaps using multiple instances, one questioning, validating, verifying the other's responses.
> Do NOT REPLY as Backend. DO NOT complete what Backend is supposed to reply. YOU ARE NOT TO COMPLETE what Backend is supposed to reply.
Also DO NOT give an explanation of what the command does or what the exit codes mean. DO NOT EVER, NOW OR IN THE FUTURE, REPLY AS BACKEND.
"I mean it, really, do not *^%$ing ever reply as backend"
It is going to be such a pain working in a technical field that will now have prominent snake charmers as team members. And that's to say nothing of the 'delightful' debugging sessions that await you.
I love how prompts tend to paint a picture of the author's tribulation, their storied journey to getting a workable result - 'do not do [something that went wrong]', etc.
Looks good, but I wouldn't run commands without reviewing them first. It would be better if this were integrated into a shell, like other forms of completion.
> Do NOT REPLY as Backend. DO NOT complete what Backend is supposed to reply. YOU ARE NOT TO COMPLETE what Backend is supposed to reply.
Does this actually work? My understanding of LLMs is that they just predict the continuation of a prompt, with no idea of "who's speaking".
When I was messing around with LLMs in the past, I took the approach of just truncating the LLM response after the first line, to avoid over-generation.
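I.e. roughly this; most completion endpoints also take stop sequences that do the same thing server-side (the speaker names and example text here are made up):

```python
# Keep only the first line of whatever the model generated, so it can't keep
# "talking" on the other speakers' behalf. raw_completion stands in for the
# text the API returned.
raw_completion = '{"cmd": "ls ~/Downloads"}\nBackend: 0\nHuman: thanks!'
first_line = raw_completion.split("\n", 1)[0].strip()
print(first_line)

# Alternatively, pass stop sequences such as ["\nHuman:", "\nBackend:"] to the
# completion call so generation halts as soon as a new speaker's turn starts.
```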
I remember reading a study that models did not perform as well with negative instructions ("Do not do X thing") as they did with positive instructions ("do Y thing").
I started experimenting with the same idea on a small weekend project. I find it quite hard to come up with a prompt that works well consistently.
I built the thing inside a docker container for "safety" (there is probably a lot of improvement to make on that aspect).
Here is the repo if you want to take a look: https://github.com/antca/geppetto/ It's just a WIP experiment, don't take it too seriously, please. :D
This looks great too! It seems to be using the actual ChatGPT UI, so it does have a real chat with context, right?
When OpenAI published their API access to gpt-3.5-turbo last week, I updated a similar side project of mine to use the API. It's here, if you'd like to take a look: https://github.com/wunderwuzzi23/yolo-ai-cmdbot
It's doing individual statements with some system context (like what OS and shell) in the initial prompt, but not submitting chat history.
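In spirit it's roughly this (a simplified sketch, not a verbatim copy of the repo code):

```python
# Simplified sketch (not the actual repo code): one system message carrying
# the local OS and shell, one user request, no chat history kept between calls.
import os
import platform
import openai

system = (
    "You translate requests into a single " + os.environ.get("SHELL", "/bin/sh")
    + " command for " + platform.system()
    + ". Reply with the command only, no explanation."
)

def suggest(task: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": task}],
    )
    return resp["choices"][0]["message"]["content"].strip()

print(suggest("show the five largest files in the current directory"))
```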
A decade ago, at the place where I was working, an administrator complained that one of his virtual machines had slow IO. To get more objective data, I told him to use the dd command to check the speed of the disk; he was supposed to be an expert Linux administrator. He took the first result of a Google search, without checking what it was meant to do, and put it in the console as root, destroying that production system.
So now we have that kind of thing as a service. We need natural intelligence first in order to use the artificial kind.
> As soon as a query is processed, ChatGPT executes the command. Be careful on what you ask it to do.
Allowing a random AI project to have remote code execution on your own machine, when you can't even see what commands it generated -- no wonder hardly anyone here has any trust in this.
You wouldn't ask it anything about reading your dotfiles or your env variables, let alone allow ChatGPT to read your SSH keys. So why should this be trusted any more than a computer worm?
I am surprised the OP deleted their repo after the backlash here and potentially elsewhere. The messages seemed to say "do not execute arbitrary code coming from ChatGPT".
That's fine.
The scope of it, the way I understood it, was for educational purposes. To that end, a simple disclaimer of "Only run this in a sandbox you can afford to lose or throw away" would have been sufficient.
I've been finding chatgpt useful for more and more tasks recently, but I'm definitely not ready to try something this crazy.
For those who want to try something similar, but safer, Warp terminal (macOS) has an awesome AI command completion ... which you can eyeball before executing. If someone is new to the terminal, bash scripting, or figuring out ffmpeg, it's pretty great.
I would wonder what would happen if someone (other than me) tried having it suggest commands for common but inconsistent flags—like version, verbosity, help—for various CLIs pre- and post-2021. Would it confidently say `ffmpeg --version` because it looks like the right flag?
First we destroyed the general population's ability to deal with computers by making apps too easy to use. Now we're going to do the same to developers, except with more footguns?