I've noticed with the prompts I write that go into code bases that it's basically just programming, but… without any kind of consistency-checking, and with terrible refactoring tools. I find myself doing stuff like this all the time by accident.
One of many reasons I find the tech something to be avoided unless absolutely necessary.
The main trouble is when you find that a different term produces better output, but you've used the old term a lot (potentially across multiple prompts) and don't want to change every occurrence by hand, or when you've used a repeated pattern with some variation and need to change all of its instances to a different pattern.
You can of course apply an LLM to these problems (what else are you going to do? Find-and-replace and regex are better than nothing, but not awesome), but there's always the risk of it mangling things in odd and hard-to-spot ways.
Templating can help, sometimes, but you may have a lot of text before you spot places you could usefully add placeholders.
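To make the templating idea concrete, here's roughly what I mean, as a minimal Python sketch: the shared terms live in one dict, so changing a term in one place updates every prompt that references it. (TERMS, SUMMARIZE_PROMPT, and render_prompt are made-up names for the example, not any real library.)

```python
from string import Template

# Shared vocabulary lives in one place; change a term here and every
# prompt that references it picks up the change. (Names are illustrative.)
TERMS = {
    "agent": "assistant",        # decided "assistant" works better than "agent"
    "ticket": "support ticket",
}

SUMMARIZE_PROMPT = Template(
    "You are an ${agent}. Summarize the following ${ticket} in two sentences."
)

def render_prompt(template: Template, terms: dict) -> str:
    # substitute() raises KeyError on a missing placeholder, which is
    # about as close to a compile error as prompt text gets.
    return template.substitute(terms)

if __name__ == "__main__":
    print(render_prompt(SUMMARIZE_PROMPT, TERMS))
```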
Writing prompts is just a weird form of programming, and has a lot of the same problems, but the natural-language medium gets in the way of using traditional programming tools and techniques.
> & what do you feel is missing in consistency checking? wrt input vs output or something else?
Well, sort of—it does suck that the stuff's basically impossible to unit-test or to develop as units; all you can do is test entire prompts. But what I was thinking of was terminology consistency. Your editor won't red-underline if you use a synonym when you'd prefer to use the same term in all cases, like it would if you tried to use the wrong function name. It won't produce a type error if you've chosen a term or turn of phrase that's more ambiguous than some alternative. That kind of thing.
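The red-underline part is at least crudely scriptable, though. Here's a rough sketch of the kind of lint I'd want, assuming prompts live in plain text files and you maintain your own list of discouraged synonyms (the PREFERRED mapping below is invented for the example):

```python
import re
import sys
from pathlib import Path

# Discouraged synonyms -> the term you've decided to standardize on.
# This mapping is invented for the example; you'd maintain your own.
PREFERRED = {
    "chatbot": "assistant",
    "summarise": "summarize",
    "end user": "customer",
}

def lint_file(path: Path) -> int:
    text = path.read_text(encoding="utf-8")
    hits = 0
    for bad, good in PREFERRED.items():
        for match in re.finditer(rf"\b{re.escape(bad)}\b", text, re.IGNORECASE):
            # Report the 1-based line number of each offending term.
            line_no = text.count("\n", 0, match.start()) + 1
            print(f"{path}:{line_no}: found '{bad}', prefer '{good}'")
            hits += 1
    return hits

if __name__ == "__main__":
    total = sum(lint_file(Path(p)) for p in sys.argv[1:])
    sys.exit(1 if total else 0)
```

It obviously won't catch the ambiguity problem; judging whether one phrasing is more ambiguous than another still needs a human, or, ironically, another LLM.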
- (2 web_search and 1 web_fetch)
- (3 web searches and 1 web fetch)
- (5 web_search calls + web_fetch)
which makes me wonder which of these are deliberate, which are empirical, and whether they just let each team add something and collect some stats after a month.