
The system prompt may be a bit too simple, especially when using gpt-4o-mini as the base LLM, which doesn't adhere to prompts well.

> You write ffmpeg commands based on the description from the user. You should only respond with a command line command for ffmpeg, never any additional text. All responses should be a single line without any line breaks.

I recently tried to get Claude 3.5 Sonnet to solve an FFmpeg problem (write a command to output 5 equally-time-spaced frames from a video) with some aggressive prompt engineering. While its output seems internally consistent, I went down a rabbit hole trying to figure out why it didn't output anything: the LLMs assume an integer frames-per-second, which is definitely not the case in the real world!
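
For what it's worth, an approach along these lines sidesteps the integer-fps assumption entirely (just a rough sketch, not the command Claude produced; input.mp4 and the frame_*.png names are placeholders): ask ffprobe for the actual duration and seek to fractional timestamps instead of computing frame numbers.

    # rough sketch: grab 5 frames at equally spaced timestamps, fps-agnostic
    dur=$(ffprobe -v error -show_entries format=duration \
          -of default=noprint_wrappers=1:nokey=1 input.mp4)
    for i in 0 1 2 3 4; do
      ts=$(echo "$dur * ($i + 0.5) / 5" | bc -l)   # midpoint of each of the 5 equal segments
      ffmpeg -y -ss "$ts" -i input.mp4 -frames:v 1 "frame_$i.png"
    done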



I asked your question across multiple LLMs and had the answers reviewed by multiple LLMs. DeepSeek Chat said Claude 3.5 Sonnet produced an invalid command. Here is my chat.

https://beta.gitsense.com/?chats=197c53ab-86e9-43d3-92dd-df8...

Scroll to the bottom of the left window to see Claude acknowledge that the command DeepSeek produced was accurate. In the right window, you'll find the conversation I had with DeepSeek Chat about all the commands.

I then asked all the models again if the DeepSeek-generated command was correct, and they all said no. And when I asked them to compare all the "correct" commands, Sonnet and DeepSeek both said Sonnet's was the accurate one:

https://beta.gitsense.com//?chat=47183567-c1a6-4ad5-babb-9bb...

That command did not work, but I got the impression that DeepSeek could probably get me a working solution, so after telling it the errors I kept getting, it eventually wrote a bash script that got me 5 equally spaced frames.

I guess the long story short is: changing the prompt probably won't be enough, and you will need to constantly shop around to see which LLM is most likely to give a correct response to the question you are asking.


So that last one is a hallucination: there's no `n_frames` variable for the select filter: https://ffmpeg.org/ffmpeg-filters.html#select_002c-aselect
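
Since the select expression has no notion of the total frame count, the closest working equivalent I can think of (again, just a sketch; input.mp4 and the output pattern are placeholders) is to count the frames with ffprobe first and splice that number into the expression yourself:

    # count the video frames up front, since select has no n_frames variable
    nb=$(ffprobe -v error -count_frames -select_streams v:0 \
         -show_entries stream=nb_read_frames \
         -of default=noprint_wrappers=1:nokey=1 input.mp4)
    step=$(( nb / 5 ))
    ffmpeg -i input.mp4 -vf "select='not(mod(n,$step))'" -vsync vfr frame_%d.png
    # -vsync vfr avoids duplicated output frames; newer builds spell it -fps_mode vfr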

At the very least, I learnt a lot about how FFmpeg works.


Yeah, it is crazy how confidently LLMs can describe something that has never existed. Having said that, I'm still a HUGE fan of LLMs, as I know it is very unlikely that multiple LLMs will brain fart in the same way at the same time. If you know how to navigate things, you will get to a solution much faster than you probably would have in the past.


As a user, it feels like you get cosy with the stuff they know and save a lot of time, until you hit something they don't know and lose more time than everything you saved, because in the end you have to learn everything (and more) just to understand how the LLM put you on the wrong track.

The black swan for LLMs, in a sense.


That is why I always go in with a mindset of mistrust, and why I am building my chat app this way. If accuracy is important and I am unfamiliar with something, I mainly use LLMs as a compass and rely on them to tell me when another LLM (including themselves) is wrong. I'm pretty sure I will learn some wrong things over time, but in my mind those wrong things are not critical.



