
The system prompt may be a bit too simple, especially when using gpt-4o-mini as the base LLM, which doesn't adhere to prompts well.

> You write ffmpeg commands based on the description from the user. You should only respond with a command line command for ffmpeg, never any additional text. All responses should be a single line without any line breaks.

I recently tried to get Claude 3.5 Sonnet to solve an FFmpeg problem (write a command to output 5 equally-time-spaced frames from a video) with some aggressive prompt engineering. While its output seems internally consistent, I went down a rabbit hole trying to figure out why it didn't output anything: the LLMs assume an integer frames-per-second, which is definitely not the case in the real world!
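
For what it's worth, an approach along these lines sidesteps the integer-fps assumption entirely (just a rough sketch, not the command Claude produced; input.mp4 and the frame_*.png names are placeholders): ask ffprobe for the actual duration and seek to fractional timestamps instead of computing frame numbers.

    # rough sketch: grab 5 frames at equally spaced timestamps, fps-agnostic
    dur=$(ffprobe -v error -show_entries format=duration \
          -of default=noprint_wrappers=1:nokey=1 input.mp4)
    for i in 0 1 2 3 4; do
      ts=$(echo "$dur * ($i + 0.5) / 5" | bc -l)   # midpoint of each of the 5 equal segments
      ffmpeg -y -ss "$ts" -i input.mp4 -frames:v 1 "frame_$i.png"
    done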



I asked your question across multiple LLMs and had the answers reviewed by multiple LLMs. DeepSeek Chat said Claude 3.5 Sonnet produced an invalid command. Here is my chat.

https://beta.gitsense.com/?chats=197c53ab-86e9-43d3-92dd-df8...

Scroll to the bottom of the left window to see Claude acknowledge that the command DeepSeek produced was accurate. In the right window, you'll find the conversation I had with DeepSeek Chat about all the commands.

I then asked all the models again if the DeepSeek-generated command was correct, and they all said no. And when I asked them to compare all the "correct" commands, Sonnet and DeepSeek both said Sonnet's was the accurate one:

https://beta.gitsense.com//?chat=47183567-c1a6-4ad5-babb-9bb...

That command did not work, but I got the impression that DeepSeek could probably get me a working solution, so after telling it the errors I kept getting, it eventually wrote a bash script that got me 5 equally spaced frames.

I guess the long story short is: changing the prompt probably won't be enough, and you will need to constantly shop around to see which LLM is most likely to give a correct response to the question you are asking.


So that last one is a hallucination: there's no `n_frames` variable for the select filter: https://ffmpeg.org/ffmpeg-filters.html#select_002c-aselect
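
Since the select expression has no notion of the total frame count, the closest working equivalent I can think of (again, just a sketch; input.mp4 and the output pattern are placeholders) is to count the frames with ffprobe first and splice that number into the expression yourself:

    # count the video frames up front, since select has no n_frames variable
    nb=$(ffprobe -v error -count_frames -select_streams v:0 \
         -show_entries stream=nb_read_frames \
         -of default=noprint_wrappers=1:nokey=1 input.mp4)
    step=$(( nb / 5 ))
    ffmpeg -i input.mp4 -vf "select='not(mod(n,$step))'" -vsync vfr frame_%d.png
    # -vsync vfr avoids duplicated output frames; newer builds spell it -fps_mode vfr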

At the very least, I learnt a lot about how FFmpeg works.


Yeah, it is crazy how confidently LLMs can describe something that has never existed. Having said that, I'm still a HUGE fan of LLMs, as I know it is very unlikely that multiple LLMs will brain fart in the same way at the same time. If you know how to navigate things, you will get to a solution much faster than you probably would have in the past.


As a user, it feels like you get cosy with the stuff they know and save a lot of time, until you hit something they don't know and lose more time than everything you saved, because in the end you have to learn everything (and more) just to understand how the LLM put you on the wrong track.

The black swan for LLMs, in a sense.


That is why I always go in with a mindset of mistrust, and why I am building my chat app this way. If accuracy is important and I am unfamiliar with something, I mainly use LLMs as a compass and rely on them to tell me when another LLM (including themselves) is wrong. I'm pretty sure I will learn some wrong things over time, but in my mind those wrong things are not critical.



