In my experience, and for use-cases that are carefully considered, language models are not confidently wrong a majority of the time. The trick is understanding the tool and using it appropriately—thus the “carefully considered” approach to identifying use-cases that can provide value.
In the very narrow fields where I have a deep understanding, LLM output is mostly garbage. It sounds plausible but doesn't stand up to scrutiny. The basics it can regurgitate from Wikipedia sound mostly fine, but they become subtly wrong as soon as they depart from stating very basic facts.
Thus I have to assume that for any topic I do not fully understand - which is the vast majority of human knowledge - it is worse than useless, it is actively misleading. I try to not even read much of what LLMs produce. I might give it some text and riff about it if I need ideas, but LLMs are categorically the wrong tool for factual content.
> In the very narrow fields where I have a deep understanding, LLM output is mostly garbage
> Thus I have to assume that for any topic I do not fully understand - which is the vast majority of human knowledge - it is worse than useless, it is actively misleading.
Why do you have to make that assumption? An expert arborist likely won’t know much about tuning GC parameters for the JVM, but that won’t make them “worse than useless” or “actively misleading” when discussing other topics, especially not the stuff that’s relatively tangential to their domain.
I think the difference we have is that I don’t expect the models to be experts in any domain, nor do I expect them to always provide factual content; the library can provide factual content, if you know how to use it right.
> You open the newspaper to an article on some subject you know well... You read the article and see the journalist has absolutely no understanding of either the facts or the issues. Often, the article is so wrong it actually presents the story backward—reversing cause and effect. I call these the "wet streets cause rain" stories. Paper's full of them. In any case, you read with exasperation or amusement the multiple errors in a story, and then turn the page to national or international affairs, and read as if the rest of the newspaper was somehow more accurate about Palestine than the baloney you just read. You turn the page, and forget what you know.
A use-case that can be carefully considered requires more knowledge about the use-case than the LLM has; it requires you to understand the specific model's training and happy paths; and it requires more time to make it output the thing you want than just doing it yourself. If you don't know enough about the subject or the model, you will get confident garbage.
> A use-case that can be carefully considered requires more knowledge about the use-case than the LLM
I would tend to agree with that assertion…
> it requires you to understand the specific model's training and happy paths
But I strongly disagree with that assertion; I know nothing of commercial models’ training corpora, methodology, or even their system prompts. I only know how to use them as tools for various use-cases.
> it requires more time to make it output the thing you want than just doing it yourself.
And I strongly disagree with that one too. As long as the thing you want it to output is rooted in relatively mainstream or well-known concepts, it’s objectively much faster than we are. Maybe it’s more expensive, but it’s also crazy fast (which is the point of all tools), and verifying the precision/accuracy of a speedy tool’s output can often be deferred until a later step in the process.
> If you don't know enough about the subject or the model, you will get confident garbage
Once you step outside their comfort zone (their training), well, yeah… they do all tend to be unduly confident in their responses. I’d argue, however, that it’s a trait they learned from us; we really like to be confident even when we’re wrong, and that trait is borne out dramatically across the internet sources on which a lot of these models were trained.