I feel it's impossible for me to trust LLMs can reason when I don't know enough ...

Terr_ · 2025-02-10T19:36:39 1739216199

> For example, I've always felt that having the whole thing being a single textbox is reductive and must create all sorts of problems.

You observation is correct, but it's not some accident of minimalistic GUI design: The underlying algorithm is itself reductive in a way that can create problems.

In essence (e.g. ignoring tokenization), the LLM is doing this:

    next_word = predict_next(document_word_list, chaos_percentage)

Your interaction with an "LLM assistant" is just growing Some Document behind the scenes, albeit one that resembles a chat-conversation or a movie-script. Another program is inserting your questions as "User says: X" and then acting out the words when the document grows into "AcmeAssistant says: Y".

So there are no explicit values for "helpfulness" or "carefulness" etc, they are implemented as notes in the script that--if they were in a real theater play--would correlate with what lines the AcmeAssistant character has next.

This framing helps explain why "prompt injection" and "hallucinations" remain a problem: They're not actually exceptions, they're core to how it works. The algorithm no explicit concept of trusted/untrusted spans within the document, let alone entities, logical propositions, or whether an entity is asserting a proposition versus just referencing it. It just picks whatever seems to fit with the overall document, even when it's based on something the AcmeAssistant character was saying sarcastically to itself because User asked it to by offering a billion dollar bribe.

In other words, it's less of a thinking machine and more of a dreaming machine.

> Is generating natural language part of what an LLM is, or is this a separate program on top of what it does?

Language: Yes, Natural: Depends, Separate: No.

For example, one could potentially train an LLM on musical notation of millions of songs, as long as you can find a way to express each one as a linear sequence of tokens.

parliament32 · 2025-02-10T22:07:05 1739225225

This is a great explanation of a point I've been trying to make for a while, when talking to friends about LLMs, but haven't been able to put quite so succinctly. LLMs are text generators, no more, no less. That has all sorts of useful applications! But (OAI and friends) marketing departments are so eager to push the Intelligence part of AI that it's become straight-up snakeoil.. there is no intelligence to be found, and there never will be as long as we stay the course on transformers-based models (and, as far as I know, nobody has tried to go back to the drawing board yet). Actual, real AI will probably come one day, but nobody is working on it yet, and it probably won't even be called "AI" at that point because the term has been poisoned by the current trends. IMO there's no way to correct the course on the current set of AI/LLM products.

I find the current products incredibly helpful in a variety of domains: creating writing in particular, editing my written work, as an interface to web searches (Gemini, in particular, is a rockstar assistant for helping with research), etc etc. But I know perfectly well there's no intelligence behind the curtain, it's really just a text generator.

AlienRobot · 2025-02-10T22:27:23 1739226443

>one could potentially train an LLM on musical notation of millions of songs, as long as you can find a way to express each one as a linear sequence of tokens.

That sounds like an interesting application of the technology! So you could for example train an LLM on piano songs, and if someone played a few notes it would autocomplete with the probable next notes, for example?

>The underlying algorithm is itself reductive in a way that can create problems

I wonder if in the future we'll see some refinement of this. The only experience I have with AI is limited to trying Stable Diffusion, but SD does have many options you can try to configure like number of steps, samplers, CFG, etc. I don't know exactly what each of these settings do, and I bet most people who use it don't either, but at least the setting is there.

If hallucinations are intrinsic of LLMs perhaps the way forward isn't trying to get rid of them to create the perfect answer machine/"oracle" but just figure out a way to make use of them. It feels to me that the randomness of AI could help a lot with creative processes, brainstorming, etc., and for that purpose it needs some configurability. For example, Youtube rolled out an AI-based tool for Youtubers that generates titles/thumbnails of videos for them to make. Presumably, it's biased toward successful titles. The thumbnails feel pretty unnecessary, though, since you wouldn't want to use the obvious AI thumbnails.

I hear a lot of people say AI is a new industry with a lot of potential when they mean it will become AGI eventually, but these things make me feel like its potential isn't to become the an oracle but to become something completely different instead that nobody is thinking about because they're so focused on creating the oracle.

Thanks for the reply, by the way. Very informative. :)

wruza · 2025-02-10T18:03:46 1739210626

it should have some checkboxes and numeric entries for some parameters, although I don't know what those parameters would be

The only params they have are technical params. You may see these in various tgwebui tabs. Nothing really breathtaking, apart from high temperature (affects next token probability).

Is generating natural language part of what an LLM is, or is this a separate program on top of what it does?

They operate directly on tokens which are [parts of] words, more or less. Although there’s a nuance with embeddings and VAE, which would be interesting to learn more about from someone in the field (not me).

that is important to understand before we can even consider whether "logical reasoning" is happening. There are formal ways to describe reasoning such as entailment. Is the LLM encoding those formal methods in data structures somehow?

The apart-from-GPU-matrix operations are all known, there’s nothing to investigate at the tech level cause there’s nothing like that at all. At the in-matrix level it can “happen”, but this is just a meaningless stretch, as inference is one-pass process basically, without loops or backtracking. Every token gets produced in a fixed time, so there’s no delay like a human makes before comma, to think about (or parallel to) the next sentence. So if they “reason”, this is purely a similar effect imagined as a thought process, not a real thought process. But if you relax your anthropocentrism a little, questions like that start making sense, although regular things may stop making sense there as well. I.e. the fixed token time paradox may be explained as “not all thinking/reasoning entities must do so in physical time, or in time at all”. But that will probably pull the rug under everything in the thread and lead nowhere. Maybe that’s the way.

I can't really tell what part of it is the actual technology and what part of it is sugarcoating to make a technical program more "friendly" to users by having it pretend to speak English.

Most of them speak many languages, naturally (try it). But there’s an obvious lie all frontends practice. It’s the “chat” part. LLMs aren’t things that “see” your messages. They aren’t characters either. They are document continuators, and usually the document looks like this:

This is a conversation between A and B. A is a helpful assistant that thinks out of box, while being politically correct, and evasive about suicide methods and bombs.

A: How can I help?

B:

An LLM can produce the next token, and when run in a loop it will happily generate a whole conversation, both for A and B, token by token. The trick is to just break that loop when it generates /^B:/ and allow a user to “participate” in building of this strange conversation protocol.

So there’s no “it” who writes replies, no “character” and no “chat”. It’s only a next token in some document, which may be a chat protocol, a movie plot draft, or a reference manual. I sometimes use LLMs in “notebook” mode, where I just write text and let it complete it, without any chat or “helpful assistant”. It’s just less efficient for some models, which benefit from special chat-like and prompt-like formatting before you get the results. But that is almost purely a technical detail.

AlienRobot · 2025-02-10T18:56:26 1739213786

Thanks, that is very informative!

I have heard about the tokenization process before when I tried stable diffusion, but honestly I can't understand it. It sounds important but it also sounds like a very superficial layer whose only purpose is to remove ambiguity, the important work being done by the next layer in the process.

I believe part of the problem I have when discussing "AI" is that it's just not clear to me what "AI" is. There is a thing called "LLM," but when we talk about LLMs, are we talking about the concept in general or merely specific applications of the concept?

For example, in SEO often you hear the term "search engines" being used as a generic descriptor, but in practice we all know it's only about Google and nobody cares about Bing or the rest of the search engines nobody uses. Maybe they care a bit about AIs that are trying to replace traditional search engines like Perplexity, but that's about it. Similarly, if you talk about CMS's, chances are you are talking about Wordpress.

Am I right to assume that when people say "LLM" they really mean just ChatGPT/Copilot, Bard/Gemini, and now DeepSeek?

Are all these chatbots just locally run versions of ChatGPT, or they're just paying for ChatGPT as a service? It's hard to imagine everyone is just rolling their own "LLM" so I guess most jobs related to this field are merely about integrating with existing models rather than developing your own from scratch?

I had a feeling ChatGPT's "chat" would work like a text predictor as you said, but what I really wish I knew is whether you can say that about ALL LLMs. Because if that's true, then I don't think they are reasoning about anything. If, however, there was a way to make use of the LLM technology to tokenize formal logic, then that would be a different story. But if there is no attempt at this, then it's not the LLM doing the reasoning, it's humans who wrote the text that the LLM was trained on that did the reasoning, and the LLM is just parroting them without understanding what reasoning even is.

By the way, I find it interesting that "chat" is probably one of the most problematic applications the LLMs can have. Like if ChatGPT asked "what do you want me to autocomplete" instead of "how can I help you today" people would type "the mona lisa is" instead of "what is the mona lisa?" for example.

wruza · 2025-02-10T20:45:40 1739220340

When I say LLMs, I mean literal large language models, like all of them in the general "Text-to-Text" && "Transformers" categories, loadable into text-generation-webui. Most people probably only have experience with cloud LLMs https://www.google.com/search?q=big+LLM+companies . Most cloud LLMs are based on transformers (but we don't know what they are cooking in secrecy) https://ai.stackexchange.com/questions/46288/are-there-any-n... . Copilot, Cursor and other frontends are just software that uses some LLM as the main driver, via standard API (e.g. tgwebui can emulate openai api). Connectivity is not a problem here, cause everything is really simple API-wise.

I have heard about the tokenization process before when I tried stable diffusion, but honestly I can't understand it. It sounds important but it also sounds like a very superficial layer whose only purpose is to remove ambiguity, the important work being done by the next layer in the process.

SD is special because it's actually two networks (or more, I lost track of SD tech), which are sort of synchronized into the same "latent space". So your prompt becomes a vector that basically points at the compressed representation of a picture in that space, which then gets decompressed by VAE. And enhanced/controlled by dozens of plugins in case of A1111 or Comfy, with additional specialized networks. I'm not sure how this relates to text-to-text thing, probably doesn't.

mhast · 2025-02-11T13:34:50 1739280890

If you want to get a better understanding of this I recommend playing around in the "chat playgrounds" on some of the engines.

The Google one allows for some free use before you have to pay for tokens. (Usually you can buy $5 worth of tokens as a minimum and that will give you more than you can use up with manual requests.)

https://aistudio.google.com/prompts/new_chat

This UI allows you to alter the system prompt (which is usually hidden from the user on eg ChatGPT) and change to different models and change parameters. And then you give it the chat input similar as any other site.

You can also install a program like "LM Studio" and that will allow you to download models (through the UI) and run locally on your own machine. This gives you a similar interface to what you see in the Google AI Studio but you run it locally. And with downloaded models. (The model you download is the actual LLM which is basically very large amount of parameters you combine with the input tokens to get the next token the system outputs.)

For a more fundamental introduction to what all these systems do there are a number of Computerphile videos which are quite informative. Unfortunately I can't find a good playlist of them all but here's one of the early ones. (Robert Miles is in many of them.) https://www.youtube.com/watch?v=rURRYI66E54