> Whether AI reasoning is “real” reasoning or just a mirage can be an interesting question, but it is primarily a philosophical question. It depends on having a clear definition of what “real” reasoning is, exactly.
It's pretty easy: causal reasoning. Causal, not just statistical correlation as LLMs do, with or without "CoT".
Correct me if I'm wrong, but I'm not sure it's so simple. LLMs are called causal models in the sense that earlier tokens "cause" later tokens, that is, later tokens are causally dependent on what the earlier tokens are.
If you mean deterministic rather than probabilistic, even Pearl-style causal models are probabilistic.
I think the author is circling around the idea that their idea of reasoning is to produce statements in a formal system: to have a set of axioms, a set of production rules, and to generate new strings/sentences/theorems using those rules. This approach is how math is formalized. It allows us to extrapolate - make new "theorems" or constructions that weren't in the "training set".
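A toy sketch of that picture, in case it helps: the axiom and rules below are made up purely for illustration, and a "theorem" is just any string reachable from the axiom by applying the rules.

    // A tiny formal system: an axiom plus rewrite rules (both hypothetical).
    type Rule = { from: string; to: string };

    const axiom = "MI";
    const rules: Rule[] = [
      { from: "I", to: "IU" },
      { from: "M", to: "MM" },
    ];

    // Apply every rule at every matching position, producing the next "theorems".
    function applyRules(theorem: string): string[] {
      const out = new Set<string>();
      for (const { from, to } of rules) {
        for (let i = theorem.indexOf(from); i !== -1; i = theorem.indexOf(from, i + 1)) {
          out.add(theorem.slice(0, i) + to + theorem.slice(i + from.length));
        }
      }
      return [...out];
    }

    // Enumerate theorems up to a given depth: strings appear that were never
    // written down before, yet every one of them follows from the rules.
    function theorems(depth: number): Set<string> {
      let frontier = new Set([axiom]);
      const seen = new Set(frontier);
      for (let d = 0; d < depth; d++) {
        const next = new Set<string>();
        for (const t of frontier) {
          for (const u of applyRules(t)) {
            if (!seen.has(u)) { seen.add(u); next.add(u); }
          }
        }
        frontier = next;
      }
      return seen;
    }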
By this definition a bag of answers is causal reasoning because we previously filled the bag, which caused what we pulled. State causing a result is not causal reasoning.
You need to actually have something that deduces a result from a set of principles that form a logical conclusion, or that recognizes more data is needed to reach one. That is clearly different from finding a likely next token on statistics alone, despite the fact that the statistical answer can be correct.
1) As far as I recall this program of formalizing mathematics fails unless you banish autoregression.
2) It is important to point out that a theorem in this context is not the same as a "Theorem" from mathematics. Production rules generate theorems that comply with the rules and axioms of the formal system, ensuring that they could have meaning in that formal system. The meaning cannot justify the rules, though; fortunately, most of us know to use the rules of logic so that we are not grunting beasts incapable of conveying information.
I think the author wonders why theorems that don't seem to have meanings appear in the output of AI.
But let's say you change your mathematical expression by reducing or expanding it somehow. Unless it's trivial, there are infinitely many ways to do it, and the "cause" here is the answer to the question "why did you do that and not something else?" Brute force excluded, the cause is probably some idea, some model of the problem, or a gut feeling (or desperation...).
Smoking significantly increases the risk of getting cancer. We say smoking causes cancer. Causal reasoning can be probabilistic.
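To make the distinction concrete, here is the standard Pearl-style "seeing vs. doing" notation (a sketch; Z is a stand-in confounder, and the last step assumes Z satisfies the backdoor criterion):

    % Observational: the correlation you get from watching who smokes
    P(\text{cancer} \mid \text{smoking})

    % Interventional: what happens if you force someone to smoke
    P(\text{cancer} \mid do(\text{smoking}))

    % With a confounder Z blocking all backdoor paths, the two are related by
    P(\text{cancer} \mid do(\text{smoking}))
        = \sum_{z} P(\text{cancer} \mid \text{smoking}, Z = z)\, P(Z = z)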
LLMs are not doing causal reasoning because there are no facts, only tokens. For the most part you can't ask an LLM how it came to an answer, because it doesn't know.
Regardless of personal opinions about his style, Marcus has been proven correct on several fronts, including the diminishing returns of scaling laws and the lack of true reasoning (out of distribution generalizability) in LLM-type AI.
These are issues that the industry initially denied, only to (years) later acknowledge them as their "own recent discoveries" as soon as they had something new to sell (chain-of-thought approach, RL-based LLM, tbc.).
Care to explain further? He has made far more claims of the limitations of LLMs that have been proven false.
> diminishing returns of scaling laws
This was so obvious it didn't need mentioning. And what Gary really missed is that all you need are more axes to scale over and you can still make significant improvements. Think of where we are now vs 2023.
> lack of true reasoning (out of distribution generalizability) in LLM-type AI
To my understanding, this is one that he has gotten wrong. LLMs do have internal representations, exactly the kind that he predicted they didn't have.
> These are issues that the industry initially denied, only to (years) later acknowledge them
The industry denies all their limitations for hype. The academic literature has all of them listed plain as day. Gary isn't wrong because he's contradicted the hype of the tech labs, he's wrong because his short-term predictions were proven false in the literature he used to publish in. This was all in his efforts to peddle neurosymbolic architectures which were quickly replaced by tool use.
The hype is coming from startups, big tech press releases, and grifters who have a vested interest in raising a ton of money from VCs and stakeholders, same as blockchain and metaverse. The difference is that there is a large legitimate body of research underneath deep learning that has been there for many years and remains (somewhat) healthy.
I would argue that the claim of "LLMs will never be able to do this" is crazy without solid mathematical proof, and is risky even with significant empirical evidence. Unfortunately, several professionals have resorted to this language.
The AI community requires more independent experts like Marcus to maintain integrity and transparency, ensuring that the field does not succumb to hyperbole as well as shifting standards such as "internally achieved AGI", etc.
Agreed, the hype cycles need vocal critics. The loudest voices talking about LLMs are the ones who financially benefit the most from it. I'm not anti-AI; I think the hype, and the gaslighting of the entire economy into believing this is the sole thing that will render them unemployed, is ridiculous (the economy is rough for a myriad of other reasons, most of which originate from our country's choice in leadership).
Hopefully the innovation slowing down means that all the products I use will move past trying to duct-tape AI on and start working on actual features/bugs again.
I have a tiny tiny podcast with a friend where we try to break down which parts of the hype are bullshit (muck) and which kernels of truth are there, if any. It started partially as a place to scream into the void, partially to help the people who are anxious about AGI or otherwise being harmed by the hype. I think we have a long way to go in terms of presentation (breaking down very technical terms for an audience that is used to vague hype around "AI" is hard), but we cite our sources, so maybe it'll be interesting for you to check out our show notes.
I personally struggle with Gary Marcus critiques because whenever they are about "making AI work" they go into neurosymbolic "AI", which I have technical disagreements with, and I have _other_ arguments for the points he sometimes raises which I think are more rigorous, so it's difficult to be roughly in the same camp - but overall I'm happy someone with reach is calling BS as well.
Hard disagree. The essay is a rehash of Reddit complaints, with no direct results from testing, and is largely about product-launch snafus (a simultaneous launch to 500M+ users, mind you). Please.
I think most hit pieces like this miss what is actually important about the 5 launch - it's the first product launch in the space. We are moving on from model improvements to a concept of what a full product might look like. What matters about 5 is not thinking strength, although it is moderately better than o3 in my tests, which is roughly what the benchmarks say.
What's important is that it's faster, that it's integrated, that it's set up to provide incremental improvements (to, say, multimodal interaction, image generation, and so on) without needing the branding of a new model, and I think the very largest improvement is its ability to retain context and goals over a very long series of tool uses.
Willison mentioned it's his only daily driver now (for a largely coding-based usage setup), and I would say it's significantly better than the prior best (Claude), or the prior best architects (o3-pro or Gemini, depending), at larger/longer coding tasks that need more context. It's also much faster than o3-pro for coding.
Anyway, saying “Reddit users who have formed parasocial relationships with 4o didn’t like this launch -> oAI is doomed” is weak analysis, and pointless.
If ChatGPT 5 lived up to the hype, literally no one would be asking for the old models back. The snafus are minor as far as presentations go, but their existence completely undermines the product OpenAI is selling, which is an expert in your pocket. They showed everyone this "expert" couldn't even help its own creators nail such a high-stakes presentation; OpenAI's embarrassing oversights foretell similar embarrassments for anyone who relies on this product for their own high-stakes presentation or report.
Roughly: Meteor required too much vertical integration on each part of the stack to survive the rapidly changing landscape at the time. On top of that, a lot of the team's focus shifted to Apollo (which, at least from a commercial point of view, seems to have been a good decision).
Tight coupling to MongoDB, a fragmented ecosystem/packages, and React came out soon after and kind of stole its lunch money.
It also had some pretty serious performance bottlenecks, especially when observing large tables for changes that need to be synced to subscribing clients.
I agree though, it was a great framework for its day. Auth bootstrapping in particular was absolutely painless.
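For anyone who never used it, the bottleneck above comes from how the publish/subscribe layer works. Roughly (standard Meteor API calls from memory; the collection name is made up):

    import { Meteor } from 'meteor/meteor';
    import { Mongo } from 'meteor/mongo';

    const Tasks = new Mongo.Collection('tasks');

    if (Meteor.isServer) {
      // The server keeps a live query ("observer") open for every subscribed
      // client and diffs the result set to push changes down. With large
      // collections and many subscribers, that observation work is where the
      // bottleneck showed up.
      Meteor.publish('tasks.all', function () {
        return Tasks.find({});
      });
    }

    if (Meteor.isClient) {
      Meteor.subscribe('tasks.all'); // mirrored into Minimongo on the client
    }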
A non-relational, document-oriented pubsub architecture based on MongoDB, good for not much more than chat apps. For toy apps (in 2012-2016), use Firebase (also for chat apps); for CRUD-spectrum and enterprise apps, use SQL. And then React happened and consumed the entire spectrum of frontend architectures, bringing us to GraphQL, which didn't, but the hype wave left little oxygen remaining for anything else. (Even if it had, Meteor still was not better.)
I'm the de facto maintainer of the Meteor MySQL integration. Since 2015, I've been involved in the design and maintenance of six different Meteor web apps for real-time geospatial applications, built for both B2B and B2C.
Given this, I reject your assertion that Meteor is limited to MongoDB and "toy apps".
Local-First & Sync-Engines are the future. Here's a great filterable datatable overview of the local-first framework landscape:
https://www.localfirst.fm/landscape
My favorite so far is Triplit.dev (which can also be combined with TanStack DB); two more I'd like to explore are PowerSync and NextGraph. Also, the recent LocalFirst Conf has some great videos; I'm currently watching the NextGraph one (https://www.youtube.com/watch?v=gaadDmZWIzE).
How is the database migration support for these tools?
Needing to support clients that don't phone home for an extended period and therefore need to be rolled forward from a really old schema state seems like a major hassle, but maybe I'm missing something. Trying to troubleshoot one-off frontend bugs for a single product user can be a real pain; I'd hate to see what it's like when you have to factor in the state of their schema as well.
I can't speak to the other tools, but we built PowerSync using a schemaless protocol under the hood, specifically for this reason. Most of the time you don't need to implement migrations at all. For example adding a new column just works, as the data is already there when the schema is rolled forward.
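To illustrate the idea (a hypothetical sketch of the concept, not PowerSync's actual API): rows are synced as schemaless documents, and the client schema is just a view over them, so a newly added column reads straight out of data that was already synced.

    // Hypothetical: synced rows are plain schemaless JSON documents.
    type SyncedRow = Record<string, unknown>;

    // The "schema" is only a client-side view definition.
    interface ColumnSpec { name: string; default: unknown }

    function project(row: SyncedRow, columns: ColumnSpec[]) {
      const out: Record<string, unknown> = {};
      for (const col of columns) {
        // Old clients simply ignored this key; new clients pick it up.
        out[col.name] = col.name in row ? row[col.name] : col.default;
      }
      return out;
    }

    // Rolling the schema forward is just adding a column spec; no data
    // migration is needed because the synced documents already carry it.
    const v1: ColumnSpec[] = [{ name: 'title', default: '' }];
    const v2: ColumnSpec[] = [...v1, { name: 'priority', default: 0 }];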
For me it was the lack of confirmation from the backend. When it was the next big thing, it sent changes to the backend without waiting for a response. This made the interface crazy fast, but I just couldn't take the risk of the FE being out of sync with the backend. I hope they grew out of that model, but I never took it seriously for that one reason.
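For comparison, the pattern being described here, optimistic local writes reconciled with the backend later, looks roughly like this (a generic sketch, not any framework's real API), including the rollback step that limits the out-of-sync risk:

    // Apply the write locally first, then reconcile (or roll back) when the
    // backend eventually responds.
    type Todo = { id: string; text: string };

    const local = new Map<string, Todo>();

    async function addTodo(todo: Todo, send: (t: Todo) => Promise<boolean>) {
      local.set(todo.id, todo);               // UI updates instantly
      try {
        const accepted = await send(todo);    // confirmation arrives later...
        if (!accepted) local.delete(todo.id); // ...and may force a rollback
      } catch {
        local.delete(todo.id);                // network failure: undo the write
      }
    }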
Yeah, I built my first startup on Meteor, and the prototype for my second one, but there were so many weird state bugs after it got more complicated that we eventually had to switch back to normal patterns to scale it.
Thank you for this, I'm going to have to check out Triplit. Have you tried InstantDB? It's the one I've been most interested in trying but haven't yet.
Gladly! Automerge on its own is just a library that makes local-first data structures possible.
Ethersync uses this library for a concrete purpose: Collaborating on local text files. We wrote editor plugins and a daemon that runs on your computer, to enable you to type in plaintext files/source code together, from the editors you already know.
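For a feel of what the underlying library does, here is a minimal sketch using the @automerge/automerge 2.x JS API (the document fields are made up):

    import * as Automerge from "@automerge/automerge";

    // Two peers start from the same document.
    let docA = Automerge.from({ title: "notes", lines: [] as string[] });
    let docB = Automerge.clone(docA);

    // Each makes an independent local edit, e.g. while offline.
    docA = Automerge.change(docA, d => { d.lines.push("a line typed on A"); });
    docB = Automerge.change(docB, d => { d.title = "shared notes"; });

    // Merging is deterministic and needs no central server to order the edits.
    const merged = Automerge.merge(docA, docB);
    // merged.title === "shared notes" and merged.lines includes the line from A.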
Perhaps, but I see it more as an endorsement of careful feature selection. Subject matter experts can do this, and once it's done, you can get away with a much smaller model and better price/performance.
Looks great & kudos for making it local-first & open-source, much appreciated!
From a business perspective, and as someone looking also into the open-source model to launch tools, I'd be interested though how you expect revenue to be generated?
Is it solely relying on the audience segment that doesn't know how to hook up the API manually to use the open-source version? How do you calculate this, since, by pushing it via open source/GitHub, you would think that most people exposed to it are technical enough to just run it from source.