
Bubbles are defined by delusional speculation. The belief itself is the motivation; there's no reasoning behind it.

This is an incredibly vague essay. Let me be more explicit: I think this is a clear sign of a bubble. LLMs are very cool technology, but they are not the second coming. They can't do experiments; they don't have an imagination; they don't have an ethical framework; they're not agents in any human sense.

LLMs are awesome but I haven't felt significant improvement since the original GPT-4 (only in speed).

The reasoning models (o1 pro) don't show good reasoning capability on the things I ask of them, so I don't expect o3 to be significantly better in practice, even if it looks good on the benchmarks.

Still, I think the ARC-AGI benchmark is awesome, and the fact that they are targeting reasoning is a good direction (I just think they need to research more techniques/theories).


I disagree.

Sonnet 3.6 (the 2024-10-22 release of Sonnet 3.5) is head and shoulders above GPT-4, and anyone who has been using both regularly can attest to this fact.

Reasoning models do reason quite well, but you need to give them the right problems. Don't throw open-ended problems at them. They perform well on problems with one (or many) correct solution(s). Code is a great example - o1 has fixed tricky code bugs for me where Sonnet and other GPT-4 class models have failed.

LLMs are leaky abstractions still - as the user, you need to know when and how to use them. This, I think, will get fixed in the next 1-2 years. For now, there's no substitute for hands-on time using these weird tools. But the effort is well worth it.


> one (or many) correct solution(s).

> Code is a great example

I’d argue that most coding problems have one truly correct solution and many, many half-correct solutions.

I personally have not found AI coding assistance very helpful, but judging from blog posts by people who do, much of the code I see from Claude is very barebones HTML templates and small scripts that call out to existing npm packages. Not really reasoning or problem solving per se.

I’m honestly curious to hear what tricky code bugs Sonnet has helped you solve.

It’s led me down several incorrect paths, one of which actually burned me at work.


> LLMs are awesome but I haven't felt significant improvement since the original GPT-4 (only in speed).

Taking the outside view here - maybe you don't "feel" like it's getting better. But benchmarks aside, there are now plenty of anecdotal stories of scientists and mathematicians using them for actual work. Sometimes for simple labor-saving, but some stories of actually creative work that is partially/wholly based on interactions with LLMs. This is on top of many, many people using this for things like software development, and claiming that they get significant benefits out of these models.


I use it all the time to help me find documentation for an API/framework/app that's unknown to me. It's saved me tons of time.

> LLMs are awesome but I haven't felt significant improvement since the original GPT-4 (only in speed).

Absolutely disagree. Are you using LLMs for coding? There has been a 10x (or whatever) improvement since GPT-4.

I casually tracked the ability of LLMs to create a processor design in an HDL starting in 2023. I stopped in June of 2024, because Sonnet would basically one-shot the CPU, testbench, and emulator. There was another substantial update of Sonnet in October 2024.

https://github.com/cpldcpu/LLM_HDL_Design


It’s because statistical tests are based on the distribution of the statistic, not the data itself. If the central limit theorem holds, this distribution will be a bell curve, as you say.
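A quick simulation makes that concrete (a minimal sketch, assuming numpy; the lognormal example and the sample sizes are just illustrative choices, not anything from the thread):

    import numpy as np

    rng = np.random.default_rng(0)

    # Raw data: lognormal, heavily right-skewed -- nothing like a bell curve.
    data = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

    def skew(x):
        # Simple moment-based skewness estimate.
        return float(((x - x.mean()) ** 3).mean() / x.std() ** 3)

    # The statistic (here, the sample mean of n=200 observations), recomputed
    # over many resamples, is what a test is actually built on.
    means = np.array([rng.choice(data, size=200).mean() for _ in range(5_000)])

    print("skewness of raw data:    ", skew(data))   # large and positive
    print("skewness of sample means:", skew(means))  # much closer to 0

The raw data stays skewed, but the distribution of the sample mean is far closer to normal, and that is the distribution the test cares about.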


Aye, but there are cases where it doesn't hold. Lognormal and power-law distributions look awfully similar in samples, but the difference matters on the margins.

For example, checking account balances are far from a normal distribution!
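A rough sketch of where that breaks down (again just illustrative numbers, assuming numpy; the Pareto shape parameter and sample sizes are my own choices):

    import numpy as np

    rng = np.random.default_rng(1)

    def spread_of_means(draw, n=200, reps=5_000):
        # How widely the sample mean varies across many samples of size n.
        means = np.array([draw(n).mean() for _ in range(reps)])
        return np.percentile(means, [1, 50, 99])

    # Lognormal: skewed but finite variance -- the CLT applies, and the
    # sample mean settles into a narrow, roughly normal distribution.
    print("lognormal:   ", spread_of_means(lambda n: rng.lognormal(0.0, 1.0, n)))

    # Pareto with alpha = 1.5: looks similar draw-to-draw, but the variance is
    # infinite, so the sample mean stays erratic and heavily skewed -- a test
    # built on CLT assumptions is not trustworthy here.
    print("pareto a=1.5:", spread_of_means(lambda n: rng.pareto(1.5, n) + 1.0))

Both datasets look like "a skewed pile of positive numbers" in a histogram; only the tail behavior separates them, and that tail decides whether the bell-curve assumption about the statistic is safe.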


Knowledge is not always cumulative, but you have to be aware of its prior contradictions in order to refute them. This is a point in Kuhn’s “The Structure of Scientific Revolutions” that I think is neglected.


Every one of Peter Thiel's opinions is a dark, provocative secret. He's very predictable.


I was kind of surprised he was gay


Contrary to the political priors of VCs, I think the real answers are pretty mundane:

1. Funding. Drugs have a low probability of success and a long lag time. Investors think in discount rates. A high-risk venture like biology is less appealing than an advertising-based tech platform with zero marginal costs.

2. Costs. Biology uses a LOT of proprietary instruments, kits, and chemical reagents. A lot. It also needs a lot of manual labor that would be difficult to roboticize.

3. Time. Biological experiments operate on biological timescales. Code takes seconds to run. Cell cultures take a day to grow. Even fancy new multiplexed sequencing assays take a while. You have the library prep time, the sequencing, and the downstream analysis. It's a long process. Now imagine waiting years and years to see if a drug in clinical trial prevents Alzheimer's.

4. Complexity. How do you make an equation for a giant network of weakly-interacting parts? Biology is a very "data-driven" field for this reason. The introduction of new microscopy, chemical conjugation techniques, and high-throughput assays has only made things worse. I genuinely hope some black box AI will be able to help us make sense of this mess and cure cancer. But medicine is full of interventions and incomplete prior histories, which will make naive association models hard to use.


The dramatic predictions that tend to occur between the model updates are what annoy me. AGI is coming! It’ll be smarter than PhDs! We will need technocommunism!! Cold War 2!


Regulations concerning new problems often provoke cynical reactions that rely entirely on intuition. You see it being applied here: this will backfire, this will isolate kids more, this will make parents less responsible, etc. These kinds of reactions are simplistic and cut off debate. There is a wide array of things we can do between nothing at all and a paranoid "protect-the-kids", state-run total technology shutdown.


It's not a test of AGI. It tests whether you possess innate human capacities: rudimentary arithmetic & geometry, etc. Most of the problems were created manually. The original paper states that they limited the test to innate human priors to make the scope well-defined.


I don't think he's as critical as you say. He just views LLMs as the product of intelligence rather than intelligence itself. LLM fans will say this is a false distinction, I guess.

His definition of intelligence is interesting: something that can quickly achieve tasks with few priors or experience. I also think the idea of using human "Core Knowledge" priors is a clever way to make a test.

