It's strange seeing so many takes like this two weeks after LLMs won gold medals at IMO and IOI. The cognitive dissonance is going to be wild when it all comes to a head in two years.
I've seen these claims, and Google even published the texts of the solutions, but it still hasn't published the full log of interaction between the model and the operator.
> Rather than being given questions, contestants are instead given general knowledge clues in the form of answers and they must identify the person, place, thing, or idea that the clue describes, phrasing each response in the form of a question. [0]
Doesn't sound like a test of intelligence to me, so no.
Why? Computers also won chess years ago, but they're not intelligent either? Why is winning a math competition intelligent but a trivia competition or a chess competition not intelligent?
Math and chess are similar in the sense that for humans, both require creativity, logical problem solving, etc.
But they are not at all similar for computers. Chess has a constrained small set of rules and it is pretty straightforward to make a machine that beats humans by brute force computation. Pre-Leela chess programs were just tree search, a hardcoded evaluation function, and lots of pruning heuristics. So those programs are really approaching the game in a fundamentally different way from strong humans, who rely much more on intuition and pattern-recognition rather than calculation. It just turns out the computer approach is actually better than the human one. Sort of like how a car can move faster than a human even though cars don’t do anything much like walking.
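To make the "tree search, a hardcoded evaluation function, and lots of pruning" recipe concrete, here is a minimal sketch of minimax with alpha-beta pruning. It assumes a toy game tree written as nested lists with numeric leaves; the names `alphabeta`, `evaluate`, and `tree` are made up for illustration and are not taken from any real chess engine, which would add move generation, move ordering, and a far richer evaluation.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf,
              evaluate=lambda leaf: leaf):
    """Return the minimax value of a toy game tree, with alpha-beta pruning.

    Assumption for this sketch: a leaf is a number (a static score from the
    maximizing player's point of view) and an internal node is a list of
    child nodes. `evaluate` stands in for the hardcoded evaluation function.
    """
    if not isinstance(node, list):
        return evaluate(node)

    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, evaluate))
            alpha = max(alpha, best)
            if alpha >= beta:
                # The minimizing opponent already has a better option
                # elsewhere, so the remaining children cannot matter.
                break
        return best

    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, evaluate))
        beta = min(beta, best)
        if alpha >= beta:
            # The maximizing player already has a better option elsewhere.
            break
    return best


# Root is a maximizing node with three minimizing children.
tree = [[3, 5], [2, 9], [0, 1]]
result = alphabeta(tree, True)
print(result)
```

In this toy tree the leaves 9 and 1 are never evaluated: once a minimizing node has found a score no better than what the root can already guarantee, the rest of that branch is skipped. That skipping is the "pruning" part, and nothing in it resembles human pattern recognition.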
Math is not analogous: there’s no obvious algorithm for discovering mathematical proofs or solving difficult problems that could be implemented in a classical, pre-Gen AI computer program.
> there’s no obvious algorithm for discovering mathematical proofs or solving difficult problems that could be implemented in a classical, pre-Gen AI computer program.
Fundamentally opposite. Computer algorithms have been part of math research since they were invented, and mathematical proof algorithms are widespread and excellent.
The LLMs that are now "intelligent enough to do maths" are just trained to rephrase questions into Prolog code.
I don't wish to join you in framing intelligence as a step function.
I think winning a Go or a chess competition does demonstrate intelligence. And winning a math competition does even more so.
I do not think a trivia competition like Jeopardy demonstrates intelligence much at all, however. Specifically because it reads like it's not about intelligence, but about knowledge: it tests for association and recall, not for performing complex logical transformations.
This isn't to say I consider these completely independent. Most smart people are both knowledgeable and intelligent. It's just that they are distinct dimensions in my opinion.
You wouldn't say something tastes bad because its texture feels weird in your mouth, would you?
I might even think that a symbolic chess program is in some sense more intelligent than a modern LLM. It has a concrete model of the world it operates in, along with a representation of what it can, cannot, and is trying to do. When LLMs get the right answer, it seems more like... highly-optimized chance, rather than coming from any sort of factual knowledge.
> I think winning a Go or a chess competition does demonstrate intelligence.
Chess is a simple alpha-beta pruned minimax search tree. If that's intelligent then a drone flight controller or a calculator is as well.
> association and recall, not for performing complex logical transformations.
By that definition humans doing chess aren't as intelligent as a computer doing chess, since high level chess is heavily reliant on memory and recall of moves and progressions.
I did not share any definitions, only vague opinions. Not that I'd know what it means for a definition to "fall apart".
And the specific bit you cite is barely even a vague opinion; it is my interpretation of the show "Jeopardy!" based on the Wiki article (I've never seen a single episode, wasn't really a thing where I'm from):
> Specifically because it reads like it's about (...) knowledge: it tests for association and recall (...)
Also:
> By that definition humans doing chess aren't as intelligent as a computer doing chess, since high level chess is heavily reliant on memory and recall of moves and progressions.
Yes, I did find this really quite disappointing and disillusioning when I first learned about it. A colleague of mine even straight up quit competitive chess over it.
> it is my interpretation of the show "Jeopardy!" based on the Wiki article
You are spot on though. I mostly wanted to argue that no decent distinction can be made here.
> I did find this really quite disappointing and disillusioning when I first learned about it
ye... same here.
---
I'm personally in the camp that "intelligence" is a human concept. A metric to compare humans. Applying it to computers makes us anthropomorphize computers and think of them as people. Thinking of LLMs as people makes us trust them with bad things.
So we should call them impressive, fluent, fast, useful, good at tasks. Computers already beat us at most math, statistics, searching for information, spatial visualization, information recollection, lossless communication. LLMs just add to that list, but do nothing new to make the word "intelligent" applicable. Even if we reach the AGI singularity, thinking of them as humans or using human terminology to describe them is a fatal error.
(Destroying earth to make paperclips is arguably the least intelligent thing you could ever do.)
None of these things is enough by itself. It's rather that they have now solved so many things that the sum total has (arguably) crossed the threshold.
As for solving math problems, that is an important part of recursive self-improvement. If it can come up with better algorithms and turn them into code, that will translate into raising its own intelligence.
It isn’t an explanation, because it has causality backwards. The Trump Administration wants to do some things, and so they come up with excuses for why they should be allowed to do them. Their actions aren’t the response, they’re the initial desire.
I can't agree there's a plateau just a few weeks after two companies got gold medals at IOI and IMO using natural language (no Lean). Seems like progress is continuing nicely.
I am using the current models and they are still as useful as 6 or 12 months ago
The deal is still about the same: if you bother to do most of the hard part (thinking it through) the code generators can just about generate all the boilerplate
You either want a plan like China or an absence of planning like Texas. Either help like China or get out of the way like Texas. Two places that can actually build energy. Don't be like California where the government doesn't help while also getting in the way.
The same Texas that has statewide power outages every time it gets below freezing (despite knowing for 25+ years it’s a problem) because of their lack of regulation and central planning?
I would not entirely dismiss the way the power market works in Texas. I have no disagreement that the 2021 storm should never have happened. At the same time though, I don’t believe other energy markets work very well either. I would prefer a more Texas-like approach but with some thoughtfulness around capacity instead of just generation.
> I have no disagreement that the 2021 storm should never have happened.
But they still haven’t fixed any of the issues. The exact same thing is going to happen again when (not if) it freezes.
> I would prefer a more Texas-like approach but with some thoughtfulness around capacity instead of just generation.
Capacity isn’t the issue. Lack of winterization of pumps is the issue. Because that costs money and private companies have zero incentive to make the investment if government doesn’t force them to.
Winterization is a fix for last time’s failure, not a strategy for the future. A market like Texas can work if it values resilience alongside price efficiency, meaning capacity planning, diversified generation, and yes, some enforced standards. Otherwise you’re just running a lean system that collapses the moment reality strays from the model.
That storm was an issue for other markets as well, but they were mostly able to get away with rolling blackouts due to interconnects. Those same markets had similar winterization issues but were under FERC guidelines. Folks love to anchor onto the winterization issue like it did not impact other FERC regions.
>Winterization is a fix for last time’s failure, not a strategy for the future. A market like Texas can work if it values resilience alongside price efficiency, meaning capacity planning, diversified generation, and yes, some enforced standards. Otherwise you’re just running a lean system that collapses the moment reality strays from the model.
What are you even trying to say? A private company isn't going to magically "value resilience" if there's no incentive to do so. They make MORE money when they have outages, why would they prevent that? The solution to the issue, which has worked literally everywhere else, is government regulation.
Talk about missing the forest for the trees. "If only capitalism didn't work the way it works it would be perfect".
>That storm was an issue for other markets as well, but they were mostly able to get away with rolling blackouts due to interconnects. Those same markets had similar winterization issues but were under FERC guidelines. Folks love to anchor onto the winterization issue like it did not impact other FERC regions.
Citation for which other markets had blackouts due to not winterizing pumps, a problem that had been called out repeatedly after identical earlier outages in 2010 and 1989? You conveniently left that out, I'm sure it was just an oversight.
Because if I had to bet money, you're talking about the power companies in other states who WERE prepared for the freeze asking homeowners to drop their thermostats a couple degrees because the cold snap was driving demand significantly higher than normal. NOT because of power plant outages due to lack of preparation and component failure - due to lack of regulation.
You’re clearly frustrated here, but let’s keep it in the realm of facts rather than snark. I didn’t “leave out sources” to hide anything, I was speaking from the same public data you can find in FERC/NERC’s joint report on the 2021 event.
SPP did in fact suffer significant generation losses, around 30% at peak, during the February 2021 storm. Causes were mixed: natural gas supply constraints, plant equipment failures, and yes, winterization gaps. Prior to that event, FERC’s winterization guidance was minimal and largely voluntary, so both SPP and ERCOT were operating without strong federal mandates.
The difference in outcomes wasn’t that SPP magically avoided the same issues, it was that SPP is interconnected with MISO and other regional grids. That allowed them to rotate outages in short windows to maintain stability, while ERCOT’s ~50% generation loss, combined with its isolation from other grids, meant load shedding had to be longer and deeper to prevent collapse.
If we’re going to critique Texas’s market, we should separate the “market structure” question from the “operational standard” question. A competitive market like ERCOT’s can work, but without binding requirements on winterization and resource adequacy, you’re just betting the grid on ideal conditions. SPP’s experience shows that interconnection alone doesn’t prevent failures, but it does give operators more options when the weather turns.
Can you drop some of the hyperbole and passive aggressiveness? You don’t even understand my position, yet you are being quite passive-aggressive for no reason.
I'm done with the discussion until you can provide a link to all the other states under regulation that had outages as a result of frozen pumps that had occurred multiple times over the previous 25+ years.
It's a straightforward ask that you're actively avoiding because it didn't happen and contradicts the story you're fabricating.
Not sure why you are so angry. I am trying to help you understand at least my perspective but you keep being quite aggressive for no reason.
You are framing this as if the only relevant comparison is “multiple frozen pump events over 25+ years,” but that is narrowing the scope to avoid the larger point. The February 2021 FERC/NERC report clearly documents that frozen instrumentation, valves, and pumps occurred in both regulated and unregulated markets during the same storm. SPP, MISO, and even parts of PJM experienced outages tied to equipment freezing, though the scale and duration differed because of interconnection and resource diversity.
What is different in Texas is not that freezing only happens there, but that ERCOT lost roughly 50 percent of its generation and could not import meaningful power to offset it. SPP lost about 30 percent, had similar natural gas and winterization issues, but managed to rotate outages for shorter periods because it could pull from neighboring grids.
If you want the source, the joint FERC, NERC, and regional entity “February 2021 Cold Weather Grid Operations” report is publicly available and breaks this down by region. It does not fit the claim that regulated markets never see cold-weather-driven pump or plant failures. The record shows they do, but their structure gives them more tools to manage the consequences.
My whole original point was that a more market-based generation and consumption model should not be overlooked, but let’s go through some simple facts because I think your narrative is off track.
1) Both ERCOT and SPP had winter weather failures during that storm. Pretty similar on the natural gas side: frozen wells, lack of supply, huge spikes in the spot market.
2) SPP, which is federally regulated, had very similar voluntary winterization guidelines in place. Post-event, there are now new rules in place for winter.
3) SPP was able to fare better because they used rolling blackouts across different regions. Using the interconnect they could get energy from outside their grid and create short 60-minute blackouts. ERCOT had no such luxury because of their lack of real interconnects.
You’re more than welcome to read the review of the event from SPP. They call out well-head freeze offs, frozen cooling towers, intakes, fuel lines, etc. 50% of forced generation was a fuel supply issue.
You’re making it sound like Texas was an outlier here. It was not, SPP had the same exact issues of course with a slightly different fuel mix but they got by better with their interconnects. I don’t know why you are struggling to see that this winter event caught other grids by surprise. I am not defending Texas here but simply pointing out facts compared to your modified narrative.
> I don’t believe other energy markets work very well either.
but it isn't even about that storm, big "oh no" situations happen sooner or later (e.g. see the energy outage in Spain); what is important is that you learn from it.
but more important in this argument is the general design: how can it handle flexible loads, how can it share loads between areas, how many ways to handle partial failure does it have, etc.
and Texas is kinda not that good in all of that AFAIK
the problem is that there are markets where politics fully getting "out of the way" doesn't work, as the market dynamics favor things which might be better for the people running the grid but are bad on a state economic level anyway (but getting in the way here means using tax money to make sure the net is stable, not getting in the way of that to protect personal investments)
it's a bit like freight trains in many parts of the EU: operating them makes no profit in most situations. But having them helps the economy as a whole and can (implicitly) save the state/region etc. money. So it makes sense to put some tax money into keeping them viable to operate, as that investment in a roundabout way saves more money than is spent.
I agree that the ability to adapt, whether to flexible loads, partial failures, or cross-area balancing, is the real test of a grid design. Texas’s isolation means it inherently lacks some of the tools SPP or MISO can use, which makes resilience harder. That is not a “market” problem so much as a structural one. ERCOT’s ruleset was built to optimize for low-cost generation in-state, not multi-region contingency planning.
Where I think a Texas-like market could work better is if you layered competitive generation with enforceable capacity and resiliency standards, along with some interconnection flexibility. Right now, the market rewards generators for selling MWh in good weather, not for being ready in bad weather. That is the economic misalignment.
The EU freight analogy works in the sense that reliability is often a public-good investment. No private actor has the incentive to overbuild or maintain resources for rare events. Texas’s approach does not have to mean politics fully getting out of the way. It could mean using market signals to keep prices efficient while still mandating the backup, winterization, and grid-sharing capabilities that the economy needs.
This just shows how you know only the talking points. The power outages are not due to lack of central planning, it's very explicitly the reverse. If Texas were hooked up to the rest of the country, those outages would not be a thing. It's the purposeful regulation that has caused those problems.
I guess you’re saying that the current status is mandated by the design of the grid. Which is true, but that status would be best described as “deregulated” rather than “purposeful regulation.”
Lack of regulation and oversight around weatherization and redundancy is the main source of our problems. The Texas grid is market based and so unregulated that it’s not connected to the national grid so it can avoid federal regulation.
Every single state surrounding Texas was also suffering from power outages due to the winter storm in 2021, despite all of those states being part of the non-Texas interconnections. The outages in those states weren’t as bad, but even if Texas was better connected to them, there’s no guarantee that they would have had any power to share.
So you're saying when the Texas grid fails, it's because of overregulation. But the solution to those failures is to tap into the national grid, a grid that follows stricter FERC regulations.
> The power outages are not due to lack of central planning
It is 100% due to lack of central planning. The outages were caused by a lack of winterization of natural gas pumps, which was a known issue in Texas, but the lack of regulation meant companies could just ignore the problem. Why invest in winterizing when you can just jack up prices and make even more money when they freeze and there’s not enough power to meet demand?
There’s a reason the power doesn’t go out in the winter anywhere else in the country when it gets below freezing and it’s not “a lack of regulation”.
Winterization was a problem but it was also a problem for other regions that are part of FERC. You’re latching onto the wrong problem. FERC has updated guidelines since that storm.
- many of the most influential people are invested in oil and similar
- the political stance had been for a long time that "there should be a fair competition" between energy sources ... while subsidizing non-renewables and trying everything they can to prevent subsidies for renewables
- the same Texas which, once it realized solar is competitive in Texas without subsidies, has been non-stop looking for ways to actively hinder solar (while still subsidizing the non-renewable sector)
- the same Texas which is by now even internationally known to have a very brittle power grid
Why mess with Texas when it's so good at messing with itself?
There are at least 120 people, including more than 35 children, who just drowned because Texas is so unjustifiably arrogant about being messed with by experts and scientists and educators and government regulations.
I wish the modern Texas secession movements the best of luck, and hope they get exactly what they deserve, including my thoughts and prayers!
It's the same. Every assumption, taste, doc, opinion, edge case, unstated knowledge, requirement, exception to the requirement, intention, should be provided for best results. It's not crucial for throwaway weekend projects, but for harder things, it is.
I find it to be the most challenging part. There's a large amount of unstated assumptions that you take for granted, and if you don't provide them all upfront, you'll need to regenerate the code, again and again. I now invest a lot of time into writing all this down before I generate any code.
I think it's a lot more nuanced than this. Defense can be perceived as threatening and cause others to increase their defense, as per Balance of Threat theory and the security dilemma, creating a feedback loop (arms race) that leads to a lose-lose situation, primarily for the weak, but also eventually for the strong.
I am not advocating against defense spending as a category. I am saying it needs to be done skilfully and as a last resort, with the understanding that it is only coherent in a world without a unipolar security architecture, and is therefore hopefully temporary.
Without the Second Intifada, you mean. That's the moment when Israel's political left was obliterated according to domestic polling, and it hasn't yet recovered.
Morality is self-interest. Piracy isn't immoral because they aren't staked in the impacted industries. Eating meat isn't immoral because it's in their self-interest to keep doing it. What is immoral are things that would impact them, like if I stole their property, or if I take credit for their code at work, or if I ate their dog. Morality is easy to understand when you realize everyone is a hypocrite who uses their large brains to construct post hoc justifications, not unlike LLM confabulations. You can't argue against it because you'll get yet another confabulation conveniently aligned with self-interest. We don't even realize we're doing it. It's so baked into our neurology.