
This should also include the chart on "Coding deception" [1], which is quite deceptive itself (50.0 is not, in fact, less than 47.4).

[1]: https://youtu.be/0Uu_VJeVVfo?t=1840





With both the submission and your link, it took me way too long to spot the issue.

What were they even thinking? Don't they care about this? Is their AI generating all their charts now, and they don't even bother to review them?


My unjustified and unscientific opinion is that AI makes you stupid.

That's based solely on my own personal vibes after regularly using LLMs for a while. I became less willing to think critically and carefully, and less capable of it.


It also scares me how good they are at flattery and social engineering. They have made me feel good about poor judgment and a bad decision at least twice (which I noticed later on, still in time). With a new, strict system prompt, they give the opposite opinion and recommend against their previous suggestion. They are so good at arguing that they can justify almost anything and make you believe that this is what you should do, unless you are among the 1% of experts in the topic.

> They are so good at arguing that they can justify almost anything

This honestly just sounds like distilled intelligence. Because a huge pitfall for very intelligent people is that they're really good at convincing themselves of really bad ideas.

That, but commoditized en masse to all of humanity, will undoubtedly produce tragic results. What an exciting future...


> They are so good at arguing that they can justify almost anything

To sharpen the point a bit, I don't think it's genius "arguing" or logical jujitsu, but some simpler factors:

1. The experience has reached a threshold where we start to anthropomorphize the other end as a person interacting with us.

2. If there were a person, they'd be totally invested in serving you, with nearly unlimited amounts of personal time, attention, and focus given to your questions and requests.

3. The (illusory) entity is intrinsically shameless and appears ever-confident.

Taken together, we start judging the fictional character like a human, and what kind of human would burn hours of their life tirelessly responding to and consoling me for no personal gain, never tiring, breaking character, or expressing any cognitive dissonance? *gasp* They're my friend now and I should trust them. Keeping my guard up is so tiring anyway, so I'm sure anything wrong is either an honest mistake or some kind of misunderstanding on my part, right?

TLDR: It's not mentat-intelligence or even eloquence, but rather stuff that overlaps with culty indoctrination tricks and con[fidence]-man tactics.


AI being used to completely offload thinking is a total misuse of the technology.

But at the same time, the fact that this technology can be misused in ways that cause real psychological harm feels like something new. Right? There are reports of AI psychosis; I don't know how real they are, but if they're real, I can't think of any other tool that has produced that kind of side effect.


We can talk a lot about how a tool should be used and how best to use it correctly - and those discussions can be valuable. But we also need to step back and consider how the tool is actually being used, and the real effects we observe.

At a certain point you might need to ask what the toolmakers can do differently, rather than only blaming the users.


No. AI is a tool to make ourselves look stupid. Suggesting that it makes people stupid suggests that they are even looking at the output.

Since everyone assumes GPT hallucinated these charts, the truth must be that they're 100% pure, organic, unadulterated human fuckups.

Doesn't matter. Either way is bad.

Either way is bad. Intentionally human-made and approved is worse than machine-generated and not reviewed. Malicious versus sloppy.

Machine-generated is worse.

How many charts will the person create, how many the machine?


> With both the submission and your link, it took me way too long to spot the issue.

Mission accomplished for them then.


It makes Apple's charts look rigorous and transparent.

I mean, if your whole business is producing an endless stream of incorrect output and calling it good enough, why would you care about accuracy here? The whole ethos of the LLM evangelist, essentially, is "bad stuff is good, actually".

I pasted the image of the chart into ChatGPT-5 and prompted it with

>there seems to be a mistake in this chart ... can you find what it is?

Here is what it told me:

> Yes — the likely mistake is in the first set of bars (“Coding deception”). The pink bar for GPT-5 (with thinking) is labeled 50.0%, while the white bar for OpenAI o3 is labeled 47.4% — but visually, the white bar is drawn shorter than the pink bar, even though its percentage is slightly lower.

So they definitely should have had ChatGPT review their own slides.


>but visually, the white bar is drawn shorter than the pink bar, even though its percentage is slightly lower.

But the white bar is not shorter in the picture.


Funny, isn't it? It makes me feel like it's kind of over-fitted to try to be logical now, so when it's trying to express a contradiction it actually can't.

Does it work that well if you don't tell it there is a mistake, though?

That's the secret: you should always tell it to doubt everything and find a mistake!

But how could they have used ChatGPT-5 if they were working on the blog post announcing it?

That one is so obviously wrong that it makes me wonder if someone mislabelled the chart, but perhaps I'm being too optimistic.

Presumably it corresponds to Table 8 from this doc: https://cdn.openai.com/pdf/8124a3ce-ab78-4f06-96eb-49ea29ffb...

If that's the case, it's mislabelled and should have read "17%", which would better match the visual.


That would still be a basic fail. You don't label a chart by hand: you enter the data, and the pre-AGI computer program does the rest, drawing the bars and showing labels that match the data.
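
Purely as an illustration, a minimal matplotlib sketch of that workflow; the model names and the 50.0 / 47.4 values are just the ones quoted upthread. The labels are derived from the same numbers that set the bar heights, so the two can't disagree:

    import matplotlib.pyplot as plt

    # Values as quoted in this thread for "Coding deception" (illustrative only)
    models = ["GPT-5 (with thinking)", "OpenAI o3"]
    rates = [50.0, 47.4]

    fig, ax = plt.subplots()
    bars = ax.bar(models, rates)
    # bar_label reads the same data the bars were drawn from,
    # so a label can never contradict its bar's height.
    ax.bar_label(bars, fmt="%.1f%%")
    ax.set_ylabel("Deception rate (%)")
    ax.set_ylim(0, 100)
    plt.show()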

It's been fixed on the OpenAI website.

This half makes sense to me: 'deception' is an undesirable quality in an LLM, so less of it is 'better/more' from their audience's perspective.

However, I can't think of a sensible way to actually translate that to a bar chart where you're comparing it against other things that don't have the same 'less is more' quality (the general fuckery with graphs not starting at 0 aside: how do you even decide where '0' goes when the value gets better as it approaches it?), and what they've done seems like total nonsense. One option is sketched below.
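
Here's one possible way out, as a sketch only (an assumption about what a sensible chart could look like, not what OpenAI did): plot the complement, so that taller is always better across mixed metrics, while the label still reports the raw deception rate:

    import matplotlib.pyplot as plt

    models = ["GPT-5 (with thinking)", "OpenAI o3"]
    deception = [50.0, 47.4]                   # lower is better
    complement = [100 - d for d in deception]  # higher is better

    fig, ax = plt.subplots()
    bars = ax.bar(models, complement)
    # Heights encode "non-deceptive %", labels report the raw rate,
    # so the "less is more" metric reads the same way as the others.
    ax.bar_label(bars, labels=[f"{d:.1f}% deceptive" for d in deception])
    ax.set_ylabel("Non-deceptive responses (%)")
    ax.set_ylim(0, 100)
    plt.show()

The design choice is that the bar always encodes the "more is better" direction, and the awkward unit conversion lives in the label text instead of in the geometry.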


> 'deception' is an undesirable quality in an LLM, so less of it is 'better/more' from their audience's perspective

So if that ^ is why 50.0 is drawn lower than 47.4 ... then why is 86.7 not drawn lower than 9.0? Or 4.8 not lower than 2.1?


Added!

Clearly the error is in the number; most likely the actual value is 5.0 instead of 50.0, which matches the bar height and also the other single-digit GPT-5 results for metrics on the same chart.


