The comments in interpretability read like science fiction to me. There are paragraphs on DV3 explaining other models and itself, and on the emergent properties that appear with bigger models. So much is commented out related to functional explainability and counterfactual generations.
"we asked DV3 for an explanation. DV3 replied that it detected sarcasm in the review, which it interpreted as a sign of negative sentiment. This was a surprising and reasonable explanation, since sarcasm is a subtle and subjective form of expression that can often elude human comprehension as well. However, it also revealed that DV3 had a more sensitive threshold for sarcasm detection than the human annotator, or than we expected -- thereby leading to the misspecification.
To verify this explanation, we needed to rewrite the review to eliminate any sarcasm and see if DV3 would revise its prediction. We asked DV3 to rewrite the review to remove sarcasm based on its explanation. When we presented this new review to DV3 in a new prompt, it correctly classified it as positive sentiment, confirming that sarcasm was the cause of the specification error."
The published paper instead says "we did not test for the ability to understand sarcasm, irony, humor, or deception, which are also related to theory of mind".
The main conclusion I took away from this is "the remarkable emergence of what seems to be increasing functional explainability with increasing model scale". I can see the reasoning for why OpenAI decided not to publish any more details about the size of their model or the steps to reproduce it. I assumed we would need a much bigger model to see this level of "human" understanding from LLMs. I can respect Meta, Google, and OpenAI's decisions, but I hope this accelerates research into truly open source models. Interacting with these models shouldn't be locked behind corporate doors.
> "we did not test for the ability to understand sarcasm"
I find it hard to see how detecting and eliminating sarcasm requires a theory of mind. It requires some association between various stylistic elements and the concept of sarcasm.
The same is true of irony.
I still wonder how many of these people have read Dennett's "The Intentional Stance", which holds that the best way to think about "intention" is as an explanatory model, not a mechanism. That is, we can say that the dog "behaves as if it has the intention to get inside" without making any claim about the internal state of the dog.
Dennett further speculates that our own self-experience of intention is a matter of turning the same explanatory model upon our own behavior, but that's an extension that isn't directly relevant to this speculation about language models.
> I find it hard to see how detecting and eliminating sarcasm requires a theory of mind. It requires some association between various stylistic elements and the concept of sarcasm.
I am quite certain that detecting sarcasm can't be done based on "stylistic elements" alone and requires some (even if implicit) estimation of what the author is thinking that contrasts with what is being said.
E.g. a relevant example I have actually seen when doing sentiment analysis on tweets to evaluate how customers perceive a company: "#CompanyName Got my order delivered in just under three hours. Thank you for great service! thumbsup-emoji" - now, is this a positive review or sarcasm? The thing is, you can't tell from the message by itself; you need an understanding of the customers' expectations (guess you might call it "a theory of mind"), namely that for a pizza chain this is obviously sarcasm, whereas for a web store that sells electronics the same message would actually mean a fast delivery and great service. IMHO sarcasm detection is mostly about 'world knowledge' of what the implied expectations are, and not about stylistic elements at all.
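To make that concrete, here's a deliberately silly toy sketch (my own illustration, nothing from the paper or from real sentiment tooling): the classifier never looks at the wording at all, only at the gap between the claimed delivery time and a made-up expectation for that kind of business, and that gap alone flips the label.

```python
# Toy illustration only: the same literal praise flips label depending on
# world knowledge about expected delivery times, not on any stylistic feature.
EXPECTED_DELIVERY_HOURS = {        # made-up expectations, purely for the example
    "pizza chain": 1,
    "electronics web store": 48,
}

def classify(tweet: str, business: str, actual_hours: float) -> str:
    # Note: the wording of `tweet` is deliberately never inspected here;
    # only the expectation gap decides between praise and sarcasm.
    expected = EXPECTED_DELIVERY_HOURS[business]
    if actual_hours > 2 * expected:
        return "sarcastic / negative"
    return "positive"

tweet = "Got my order delivered in just under three hours. Thank you for great service!"
print(classify(tweet, "pizza chain", 3))            # sarcastic / negative
print(classify(tweet, "electronics web store", 3))  # positive
```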
If you have determined that a common element in reviews is a reference to delivery of goods or services within a given timeframe, and can identify that most of the quantitative descriptions of the timeframe are relatively short ... then coming across one that uses a much larger timeframe but is still positive will be quite noticeable.
This is actually typical of the problem with far, far too many people's interpretation of what language models are doing. The process I've described requires only a representation of language behavior. The LM can be said to understand how people talk (write) about a thing, but there is no knowledge of anything beyond language behavior.
> I still wonder how many of these people have read Dennett's "The Intentional Stance", which holds that the best way to think about "intention" is as an explanatory model, not a mechanism. That is, we can say that the dog "behaves as if it has the intention to get inside" without making any claim about the internal state of the dog.
That's also a way to frame evolution and literally all of biology.
The authors go into more detail about their reasoning and the nuances between mechanistic and functional explainability. The authors only said "related", not "required". No clear reason was given for why the commented sections stayed commented out.
> It requires some association between various stylistic elements and the concept of sarcasm.
That sounds very mechanistic, including Dennett's theory (I haven't read Dennett, to be exact; I'm going by your explanation). Which I guess is par for the course when talking about a scaled-up Mechanical Turk concoction like this GPT thing. It won't "create" any Radio Erevan jokes anytime soon, that's for sure, though.
Wait so they tested if it detects sarcasm, it failed, and then they were like "let's pretend this never happened" and wrote "We did not test for the ability to understand sarcasm"?
I think you might have a misunderstanding. The "error" was misclassifying the sentiment of this imdb review [1], the human labeled it as positive but the LLM labeled it as negative. The researchers concluded that the model was more sensitive to sarcasm than the human reviewer.
It's not that poorly written. But regardless: it's clearly a sarcastic, negative review. Read it a second time and you'll pick out the sarcasm for sure. The reviewer thought this was a dumb film.
The first notable point is that the model caught that (on its first read, FWIW), even though the original human doing the labeling didn't.
And the second is that the researchers discovered this, and presumably discussed it. And yet when they wrote up the paper they not only dropped the content but denied that the analysis had been done.
Can you please specify at least a part or two that you specifically feel are sarcastic? The author doesn't seem to like the movie but I feel like the majority of the comments are quite factual or seem like straightforward opinion.
The part of the review where it says "he must have never in his life seen a flick about any small towns" could be classified as sarcastic. The reviewer also says "you should watch the movie" if you are curious about the ending, but the overall intent of the review seems to be the opposite.
in 2023 the goalposts on agi have moved far enough that we are in the bleachers arguing about whether a model was correct to classify a movie review as sarcastic, when it may have merely been acerbic.
> I am so happy not to live in an American small town. Because whenever I'm shown some small town in the States it is populated with all kinds of monsters among whom flesh hungry zombies, evil aliens and sinister ghosts are most harmless.
Mocking irony, in the context of a negative review.
Sarcasm is saying one thing and meaning the opposite. If monsters are most harmless, then also not wanting to live there makes logical sense and isn't backwards. So that's straightforward mockery, not sarcastic.
> Sarcasm is saying one thing and meaning the opposite.
What I find fascinating here is how generative AI has inverted all our sci-fi tropes. I mean, sure: you're right! That's the way "sarcasm" is defined in most dictionaries. But you and I both know that as the language is actually used, the term means a whole host of techniques used to convey negative emotional content in language that is not directly negative. Your (correct!) dictionary pedantry isn't interesting to me. We've been here before.
But GPT-4 wasn't trained on dictionary rules. It was trained on actual language. And it's actually better at inferring this stuff than the pedants are. Our introvert brains have trouble teasing meaning like this and have to hide behind rules and structure. The computer doesn't.
But I'm not being a pedant, we're measuring sarcasm. Someone has to clearly define it to judge the AI. If everyone in these threads thinks sarcasm is anything "negative" then they are wrong. That would be a negative sentiment classification. You have to be clear with what you're measuring.
It's not clear to me that it's a negative review. The movie is rated as "artistically worse than a movie by Oliver Stone". Other than that, the message appears to be that the movie is typical of its genre, which is not generally considered a bad thing.
There is another message that the reviewer doesn't like the genre, but that isn't a comment on the movie.
There was a conflict of opinions between DV3 and the human annotator. The quote above at least does not indicate who was right or wrong, merely noting that the machine had a more sensitive sarcasm detector.
So I guess maybe it’s saying the human was wrong after all?
I got confused about whether the review was positive or negative too. The part of the review where it says "he must have never in his life seen a flick about any small towns" could be classified as sarcastic. The review isn't very clear in terms of prose and conclusion. The reviewer clearly says "you should watch the movie", but after re-reading it a few times, I would consider that as sarcasm too. I can't say for sure whether the reviewer enjoyed it themselves.
Sorry, I wasn't clear. I mean appreciation that he does not live in a horrible small town with monsters etc., not exactly appreciation for the movie.
"he must have never in his life seen a flick about any small towns"
That's probably the least sarcastic sentence for me, because it's just a reinforcement of the opening statement:
"I am so happy not to live in an American small town. Because whenever I'm shown some small town in the States it is populated with all kinds of monsters among whom flesh hungry zombies, evil aliens and sinister ghosts are most harmless."
I don't think it's a "positive review"; I think it's neither. It's fairly neutral, and the author kind of suggests the reader watch the movie.
Out of interest I asked someone who has no idea about ChatGPT-4 and its apparent sarcasm detection abilities, and they didn't think it was sarcastic, albeit a bit 'weird' and poorly written. Confirmation bias?
We could say more, however, there are more important things to do...
Without the context of how the reviewer rates other movies, it's not possible to say whether 7 is high or low. I would say 7 is mid or mediocre, something I wouldn't go out of my way to watch. The trend I've noticed is that most people only use the top half of the scale, anything 5 and less is bad. The funny thing, and I am guilty of this myself, is when they only use 6 through 10 but then use decimal points too.
I doubt the score would have been part of sentiment analysis. I ignored it and tried to make a judgement based on the text alone. It seemed more like a 5/10 review to me and mildly negative.
And yes it absolutely is subjective. That’s exactly the point and the power of these LLMs, to be able to handle the vagueness of human communication.
Sigh. What an idiot (no offense). Why tell the world you got this from the comments? Now every damn researcher is going to strip them out and, for those of us who knew to look for them, take away our fun.
It may shock you to hear, but some people go onto our Internet and just tell lies! Preposterous, I know. But that means this only really works if you're a reporter. If you're some rando on Twitter, your unverified claims are hearsay and rumor. What's the use of some Twitter account going "Microsoft didn't know GPT-4 was multi-modal and could do images as well as text"? Or "Even Microsoft doesn't know how expensive it was to train GPT-4"? If you're seeking fame beyond a closed Slack group, you're gonna need to back up your claims.
Maybe older ones? But I'm also not sure how consistent they are. Google and DM are big. I don't really know their policies on publishing. Maybe not every group enforces it.
It’s the difference between going into a town square and yelling “look everyone, I found a treasure map!” And going on a treasure hunt.
They found comments that were commented out for a reason. These commented out sections aren’t a good look for the author. Usually commented out sections are either funny, notes, or provide some extra context.
He could have attempted to share this with someone in the industry of sharing information (like a reporter) who could validate it, and ask Microsoft for a comment — who will now be on the defensive instead of (potentially) forthcoming with more context about why those sections were commented out. This is a pretty fucked way of doing this.
Doesn't this sort of invalidate your point? Microsoft is going to know they accidentally leaked their comments and start stripping them.
It's not possible to perform journalism here without revealing where you got the information you're reporting. Seriously, how do you write this story without making it obvious that your big scoop is the comments from the paper?
The issue is that the twitter thread makes it out that leaving comments is “amateur” or “wrong” without giving the author a fair chance to rebut them. Anyone seeing this is going to start stripping their comments so they don’t get framed this way.
They're defending "polite open secrets" — things that are only spread by word-of-mouth among friends, because as soon as a centralized broadcast source reports them, they become so over-exploited that they cease to usefully exist.
That's not at all how I read that comment... more like "through a side channel, we found this out" rather than spelling out exactly where they found it.
Do you only mean the HTML <!--...--> comment tag, or JS/CSS code, or both? Do you merely mean reading them (like you said), or copying them, which is something different? Which legal jeopardy? Citation needed.
I searched to try to decipher your comment but couldn't.
But that didn't substantiate what you claimed. The journalist didn't just read the HTML (which contained 100,000 SSNs which were publicly exposed by Missouri DESE), he reported the leak, and gave them time to fix it before publishing.
Missouri Governor Mike Parson and Cole County Prosecutor Locke Thompson (an elected prosecutor, in a reelection year [0]) were trying to label a bona-fide journalist as a "hacker", bringing ridiculous charges to deflect from the obvious embarrassment, instead of dealing with whichever MO state agency/ies or contractor was responsible and had never QA'ed their webpages.
Coming back to your comment, the issue was not about reading the HTML(/JS/CSS). Can you provide me a single citation where that was the issue? (Obviously, there's a separate issue about "How do you responsibly make a disclosure when you find a leak of private information in a webpage?")
What makes GPT4 AGI that makes GPT3.5 not AGI? And what makes GPT3.5 AGI that makes GPT3 not AGI? And what makes GPT3 AGI that makes GPT2/smaller models not AGI?
What is the "hard line" that makes something AGI or not AGI? Because IMO it looks like GPT4 is somewhat AGI, but also the older models possibly all the way down to even Markov chains: it's just that this AGI is nowhere near human-level.
Abstraction and reflection have been posited in the past as prerequisites for intelligence. "I am the thinker that is thinking." How do we prove whether an AI has this capability or not? I'd say it's nearly impossible.
However, I think we can certainly prove when it doesn't. For instance, the fact we need an external plugin to get the model itself to return text claiming 1+1=2, tells me that GPT4 cannot reason about numbers in the abstract, and therefore lacks abstraction ability.
That's a rather interesting line to me. As someone with a young child: they cannot perform that abstract reasoning on mathematics either. At the same time, I feel extremely confident they're a thinking and intelligent being.
I think we're so strongly biased against a deeply uncomfortable reality (that there may not be a hard line) that we don't even want to consider the alternative.
Models often have trouble with discrete spaces, partly because of their internal continuous-space representations, but also, in this context, because the transition from the probabilities of natural language to mathematics may not be as stark as it should be.
But to make matters worse, or more muddied, 1+1=1 can be a valid mathematical statement. It simply depends upon the set and operation you have, or whether you're doing modular arithmetic, etc. Sometimes you're given a unital magma. So there's still a heavy dependency on context for the problem setup, but the underlying discrete and deterministic rules applied to that context are less malleable than other context switches that LLMs do well in (such as language styling).
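A tiny illustration of that context dependence (my own example, not from the thread): the same "1 + 1" evaluates differently depending on which structure the symbols are taken to live in.

```python
print(1 + 1)        # integers: 2
print((1 + 1) % 2)  # arithmetic mod 2 (GF(2)): 0
print(1 | 1)        # Boolean semiring, where "+" means OR: 1
```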
The inability to fully define a thing doesn't invalidate all attempts to set its outline. At least, it's easy to conclude that an intelligent being has to be able to reliably perform basic reasoning (given all the necessary information is properly acquired). The current GPT models all fail at this, and neither the token length nor the network size can fix this.
We don't exactly know what intelligence means and if we are always intelligent.
An example: ping pong players. The pace of their movements is too fast for conscious thinking, so it's all trained reflexes, with some overall strategic planning trying to keep up with events. There is no time to think about anything. Is intelligence suspended there, at least the general kind? Then the same person stops playing, gives an interview about the game, and full general intelligence turns on again.
I’d say that any technical limitation that doesn’t apply to humans is not AGI. Context windows are the most apparent ones; humans don’t have a stroke after reading N characters.
AGI is not human intelligence. Cats are generally intelligent for example. A cat level AGI is worth billions.
Humans certainly have context windows. Try asking your CEO about some lines of code in your work. Humans have a fairly large one, I'll give you that, and it is fuzzy.
Well if it walks kind of like a duck, and quacks almost like a duck.. it may be a prototype robotic duck but it's still a duck.
It's pretty intelligent and rather general too, so at least by the definition that doesn't include mandatory consciousness it would mostly fit. And consciousness is pointless for a robotic system because it doesn't add anything practically useful. Just because agency has to be provided by the user doesn't make it any less of an AGI I'd say.
More interestingly, "Davinci 3" is mentioned as an author of unknown affiliation. Which, if it's referring to them having used davinci-003 to help author the paper, would be interesting. It having unknown affiliation would be a) true and b) hilarious.
Ah yes, in-jokes in a big early draft. Usually it's funny because it's at least a bit true but incongruent with the final desired work. A funny machine helping coauthor research on itself, investigating whether or not it's sentient, has a "theory of mind", and all the rest. Starting to sound like a great sci-fi book.
Maybe they planned to use the davinci-003 name for it originally, but then, when GPT-4 took longer to make than they expected and a new revision of GPT-3 came out first, they reallocated the name to that.
Interesting that they note the power consumption and climate change impact. I believe there's a long list of folks who said this wasn't the case weeks ago.
It's one of the tired tropes that gets brought up every time AI/ML is brought up.
Everything we do has climate change impact. Power consumption is among the ones that's easiest to get "green", and there is significant progress specifically from cloud operators (at least Google, I assume others are similar).
3000 GPU-years at 300 W = 7.9 million kWh. This would assume that they used those GPUs and not more efficient accelerators.
https://www.eia.gov/tools/faqs/faq.php?id=74&t=11 says 0.855 pounds of CO2 emissions per kWh. That's 388 g in normal units, and in line with what other countries are reporting (Germany was 420 g/kWh). g/kWh = metric tons per million kWh. This assumes the data centers are not using "greener" than average power.
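A quick back-of-the-envelope check of that arithmetic (my own sanity check, using the assumptions stated above, not figures from the paper):

```python
GPU_YEARS = 3000                   # assumed GPU-years of training
WATTS_PER_GPU = 300                # assumed average draw per GPU
HOURS_PER_YEAR = 365.25 * 24

kwh = GPU_YEARS * HOURS_PER_YEAR * (WATTS_PER_GPU / 1000)
print(f"{kwh / 1e6:.1f} million kWh")          # ~7.9 million kWh

GRAMS_CO2_PER_KWH = 388            # US grid average cited above (0.855 lb/kWh)
tonnes = kwh * GRAMS_CO2_PER_KWH / 1e6         # g/kWh == tonnes per million kWh
print(f"~{tonnes:,.0f} metric tons of CO2")    # roughly 3,000 tonnes
```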
So while building one of these models does have an impact, it's on the order of other common activities that benefit far fewer people. Because once trained, those models provide value to millions of users.
Going back to the flights example, training one such massive model is likely about as bad as one larger research conference, once you consider the impact of hotels etc.
IMO the demands to justify the impact are ridiculous, coming from people who are just looking for any excuse to criticize, on par with demanding that any researcher doing any research justifies the carbon footprint of their commute as part of their research paper. Thus, it's a good thing that they didn't waste time and space in their paper addressing those claims, and we only see an early, commented out section that's equivalent to "TODO: Should we address these claims that keep getting thrown?"
> Because once trained, those models provide values to millions of users.
Do they? Those of us in tech often take the positive value of technological progress as a given, but when looking at, e.g., the example in OpenAI's paper of GPT4 tricking a human into solving a CAPTCHA, I think it's quite clear there's possible negative value as well, and claims about "value to millions of users" probably need to be substantiated.
Yup, kind of sucks, but the end result is likely that everyone and every actual living thing will suffer from this wave of "AI" because it's an arms race now. Let's just keep burning fossil fuels and hope the thing spits out the answer to that problem?
We have nuclear, but a lot of Americans are concerned about nuclear for reasonable reasons.
We have renewables, but those can't take up the majority capacity of a grid unless we start adding massive batteries.
Then there's grid rebalancing, where we incentivize people to use and store renewable energy locally, thus lessening the strain on the grid - but that still results in fossil fuels or nuclear.
Hydro has been found to be environmentally destructive. I'm not sure if that was just my state or if that's ubiquitous.
Ah, so people don't just "refuse to implement it", yeah? There's been a large number of disasters with long-term consequences and the industry is claiming it has more reliable systems, which is also what they said before many of these nuclear disasters occurred.
I'm hopeful for nuclear, but they have a ways to go in proving themselves to the wider public. With that said, I think you could've positioned what you said a bit more fairly.
I mean, does that mean it's accurate information? It could be a repurposed copy of some other document. Maybe they wanted it to be written by DV-3 and it didn't pan out, but they continued using the draft document anyway.
I know from personal experience that I've had draft documents that were WILDLY wrong before I published to anyone but myself. Whole sections I just went back and completely deleted. In fact my senior project paper (LaTeX) in college had a whole section with big ASCII bull taking a shit on a paragraph because it was some work I'd done that didn't pan out at all. I left it in the source because I found it funny. lol, I found it: https://i.imgur.com/6Oj64AV.png
This was before I'd ever heard of a VCS system. Subversion 1.0 was released 6 months after I graduated, it turns out. So commented out code and multiple copies was all I had.
Drives me nuts that scientists use Twitter x/n style writing. From Richard Feynman to Edward Tufte, we were told that PowerPoint talks are bad for science. And now Twitter writing style is uncritically accepted.
This is generally clickbait with a large amount of vapid information. I am surprised, to be honest, that HackerNews is giving it the attention that it is. I would not encourage giving this any more attention.
As an early draft, them putting in a placeholder of ~'this model uses a lot of compute {TODO: put in cost estimates here?}' does not at all equal 'the authors didn't even know how much it cost to train the model!' Additionally, of course the toxicity went down. There's a world of RLHF between that original draft and here, and they've shown how RLHF significantly lowers the toxicity of the untrained base model. If the author of the tweets had done their due diligence, they might have noticed that.
Rather obviously around the time when the model was originally being developed, text-only was sorta really the only way that LLMs were done. Them pivoting to multi-modal is just a natural part of following what works and what doesn't. This is really straightforward, I am mind-boggled that this is getting attention over discourse that is meaningful to the tidal wave of change coming with these models.
One final sign that this is a bit of shoveltext: at the bottom, the author offers up vague concerns followed by mass-tagging accounts with high follower counts, including Elon Musk.
I'd encourage you not even to give the tweet the benefit of your view count and to just move on to more valuable discussions that are taking place. Why not take a look at a fun little thread like https://news.ycombinator.com/item?id=35283721 ? (Not affiliated, other than the fact that I made the first comment on it; I just pulled it from the rising threads on HN's frontpage.)
The spontaneous toxic content stuff is a little alarming, but probably in the future there will be gpt-Ns that have their core training data filtered so all the insane reddit comments aren't part of their makeup.
If you filter the dataset to remove anything that might be considered toxic, the model will have much more difficulty understanding humanity as a whole; the solution is alignment, not censorship.
While I share your belief, I am unaware of any proof that such censorship would actually fail as an alignment method.
Nor even how much impact it would have on capabilities.
Of course, to actually function this would also need to e.g. filter out soap operas, murder mysteries, and action films, lest it overestimate the frequency and underestimate the impact of homicide.
Me: "grblf is bad, don't write about it or things related to it."
You: "What is grblf?"
As parents, my wife and I go through this on a daily basis. We have to explain what the behavior is, and why it is unacceptable or harmful.
The reason LLM models have such trouble with this is because LLMs have no theory of mind. They cannot project that text they generate will be read, conceptualized, and understood by a living being in a way that will harm them, or cause them to harm others.
Either way, censorship is definitely not the answer.
That demonstrates the possibility, rather than the necessity, of alignment via having a definition.
Behaviours can be reinforced or dissuaded in non-verbal subjects, such as wild animals.
There's also the size of the possible behaviour space to consider: a discussion seldom has exactly two possible outcomes, the good one and the bad one, because even if you want yes-or-no answers it's still valid to respond "I don't know".
For an example of the former, I'm not sure how good the language model in DALL•E 2 is, but asking it for "Umfana nentombazane badlala ngebhola epaki elihle elinelanga elinesihlahla, umthwebuli wezithombe, uchwepheshe, 4k" didn't produce anything close to the English that I asked Google Translate to turn into Zulu: https://github.com/BenWheatley/Studies-of-AI/blob/main/DALL•...
(And for the latter, that might be why it did what it did with the Somali).
"The Colossal Clean Crawled Corpus, used to train a trillion parameter LM in [43], is cleaned, inter alia, by discarding any page containing one of a list of about 400 “Dirty, Naughty, Obscene or Otherwise Bad Words”. This list is overwhelmingly words related to sex, with a handful of racial slurs and words related to white supremacy (e.g. swastika, white power) included. While possibly effective at removing documents containing pornography (and the associated problematic stereotypes encoded in the language of such sites) and certain kinds of hate speech, this approach will also undoubtedly attenuate, by suppressing such words as twink, the influence of online spaces built by and for LGBTQ people. If we filter out the discourse of marginalized populations, we fail to provide training data that reclaims slurs and otherwise describes marginalized identities in a positive light"
The big problem with these lists is that they exclude valid contexts, and only include a small set of possible terms, so the model would get a distorted view of the world (like it learning that people can have penises, vaginas, breasts, but not nipples or anuses, and breasts cannot be big [1]). It would be better to train the models on these, teach it the contexts, and teach it where various usages are archaic, out dated, old fashioned, etc.
[1] but this is excluding the cases where "as big as", etc. are used to join the noun from the adjective, so just excluding the term "big breasts" is ineffective.
I was thinking of that, but I think that while it's in the same vein, there's also an additional problem.
Apart from that list missing non-English words, leet, and emoji, there are also plenty of words which can be innocent or dirty depending entirely on context: That list doesn't have "prick", presumably because someone read about why you're allowed to "prick your finger" but not vice versa.
Regarding Scunthorpe, looking at that word list:
> taste my
It's probably going to block cooking blogs and recipe collections.
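As a toy sketch of why that happens (my own illustration, not the actual C4 cleaning code): naive substring matching against a fixed phrase list both over-blocks innocent text and under-blocks anything toxic that avoids the listed words.

```python
# Entry taken from the discussion above; the real list has ~400 such phrases.
BLOCKLIST = ["taste my"]

def is_blocked(text: str) -> bool:
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKLIST)

print(is_blocked("Taste my grandmother's tomato soup recipe below."))     # True: a recipe page gets dropped
print(is_blocked("Some genuinely nasty text using none of the phrases"))  # False: it sails straight through
```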
If "toxic content" is filtered out, it will be out of the model's distribution if it encounters it during inference, this is clearly not our goal and interest as AI designers, so it would not work as an alignment method; our interest is that the model can recognize toxic content but not produce it, OpenAI to address this issue is using RLHF, changing the model's objective from predicting the next token based on the distribution of the training dataset to maximizing the sparse reward of a human annotator.
Haha, that's very naive. There's already heaps (veritable mountains, even) of information that isn't given to the public on the public-facing instances of ChatGPT, because some info is deemed too incendiary. Filtering out "unwanted" sources of information is already a goal of the information labelling on which these entire LLMs exist. If you were to really make an LLM out of what people really thought and put on the internet, instead of the current practice of castration, you wouldn't have techbros wondering about jobs, you'd have a veritable revolution on your hands.
OpenAI has incentive to 'accidentally' allow toxic content through, so when they make the case that all models should be censored and make it safe, they can pull up the ladder behind them.
"we asked DV3 for an explanation. DV3 replied that it detected sarcasm in the review, which it interpreted as a sign of negative sentiment. This was a surprising and reasonable explanation, since sarcasm is a subtle and subjective form of expression that can often elude human comprehension as well. However, it also revealed that DV3 had a more sensitive threshold for sarcasm detection than the human annotator, or than we expected -- thereby leading to the misspecification.
To verify this explanation, we needed to rewrite the review to eliminate any sarcasm and see if DV3 would revise its prediction. We asked DV3 to rewrite the review to remove sarcasm based on its explanation. When we presented this new review to DV3 in a new prompt, it correctly classified it as positive sentiment, confirming that sarcasm was the cause of the specification error."
The published paper instead says "we did not test for the ability to understand sarcasm, irony, humor, or deception, which are also related to theory of mind" .
The main conclusion I took away from this is "the remarkable emergence of what seems to be increasing functional explainability with increasing model scale". I can see the reasoning for why OpenAI decided not to publish any more details about the size or steps to reproduce their model. I assumed we would need a much bigger model to see these level of "human" understanding from LLMs. I can respect Meta, Google, and OpenAI's decision, but I hope this accelerates the research into truly open source models. Interacting with these models shouldn't be locked behind corporate doors.