Hacker News
4chan and other web sewers scraped up into Google's mega-library for training ML (theregister.com)
24 points by rolph on April 28, 2023 | 32 comments


Who determines what is a sewer and what is the voice of the unheard?


I challenge you to read /b/ for half an hour and maintain this devil's advocate position. It is absolutely a sewer.


Whoever has the gold, or whoever stands to lose it.


You're not seriously claiming that 4chan is full of unheard voices?


I am seriously claiming that. Everywhere else on the internet, you'll get banned for posting something even a little politically incorrect.


> Everywhere else on the internet, you'll get banned for posting something even a little politically incorrect.

No you won't. 4chan's strain of political incorrectness is endemic throughout the internet. It's suffused internet culture and become normalized in the mainstream. It's found in every comment section on every social media platform, and the terminally edgy have an entire ecosystem of alternative forums, platforms and alt-accounts when the normies are just too square for their tastes. It's even becoming commonplace on HN. There is literally no demographic more heard than edgy 4chan shitlords.


I think that the most remarkable thing about the website is exactly its cultural force, even while being near universally maligned by traditional media and the "polite" sides of the internet. Nowhere else can you "get away with" saying something that you couldn't elsewhere, and suffer absolutely 0 consequences for voicing actually controversial opinions. Everything there is anonymous, unverifiable, and yet oftentimes it rings more true than what you might hear from a "reputable" source.


> Nowhere else can you "get away with" saying something that you couldn't elsewhere, and suffer absolutely 0 consequences for voicing actually controversial opinions.

I'm not sure you fully understand the issue. 4chan is nothing special, let alone remarkable, with regard to "getting away with" saying something. It's just a popular spot and the only one you know. I assure you that there are a myriad of forums and chatrooms all around the web where the same sort of content, and even worse, is being exchanged. The key difference is that 4chan is the mainstream, generalist forum, and a huge one at that, with millions of unique monthly visitors.

Claiming that 4chan is unique in its content reads like claiming that Pepsi is a small unknown soda brand that is almost secret.


There are many, many small forums where people can air the most controversial opinions--I don't disagree. But the fact that there is one MASSIVE one in the public consciousness means that people aren't automatically filtered into these mass, heavily monitored and controlled forums like reddit, facebook, instagram, etc. but instead can easily find a place where nobody's opinion, voice, or resistance to what is considered polite and "acceptable" is policed, and speak there.

In the old days what you're saying was true, but we're not in the old days. We're in the days where everyone uses the internet and the internet is vastly profitable, where corporations, the state, and the news media relentlessly seek to manipulate public opinion to their own aims. In all that, something as remarkable as the continued existence of 4chan stands directly opposed to power and everything in society that seeks to regulate and control behavior and discourse.


> 4chan's strain of political incorrectness is endemic throughout the internet.

On 4chan it used to be pretty common to just refer to everyone with slurs. I'm assuming it hasn't changed much but maybe it's tamer.

I really don't agree with you that 4chan culture somehow permeates all the internet, especially not some kind of political incorrectness.

Many memes and jokes come from 4chan yes, but only the most tame ones that are ok to share. The really politically incorrect stuff generally stays there.


You're welcome to reflect on what would happen if you posted 4chan content here and why.


You might not think so, but there are absolutely unheard voices there.

The /lgbt/ board is full of viewpoints I've not seen shared elsewhere.

Is that a good thing? Not for me to decide. I'm not particularly involved with 4chan culture so I don't understand what they're saying. But they've definitely found a space where they can share their opinions.


Most 4channers would themselves call it a sewer. Any person who thinks the content is anything other than word-shaped human waste is suffering from severe delusions.


I think it's pretty common that ingroups can say whatever they'd like. Outgroups saying the same thing, however, takes on an entirely different color.

Anyways, 4chan was seminal to the internet as you know it. The Portal developers debuted Narbacular Drop there. It was foundational to early streaming, e.g. "let's plays," which are echoed by several major platforms to this day (Justin.tv). It isn't verifiable by any means, but I recall someone either posing as the YouTube developers, or the actual developers themselves, promoting the earliest iterations of YouTube - early enough that I was wholly skeptical of the viability of the platform. I'm reasonably sure that Stranger Things was written by a creepyposter, or ripped off by the writing crew. A lot of the meme syndromes were pioneered there or at least allowed to proliferate. It's frankly ridiculous and highly biased to color it in broad strokes.


4chan circa 2004-2013 is very much not 4chan in 2023.

The site does not influence as much as it is influenced at this stage: you see memes cooked up on Telegram and Twitter which then filter back to 4chan. This backflow would've been unthinkable before; 4chan was faster, it was first. It is no longer that.


I don't browse 4chan anymore, not regularly, but they're still streamlined in some categories. They're still very gung-ho and high seas from what I've seen. I don't think 4chan has the cultural edge it once did, but I still think the community tends to be on the cutting edge just due to the anonymity, ephemerality, and the fact the site has always sat hard on being edgy. But maybe I'm wrong, I'm hardly a cultural analyst.

I guess I'd point to Automatic1111 (4channer) as my most recent high-profile exposure to something 4chan related. And Automatic1111's repo was the favored client, and one of the most rapidly developed programs I've ever seen.


Ah, the ol' "I know a sewer when I see it" rule.


Shitposting is the voice of the unheard.


Anyone can call anything anything. Sewers are toxic and 4chan is toxic. Easy deduction.

However, 4chan is more than just a sewer. It's a swamp fed by a sewer outlet which is also 4chan.

The farther you climb up the pipe, the more toxic it gets. The underbelly and secret boards of 4chan are really where the sewage gushes from.

All in all 4chan is a forum with controversial and non-controversial discussion.


By that definition so are Reddit and Twitter.


Absolutely. Just more diluted with more cleanup.


This is a safety feature. It can never be too intelligent this way. Yudkowsky can chill.


This is good data for a large language model. If I am working with GPT-6 I want it to be smart enough to know how a 4chan post ends given how it begins, no matter how sewagey, and even if the opportunity to do it never arises in our work. This capability contributes to its general intelligence which is helpful even if you are working with GPT-6 to make only wholesome things.


Now all hope will be lost if Twitter, Facebook and Reddit are also included...


I wonder: when they trained on text from 4chan, did they make it predict the next words as it normally would, or did they make it predict the next words as if it were 4chan text?


i wish the original title could be a bit more neutral; i would like to bring up the idea that, to some extent, 4chan and similar postings did not decay to 404 but were preserved in a "scrape".

just what peccadilloes [and worse] are in there, who knows, and how much effort is required to de-anon, who knows.


If you want neutral titles then you'll have to get your news somewhere other than El Reg


In the grand scheme of things, places like 4chan have to exist because humanity needs it.

If you want the real sewer look at the WaPo and NYT comment sections. There's more hate on there than on 4chan...though it's directed at right wingers, so it's socially acceptable.


4chan's fastest board is currently covered in swastikas, the n-word, cartoon pictures of trans people hanging themselves and "Chinese gore" threads. I'm going to have to disagree with the WaPo/NYT comment sections containing "more hate."


https://i.redd.it/n0w4pc7rea7a1.jpg

One thing I've heard about 4chan is that they're the most diverse group of white supremacists.


Exactly this. If the genteel, sheltered commenters of HN find 4chan too distasteful, they should get out of their bubble and see what people say in real life, for perspective.


[Citation needed]



