
> when you consider that the model is a couple gigabytes itself, it can't memorize 240TB of data, so it "learned".

This is just lossy compression with a large and well-tuned (to the expected problem domain) dictionary.

Video compression codecs can achieve a 500x compression ratio, and they are general-purpose.



The dataset, LAION-5B, is 240TB of already compressed data. (5 billion pairs of text to 512x512 image.)

Uncompressed, LAION-5B would be 4PB, for a compression ratio into SD of ~780kx, or one byte per picture.
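Back-of-the-envelope check of those figures (a sketch assuming 512x512 RGB at 3 bytes per pixel and a model of roughly 5 GB; captions ignored):

```python
# Rough arithmetic behind the quoted numbers. Assumptions: 512x512 RGB
# images at 3 bytes/pixel, and a model size of ~5 GB.
n_images = 5_000_000_000               # LAION-5B pair count
raw_per_image = 512 * 512 * 3          # uncompressed bytes per image
raw_total = n_images * raw_per_image   # ~3.9 PB, close to the 4 PB above
model_bytes = 5 * 10**9                # assumed model size

print(raw_total / 10**15)              # petabytes: ~3.93
print(raw_total / model_bytes)         # compression ratio: ~786,000x
print(model_bytes / n_images)          # bytes per image: ~1.0
```

So ~780kx and "one byte per picture" both check out under these assumptions.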


The point is that there is no practical limit on compression. You don't need "AI" or anything besides very basic statistics to get astronomical compression ratios. (See: "zip bomb".)

The only practical limit is the amount of information entropy in the source material, and if you're going to claim that internet pictures are particularly information-dense I'd need some evidence, because I don't believe you.
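A minimal illustration of the zip-bomb point: data with very little entropy compresses enormously under plain zlib, with no model of the content at all. (A single DEFLATE layer tops out near ~1000:1; real zip bombs nest archives to multiply the ratio.)

```python
import zlib

# Highly redundant (low-entropy) data compresses at an extreme ratio
# using ordinary DEFLATE -- no "learning" involved.
raw = b"\x00" * 10_000_000        # 10 MB of zeros
packed = zlib.compress(raw, 9)    # maximum compression level

print(len(raw) / len(packed))     # ratio on the order of 1000x
```

The ratio here is set purely by the entropy of the input, which is the limit the comment above is pointing at.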


Correct, however "compression is equivalent to general intelligence" (http://prize.hutter1.net/hfaq.htm#compai), and so in a sense all learning is compression. In this case, SD applies a level of compression so high that the only way it can retain information from its inputs is by capturing their underlying structure. This is a fundamentally deeper level of understanding than image codecs, which merely capture short-range visual features.


I fail to see the difference between "underlying structure" and "short-range visual features".

Both are just simple statistical relationships between parameters and random variables.


Sure, but why would that not apply to humans? And we don't consider it copyright violation if a human learns painting by looking at art.


Depends on what you mean by "humans".

Most human behavior is easy to describe with only a few underlying parameters, but there are outlier behaviors where the number of parameters grows unboundedly.

("AI" hasn't even come close to modeling these outliers.)

Internet pictures fall squarely into the "few underlying parameters" bucket.


Because we made the algorithms and can confirm these theories apply to them.

We can speculate they apply to certain models of slices of human behaviour based on our vague understanding of how we work, but not nearly to the same degree.


Hang on, but plagiarism is a copyright violation, and that passes through the human brain.

When a human looks at a picture and then creates a duplicate, even from memory, we consider that a copyright violation. But when a human looks at a picture and then paints something in the style of that picture, we don't consider that a copyright violation. However we don't know how the brain does it in either case.

How is this different to Stable Diffusion imitating artists?


human memory is lossy compression



