It's even possible they converge when trained on different data, if they are lea...

IshKebab · 2024-03-27T16:23:38 1711556618

That sounds unsurprising? Like if you take any set of numbers, randomly split it in two, then calculate the average of each half... it's not surprising that they'll be almost the same.

If you took two different training sets then it would be more surprising.

Or am I misunderstanding what you mean?

MajimasEyepatch · 2024-03-27T19:43:10 1711568590

It doesn't really matter whether you do this experiment with two training sets created independently or one training set split in half. As long as both are representative of the underlying population, you would get roughly the same results. In the case of human faces, as long as the faces are drawn from roughly similar population distributions (age, race, sex), you'll get similar results. There's only so much variation in human faces.

If the populations are different, then you'll just get two models that have representations of the two different populations. For example, if you trained a model on a sample of all old people and separately on a sample of all young people, obviously those would not be expected to converge, because they're not drawing from the same population.

But that experiment of splitting one training set in half does tell you something: the model is building some sort of representation of the underlying distribution, not just overfitting and spitting out chunks of copy-pasted faces stitched together.

evrial · 2024-03-29T00:01:07 1711670467

That's explanation of central limit theorem in statistics. And any language is mostly statistics and models are good at statistical guessing of the next word or token.

taneq · 2024-03-27T22:20:38 1711578038

If not are sampled from the same population then they’re not really independent, even if they’re totally disjoint.

evrial · 2024-03-29T00:03:13 1711670593

They are sourced mostly from the same population and crawled from everything can be crawled.

Tubbe · 2024-03-27T16:09:33 1711555773

Got a link for that? Sounds super interesting

d_burfoot · 2024-03-27T17:12:49 1711559569

https://en.wikipedia.org/wiki/Theory_of_forms

bobbylarrybobby · 2024-03-27T20:10:56 1711570256

I mean, faces are faces, right? If the training data set is large and representative I don't see why any two (representative) halves of the data would lead to significantly different models.

arcticfox · 2024-03-27T21:18:55 1711574335

I think that's the point; language is language.

If there's some fundamental limit of what type of intelligence the current breed of LLMs can extract from language, at some point it doesn't matter how good or expansive the content of the training set is. Maybe we are finally starting to hit an architectural limit at this point.

dumbfounder · 2024-03-27T22:52:03 1711579923

But information is not information. They may be able to talk in the same style, but not about the same things.