Many might miss the key paragraph at the end:

   "Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI."
This also helps explain why the model is so good, since it is trained to simulate the real world, as opposed to imitating the pixels.

More importantly, its capabilities suggest AGI and general robotics could be closer than many think (even though some key weaknesses remain and further improvements are necessary before the goal is reached).

EDIT: I just saw this relevant comment by an expert at Nvidia:

   “If you think OpenAI Sora is a creative toy like DALLE, ... think again. Sora is a data-driven physics engine. It is a simulation of many worlds, real or fantastical. The simulator learns intricate rendering, "intuitive" physics, long-horizon reasoning, and semantic grounding, all by some denoising and gradient maths.

   I won't be surprised if Sora is trained on lots of synthetic data using Unreal Engine 5. It has to be!

   Let's breakdown the following video. Prompt: "Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee." ….”
https://twitter.com/DrJimFan/status/1758210245799920123



> since it is trained to simulate the real world

Is it though? Or is this just marketing?


I was impressed with their video of a drone race on Mars during a sunset. In part of the video, the sun is in view, but then the camera turns so it’s out of view. When the camera turns back, the sun is where it’s supposed to be.


There's mention of memory in the post — the model can remember where it put objects for a short while, so if the camera pans away and pans back, it should keep that object "permanence".


Well, the video in the weaknesses section with the archaeologists makes me think it's not just predicting pixels. The fact that a second chair spawns out of nothing looks like a typical AI uncanny-valley mistake you'd expect, but then it starts hovering, which looks more like a video game physics glitch than an incorrect interpretation of pixels on screen.


If it is, it's not there yet. The snow in the mammoth video kind of looks like smoke, the way it rises into the air.


I think it's just inherent to the problem space. Obviously it understands something about the world to be able to generate convincing depictions of it.


It seems very dangerous to assume claims without evidence are obvious.


I didn't do that.


What other likely reasons might explain the leap ahead of other significant efforts?

See also: https://news.ycombinator.com/item?id=39387333


Just having a better or bigger model? Better training data, better feedback process, etc.

Seems more likely than "it can simulate reality".

Also I take anecdotal reviews like that with a grain of salt. I follow numerous AI groups on Reddit and elsewhere and many users seem to have strong opinions that their tool of choice is the best. These reviews are highly biased.

Not to say I'm not impressed, but it's just been released.


Object persistence and consistency are not likely to arise simply from a bigger model. A different approach or architecture is needed.

Also, I just added a link to an expert’s tweet above. What do you think?


Others have provided explanations for things like object persistence, for example keeping a memory of the rendering outside of the frame.

The comment from the expert is definitely interesting and compelling, but clearly still speculation based on the following comment.

> I won't be surprised if Sora is trained on lots of synthetic data using Unreal Engine 5. It has to be!

I like the speculation, though; the comments provide some convincing explanations for how this might work. For example, the idea that it is trained using synthetic 3-dimensional data from something like UE5 seems like a brilliant idea. I love it.

Also, in his example video the physics look very wrong to me. The movement of the coffee waves is realistic-ish at best. The boat motion also looks wrong and doesn't match up with the liquid much of the time.


I think you are reading too much into this. The title of the technical paper is “Video generation models as world simulators”.

This is “just” a transformer that takes in a sequence of noisy image (video frame) tokens + prompt, and produces a sequence of less noisy video tokens. Repeat until the noise is gone.

The point they’re making, which is totally valid, is that in order for such a model to produce videos with realistic physics, the underlying model is forced to learn a model of physics (a “world simulation”).
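
A minimal sketch of that loop, purely illustrative (Sora's actual architecture, tokenizer, and sampler are not public, so every name, shape, and the crude update rule below are assumptions):

    import torch

    # Hypothetical stand-in for a transformer over spacetime-patch tokens.
    class DenoisingTransformer(torch.nn.Module):
        def __init__(self, dim=64):
            super().__init__()
            self.net = torch.nn.Sequential(
                torch.nn.Linear(dim + 1, 128),
                torch.nn.ReLU(),
                torch.nn.Linear(128, dim),
            )

        def forward(self, noisy_tokens, t):
            # Condition on the (normalized) timestep; a real model would also
            # cross-attend to the prompt embedding.
            t_feat = torch.full((noisy_tokens.shape[0], 1), float(t))
            return self.net(torch.cat([noisy_tokens, t_feat], dim=-1))  # predicted noise

    def sample(model, num_tokens=16, dim=64, steps=50):
        x = torch.randn(num_tokens, dim)   # start from pure noise
        for t in reversed(range(steps)):
            eps = model(x, t / steps)      # predict the remaining noise
            x = x - eps / steps            # strip away a little of it (crude Euler-style update)
        return x                           # denoised latent patches; a separate decoder turns these into frames

    video_latents = sample(DenoisingTransformer())

(A real sampler such as DDPM or DDIM would use a proper noise schedule; the point is only the iterative denoise-and-repeat structure.)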


AlphaGo and AlphaZero were able to achieve superhuman performance due to the availability of perfect simulators for the game of Go. There is no such simulator for the real world we live in. (Although pure LLMs sorta learn a rough, abstract representation of the world as perceived by humans.) Sora is an attempt to build such a simulator using deep learning.

This actually affirms my comment above.

  “Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world.”
https://openai.com/research/video-generation-models-as-world...

What part of my argument do you disagree with?


> since it is trained to simulate the real world, as opposed to imitating the pixels

It's not that it's learning a model of the world instead of imitating pixels - the world model is just a necessary emergent phenomenon of the pixel imitation. It's still really impressive and very useful, but it's still 'pixel imitation'.


What I want is an AI trained to simulate the human body, allowing scientists to perform artificial human trials on all kinds of medicines, cutting trial times from years to months.


Or to simulate the short- or long-term regret you'll feel for eating the meal in the photo.


> "understand... the real world"

doing a lot of heavy lifting in this statement


Movie making is going to become fine-tuning these foundational video models. For example, if you want Brad Pitt in your movie, you'll need to use his data to fine-tune his character.


What is latent space if not a representation of the real world?


Pretty sure many latent spaces are not trained to represent 3D motion and the detailed physics of the real world. Those in pure-text LLMs, for example.



