This is really, really cool. A few months ago I was playing with some of the "video" generation models on Replicate, and I got some really neat results[1], but it was very clear that the resulting videos were made by prompting each "frame" with the previous one. This looks like it can actually figure out how to make something with higher-level context.
It's crazy to see this level of progress in just a bit over half a year.
[1]: https://epiccoleman.com/posts/2023-03-05-deforum-stable-diff...