
This does look more like photogrammetry than a diffusion model predicting the next frame. 3D information is extracted from the input image, and additional poses are likely generated so the scene can be reconstructed with Gaussian splats. I'm not sure how much segmentation (understanding of each part of the scene) is going on; probably not much, if I had to guess.
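
To make the "3D information extracted from the input image" part concrete, here is a rough sketch of one way it could work: back-projecting a per-pixel depth estimate into 3D points with a pinhole camera model. This is purely my assumption for illustration, not something confirmed about the model. The function name `unproject_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical, and a real pipeline would still need to turn these points into splats and optimize them.

```python
def unproject_depth(depth, fx, fy, cx, cy):
    """Back-project a depth map into 3D camera-space points.

    depth: list of rows, each a list of depth values (z along the optical axis).
    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels.
    All names here are illustrative assumptions, not taken from any real system.
    """
    points = []
    for v, row in enumerate(depth):          # v = pixel row
        for u, z in enumerate(row):          # u = pixel column
            if z <= 0:
                continue                     # skip pixels with no valid depth
            x = (u - cx) * z / fx            # pinhole model: u = fx * x / z + cx
            y = (v - cy) * z / fy            # pinhole model: v = fy * y / z + cy
            points.append((x, y, z))
    return points


# Tiny 2x2 depth map; one pixel has no valid depth.
cloud = unproject_depth([[1.0, 2.0], [0.0, 4.0]], fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Points like these could serve as initial centers for the Gaussians, with the generated poses supplying extra views to fit against. Again, that is a guess at the general shape of such a pipeline.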

