> This is the equivalent of that Nikola demo of their electric car, where they set it up to roll down a hill so that it looked like it was working.
Idk how the Gemini demo (a thing that actually does work and does the things displayed, just not in the exact way shown) is "equivalent" to a literally non-functioning car...
> The inputs to the AI are still frames, not video.
A video is just a sequence of frames. The input is always going to be frames when it actually goes into the model, and you don't need every single frame to understand what's happening in the video.
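To make that concrete, here's a minimal sketch of what "you don't need every frame" means in practice. The frame rate and helper name are illustrative, not from any actual Gemini pipeline:

```python
# Hypothetical sketch: to a model, a "video" is just an ordered list of frames.
# Sampling one frame per interval (e.g. ~1 fps from 30 fps footage) is often
# enough context to follow what's happening.
def sample_frames(frames, every_n=30):
    """Keep every `every_n`-th frame from the decoded sequence."""
    return frames[::every_n]

video = list(range(300))  # stand-in for 300 decoded frames (10 s at 30 fps)
keyframes = sample_frames(video, every_n=30)
print(len(keyframes))  # 10 frames instead of 300
```

The model then receives those 10 sampled frames as its input, exactly as the parent describes.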
> The prompts were provided as text, not audio as shown.
That's trivial to do now. Using Whisper, you can just turn voice into text and do the exact same thing. They don't really need to demonstrate that.
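A rough sketch of the voice-to-prompt step being described. The real Whisper call is shown in the comment; it's stubbed out here (with a made-up filename and placeholder transcription) so the snippet runs without model weights or an audio file:

```python
# In practice the transcription step is one call to openai-whisper:
#   import whisper
#   text = whisper.load_model("base").transcribe("clip.wav")["text"]
# Stubbed below purely for illustration.
def transcribe(audio_path: str) -> str:
    return " what do you see in this picture? "  # placeholder transcription

def build_prompt(audio_path: str) -> str:
    # The transcribed speech becomes the text prompt the model actually receives.
    return transcribe(audio_path).strip()

print(build_prompt("clip.wav"))
```

Once the speech is text, the rest of the pipeline is identical to typing the prompt by hand, which is the parent's point.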
So sure, they definitely embellished, made it seem real-time and as if it didn't need more target-specific prompting per task. But saying it is completely fake is foolish.
Precisely! I don’t see how people don’t see this as outright fraud.
The demonstration was a machine intelligence picking out meaning from a video.
The reality was a HUMAN using their meat brain to pick out the meaningful still frames and feeding them into an AI that couldn’t have completed the demonstrated task on its own!
This is like making a demo of a robot cleaning a house, without acknowledging the janitorial staff doing the actual cleaning off-camera.
It’s absurdly fraudulent and should never have been made public.
Videos like this for such an existential product ought to have been reviewed by the CEO. After all, Google’s future relevance as a corporation depends on it.
It was requested, made, reviewed, approved, and then published.
A faked video of science-fiction wishful thinking for a major product launch.