It's probably just a small Unity world with a visual filter, some models, and a set of movements those models can make. Then you give GPT-3 the API of which scenes, actions, and camera angles are available, and have it generate those together with a script and laugh tracks.
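
To make that guess concrete, here's a rough sketch of what the glue between the language model and the renderer could look like. Everything in it is an assumption for illustration: the scene, action, and camera names, the JSON shape, and the `build_prompt` / `parse_beat` helpers are all made up, and the actual call to GPT-3 is left out.

```python
import json

# Hypothetical vocabulary the renderer might expose (invented for this sketch).
SCENES = {"apartment", "comedy_club", "street"}
ACTIONS = {"walk_to", "sit", "talk", "shrug"}
CAMERAS = {"wide", "close_up", "over_shoulder"}


def build_prompt(premise: str) -> str:
    """Tell the model which scenes, actions, and camera angles it may use."""
    return (
        "Write one sitcom beat as a JSON object with the keys "
        "scene, camera, character, action, line, laugh_track.\n"
        f"Allowed scenes: {sorted(SCENES)}\n"
        f"Allowed actions: {sorted(ACTIONS)}\n"
        f"Allowed cameras: {sorted(CAMERAS)}\n"
        f"Premise: {premise}\n"
    )


def _check(beat: dict, key: str, allowed: set) -> None:
    """Raise ValueError unless beat[key] is one of the allowed strings."""
    value = beat.get(key)
    if not isinstance(value, str) or value not in allowed:
        raise ValueError(f"unsupported {key}: {value!r}")


def parse_beat(raw: str) -> dict:
    """Parse the model's reply and reject anything the renderer could not play."""
    beat = json.loads(raw)
    if not isinstance(beat, dict):
        raise ValueError("expected a JSON object")
    _check(beat, "scene", SCENES)
    _check(beat, "action", ACTIONS)
    _check(beat, "camera", CAMERAS)
    if not isinstance(beat.get("line"), str):
        raise ValueError("line must be a string")
    # Treat a missing laugh_track flag as "no laugh".
    beat["laugh_track"] = bool(beat.get("laugh_track", False))
    return beat
```

The idea would be to send `build_prompt(...)` to the model, run whatever comes back through `parse_beat`, and only hand valid beats to the Unity side, so a hallucinated scene or action never reaches the renderer.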