Here is the full video, also linked at the bottom. It also includes the run that trained longer, in which the wolves start successfully hunting the sheep after more training examples.
The AI seems to die unexpectedly at the top of the map for some reason, for example around 6:07.
Another interesting observation is that the wolves don't seem to coordinate. That probably implies that each wolf has its own individual reward function, so they're technically competing rather than cooperating.
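To make the individual vs. shared reward point concrete, here's a rough sketch. I don't know how the rewards in the video are actually defined, so the function names, the wolf names, and the reward value of 1.0 per catch are all made up for illustration.

```python
from typing import Dict


def individual_rewards(catches: Dict[str, int], catch_reward: float = 1.0) -> Dict[str, float]:
    """Hypothetical scheme: each wolf is rewarded only for sheep it caught itself."""
    return {wolf: catch_reward * n for wolf, n in catches.items()}


def shared_rewards(catches: Dict[str, int], catch_reward: float = 1.0) -> Dict[str, float]:
    """Hypothetical scheme: every wolf receives the pack's total reward."""
    team_total = catch_reward * sum(catches.values())
    return {wolf: team_total for wolf in catches}


# Assumed example: wolf_a caught two sheep, wolf_b caught none.
catches = {"wolf_a": 2, "wolf_b": 0}
print(individual_rewards(catches))
print(shared_rewards(catches))
```

With individual rewards, wolf_b gets nothing for helping corner a sheep that wolf_a ends up catching, so it has no incentive to cooperate. With a shared reward, both wolves get the same signal, and herding prey toward a packmate becomes worthwhile.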
Lastly, they don't seem to be very good at the game, even at the end.