I have an idea for you to try. Instead of training a model to produce subsequent animation frames (which is tough), take a model trained on pixel art sprites in general and add a ControlNet, feeding it either a pose skeleton or renders of a higher-res generic 3D dummy character made in Blender. Then generate output frame by frame, keeping the prompt the same but advancing the ControlNet input one pose per frame.
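To make that loop concrete, here's a minimal sketch using Hugging Face diffusers with an OpenPose ControlNet. Treat it as an illustration of the idea, not a tested recipe: the checkpoint IDs, pose-image paths, frame count, and seed are all assumptions you'd swap for your own.

```python
# Minimal sketch: fixed prompt + fixed seed, ControlNet pose input varies per frame.
# Model IDs and file paths below are assumptions for illustration.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

prompt = "full-body game character, knight in armor, plain background"

# One pose image per animation frame (e.g. exported from Blender or an
# OpenPose rig); only this input changes between frames.
pose_frames = [load_image(f"poses/walk_{i:02d}.png") for i in range(8)]

for i, pose in enumerate(pose_frames):
    # Re-seeding identically each frame helps keep the character's identity
    # stable while the pose moves.
    generator = torch.Generator("cuda").manual_seed(42)
    frame = pipe(prompt, image=pose, generator=generator).images[0]
    frame.save(f"frames/walk_{i:02d}.png")
```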
To get it down to small-pixel 'sprite' scale, the right move may be to output 'realistic' character animation frames this way first, and then 'de-res' them via img2img into pixel art. The whole pipeline could be automated so that your only inputs are a single set of varied walking/posing/jumping ControlNet poses plus the prompts describing the characters.
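A sketch of that de-res stage, assuming diffusers' img2img pipeline for the restyle plus a nearest-neighbor downscale and palette quantize via Pillow. The strength, sprite size, and palette count here are guesses you'd tune:

```python
# Sketch of the de-res step: restyle each rendered frame toward pixel art with
# img2img, then hard-downscale and quantize to get true sprite-scale output.
# All parameters (strength, size, colors) are assumed starting points.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def to_sprite(path: str, size: int = 64, colors: int = 16) -> Image.Image:
    frame = Image.open(path).convert("RGB")
    # Low strength preserves the pose/silhouette; the prompt pushes the style.
    styled = pipe(
        prompt="pixel art sprite, flat colors, limited palette",
        image=frame,
        strength=0.4,
    ).images[0]
    # Nearest-neighbor resampling keeps pixels crisp; quantize caps the palette.
    small = styled.resize((size, size), Image.NEAREST)
    return small.quantize(colors=colors).convert("RGB")

for i in range(8):
    to_sprite(f"frames/walk_{i:02d}.png").save(f"sprites/walk_{i:02d}.png")
```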
Something like how this posing works: https://www.youtube.com/watch?v=CiG_v61cLxI