Thanks, Benjie! Great to see you here. I hope it's OK if I plug your excellent writings on robotics that I think everyone should check out: https://generalrobots.substack.com/
Hey Hacker News. After using AI for coding, it's become obvious to me that software development will involve AI assistance from now on. I wrote this post about what it was like for me to teach an LLM a new technology.
I'm super curious about other folks' experience, as I'm sure lots of people are doing similar things.
I'm on an EC2 instance. Most of the effort went into CUDA/PyTorch/pip nonsense. Once the Stable Diffusion web UI was working, diffusers worked basically out of the box, which was really nice. (The trickiest bit was figuring out that I needed to use their tool to convert my safetensors file, and that the version of Python I was using wasn't working with it for some reason.) The stack is Flask + gunicorn, which is what ChatGPT recommended (lol). I had a WebSocket version of the progress bar working with Flask-SocketIO on my local machine, but I could never get the server version to work correctly through nginx. So eventually I gave up and switched to polling so I could launch.
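For anyone hitting the same WebSocket-behind-nginx wall, here's a minimal sketch of the server side of the polling fallback, assuming a single gunicorn worker process so one in-memory dict is visible to every request. `ProgressStore`, `update`, `snapshot`, and `fake_generation` are hypothetical names, not taken from the app. In a Flask app, a route would return `jsonify(store.snapshot(job_id))`, and the generation loop (for example a per-step callback in diffusers) would call `update`.

```python
import threading
import time
import uuid


class ProgressStore:
    """Thread-safe per-job progress that a polling endpoint can read."""

    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}

    def create(self, total_steps):
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"step": 0, "total": total_steps, "done": False}
        return job_id

    def update(self, job_id, step):
        # Called from the generation thread after each denoising step.
        with self._lock:
            job = self._jobs[job_id]
            job["step"] = min(step, job["total"])
            job["done"] = job["step"] >= job["total"]

    def snapshot(self, job_id):
        # What a GET /progress/<job_id> handler would serialize to JSON.
        with self._lock:
            job = dict(self._jobs[job_id])
        job["fraction"] = job["step"] / job["total"] if job["total"] else 1.0
        return job


def fake_generation(store, job_id, steps):
    # Stand-in for the real image generation; just reports each step.
    for i in range(1, steps + 1):
        time.sleep(0.001)
        store.update(job_id, i)


store = ProgressStore()
job = store.create(total_steps=20)
worker = threading.Thread(target=fake_generation, args=(store, job, 20))
worker.start()
worker.join()
print(store.snapshot(job))  # {'step': 20, 'total': 20, 'done': True, 'fraction': 1.0}
```

With several gunicorn workers, each process gets its own dict, so a poll can land on a worker that never saw the job; that's the point where an external store such as Redis starts to make sense.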
While I have folks' attention: I want to try training a model to generate monster/creature walk animations. Does anyone know of a dataset of walk cycle sprite sheets that I could massage and label to see if I can make that work?
I have an idea for you to try. Instead of training a model to produce successive animation frames (which is tough), take a model trained on pixel art sprites in general and add a ControlNet whose input is either a pose model or a higher-res 3D model of a generic dummy character made in Blender. Then generate the output frame by frame, keeping the prompt the same but moving the ControlNet input frame by frame.
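The key constraint in that idea is that only the control input changes between frames. A minimal sketch of the orchestration, assuming the pose renders are already exported from Blender as one image per frame; `FrameRequest`, `plan_animation`, and `render_animation` are hypothetical names, and `generate` is a placeholder for whatever actually calls the model (for example a diffusers ControlNet pipeline called with the same prompt, a generator seeded with the same seed, and `image=` set to that frame's control image).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class FrameRequest:
    prompt: str
    seed: int
    control_image: str  # path to the pose/depth render for this frame


def plan_animation(prompt: str, seed: int, control_frames: Sequence[str]) -> List[FrameRequest]:
    """Hold prompt and seed fixed; only the ControlNet input varies per frame."""
    return [FrameRequest(prompt, seed, frame) for frame in control_frames]


def render_animation(requests: List[FrameRequest],
                     generate: Callable[[FrameRequest], object]) -> list:
    # `generate` is a placeholder for the real diffusion call.
    return [generate(req) for req in requests]


poses = [f"walk_pose_{i:02d}.png" for i in range(8)]
requests = plan_animation("pixel art goblin, side view", seed=1234, control_frames=poses)
frames = render_animation(requests, generate=lambda r: f"frame from {r.control_image}")
print(len(frames), frames[0])  # 8 frame from walk_pose_00.png
```

Fixing the seed as well as the prompt is an assumption on my part; it should keep the starting noise identical across frames so the character drifts less, though it won't guarantee consistency.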
To get down to small pixel 'sprite' scale, the right approach may be to output 'realistic' character animation frames this way and then 'de-res' them into pixel art via img2img. The whole pipeline could be automated so that your only inputs are a single set of varied walking/posing/jumping ControlNet poses and the prompts describing the characters.
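The comment suggests img2img for the de-res step. As a simpler, swapped-in alternative (or a cleanup pass after img2img), a plain box-filter downscale followed by snapping every pixel to a fixed palette gets frames to sprite scale deterministically. This is a sketch on plain lists of RGB tuples so it has no dependencies; `downscale`, `quantize`, and the example palette are hypothetical.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def downscale(image: Sequence[Sequence[RGB]], block: int) -> List[List[RGB]]:
    """Average each block x block tile into one pixel (box-filter downsample)."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(0, h - h % block, block):
        row = []
        for x in range(0, w - w % block, block):
            pixels = [image[yy][xx] for yy in range(y, y + block) for xx in range(x, x + block)]
            n = len(pixels)
            row.append(tuple(sum(p[c] for p in pixels) // n for c in range(3)))
        out.append(row)
    return out


def quantize(image: Sequence[Sequence[RGB]], palette: Sequence[RGB]) -> List[List[RGB]]:
    """Snap every pixel to its nearest palette color by squared RGB distance."""
    def nearest(p: RGB) -> RGB:
        return min(palette, key=lambda q: sum((a - b) ** 2 for a, b in zip(p, q)))
    return [[nearest(p) for p in row] for row in image]


# 4x4 frame: left half reddish, right half bluish.
image = [[(250, 10, 10)] * 2 + [(10, 10, 240)] * 2 for _ in range(4)]
palette = [(255, 0, 0), (0, 0, 255), (0, 0, 0)]
small = downscale(image, 2)
pixel = quantize(small, palette)
print(pixel)  # [[(255, 0, 0), (0, 0, 255)], [(255, 0, 0), (0, 0, 255)]]
```

Downscaling before quantizing keeps the palette snap from amplifying single-pixel noise in the high-res render.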
There are a lot of sprites to work with. As I'm sure you're aware, there are artists known for making animations, like Pedro Medeiros; spriters-resource.com has material from thousands of games; you can buy assets from the Unity Asset Store, itch.io, and stock art sites; and you can use DevX Tools Pro to extract assets from hundreds of 2D pixel art Unity games. All told, there are maybe 100,000 to 1 million examples of high-quality pixel art you can scrape. It's also possible that much of it already exists in the major crawls and just needs to be labeled better.
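Most of the "massage and label" work on sheets like these is cutting them into individual frames. A minimal sketch, assuming the common (but not universal) layout of a uniform grid with one animation per row; `FrameBox` and `slice_sheet` are hypothetical names. The boxes use the (left, top, right, bottom) order that Pillow's `Image.crop` expects.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FrameBox:
    label: str
    index: int
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def slice_sheet(sheet_w: int, sheet_h: int, frame_w: int, frame_h: int,
                row_labels: List[str]) -> List[FrameBox]:
    """Cut a uniform-grid sprite sheet into per-frame crop boxes, one label per row."""
    cols, rows = sheet_w // frame_w, sheet_h // frame_h
    if len(row_labels) != rows:
        raise ValueError(f"expected {rows} row labels, got {len(row_labels)}")
    boxes = []
    for r, label in enumerate(row_labels):
        for c in range(cols):
            left, top = c * frame_w, r * frame_h
            boxes.append(FrameBox(label, c, left, top, left + frame_w, top + frame_h))
    return boxes


boxes = slice_sheet(256, 128, 32, 32, ["walk_down", "walk_left", "walk_right", "walk_up"])
print(len(boxes), boxes[9])
# 32 FrameBox(label='walk_left', index=1, left=32, top=32, right=64, bottom=64)
```

Sheets with irregular packing would need per-sheet metadata or connected-component detection instead of a fixed grid.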
A few people have tried training on sprite sheets and emitting them directly, and it did not work.
A few people have been working specifically on walk cycles, and the results have a lot of limitations.
In my specific experience with other bespoke pixel art models, if you ask for a "knight," you're going to get a lot of the same-looking knight. Fine-tuning will make the model unlearn concepts that are not represented in your dataset. LoRAs have not been observed to work well for pixel art. For prototyping, you can try the Astropixel model, which is the highest quality in my opinion.
Part of this is that you're really seeing how powerful ControlNet, T2I-Adapters, and LoRAs are, and you may expect that something else you can do as a layperson will be similarly powerful. Your thing is really cool. But is there some easy trick for animation that skips all this science? No. Those were really big scientific breakthroughs, and even with all the attention on video (maybe 100 to 1,000 academic and industry teams working on it), there still hasn't been anything really robust for animation that uses LDMs. The most coherent video is happening with NeRF, and a layperson isn't going to make that coherent with pixel art. Your best bet is to wait. That said, I'm sure people will link some great hand-processed LDM videos here, and maybe there's a pipeline with hand artwork that a layperson can do today that would work well.
That seems counter to the current AI mentality though. Clearly if it's online and available it's part of your AI playground so go nuts.
Edit: In case my sarcasm isn't clear, I hate this mentality and am just bitterly griping about AI into the void. You should definitely ask permission to use data before training AI on it, but that will put you behind the AI people who aren't asking permission.
I'm also curious if anyone has made a level that worked particularly well/poorly or has a great custom theme (that maybe I should add to the dropdown) :)
Sure, but I'm more interested in things that were just impossible before. You can hire an artist to illustrate a level, or have AI do it more cheaply, but you can't have an artist illustrate a level the player made while they wait. I think there are whole play patterns that become possible because the cost, and especially the speed, of creating the art differ by many orders of magnitude.
I think even more interesting is generating entire styles and storylines that evolve infinitely and coherently based on seeds. Players could even inject a concept, like “steampunk” or “Discworld”, and an LLM could construct the story, with characters and visual themes.
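The "coherent from a seed" part comes down to drawing every choice from a private, seeded PRNG so the same seed always replays the same decisions. A minimal sketch, assuming small hard-coded lists stand in for content an LLM would generate; `world_from_seed` and the lists are hypothetical. A player-injected concept overrides the setting without changing any of the other draws.

```python
import random
from typing import Dict, List, Optional

# Hypothetical building blocks; in the idea above an LLM would produce these.
SETTINGS = ["steampunk city", "alien jungle", "sunken ruins", "clockwork desert"]
PALETTES = ["brass and soot", "neon green and violet", "teal and coral", "amber and rust"]
HOOKS = ["a stolen engine", "a missing cartographer", "a rising tide", "a broken clock"]


def world_from_seed(seed: int, chapters: int = 3,
                    concept: Optional[str] = None) -> Dict[str, object]:
    rng = random.Random(seed)  # private PRNG: same seed -> same sequence of draws
    # Always draw the setting, even when a concept overrides it, so the
    # palette and plot draws that follow stay identical for this seed.
    drawn_setting = rng.choice(SETTINGS)
    setting = concept or drawn_setting
    palette = rng.choice(PALETTES)
    plot: List[str] = [rng.choice(HOOKS) for _ in range(chapters)]
    return {"seed": seed, "setting": setting, "palette": palette, "plot": plot}


base = world_from_seed(42)
custom = world_from_seed(42, concept="Discworld")
print(base["setting"], "->", custom["setting"])
```

Using `random.Random(seed)` rather than seeding the global module keeps other code that uses `random` from disturbing the sequence.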
Several of the themes (including "alien jungle", which is my favorite) were created by ChatGPT. I totally want to try evolving the game in that direction.
I think this could be great for letting players design custom weapons/armor/spells very roughly and then using AI to convert them into something that looks good in the game.
The main one is that making the ControlNet depth input look like something helps a ton. You can create levels that have more 'structure' (large flat platforms, platforms that line up with others) and levels that are more random, and see that the structure works way better. I played around a lot with turning the ControlNet on and off at the beginning and end of generation, which seemed to help when I was playing in the web UI, but I didn't immediately find the API for it in diffusers, and the results I was getting were great, so I didn't keep looking.
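For the on/off-at-the-edges question: recent diffusers versions expose `control_guidance_start` and `control_guidance_end` (fractions of the schedule) on their ControlNet pipelines, e.g. `pipe(prompt, image=depth_map, control_guidance_start=0.0, control_guidance_end=0.5)`; worth checking against the installed version. The sketch below mirrors my understanding of how those fractions map to individual denoising steps; `controlnet_active_steps` is a hypothetical helper, not part of diffusers.

```python
from typing import List


def controlnet_active_steps(num_steps: int, start: float = 0.0, end: float = 1.0) -> List[bool]:
    """Whether ControlNet guidance applies at each denoising step.

    A step is active if it begins at or after `start` and finishes at or
    before `end`, with both given as fractions of the full schedule.
    """
    if not 0.0 <= start < end <= 1.0:
        raise ValueError("need 0 <= start < end <= 1")
    return [i / num_steps >= start and (i + 1) / num_steps <= end for i in range(num_steps)]


# Guide only the first half of a 20-step run: layout is fixed early,
# and later steps are free to refine texture without the depth constraint.
active = controlnet_active_steps(20, start=0.0, end=0.5)
print(sum(active))  # 10
```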
Depth is a useful parameter for ControlNet, especially when you want really specific forms. I've found that it can hamper outputs because blank sections of solid color are interpreted as flat walls, when really I'm trying to make those parts ambiguous!
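For a tile-based level, the depth input can be rendered straight from the map. A minimal sketch, assuming a grid of strings where `#` is solid and anything else is empty, and using the MiDaS-style convention common for depth ControlNets that brighter means closer. `level_to_depth` is hypothetical, and whether pushing empty space to the far value avoids the flat-wall effect described above is untested here; it's just the assumption this sketch makes.

```python
from typing import List, Sequence


def level_to_depth(grid: Sequence[str], tile_px: int = 8,
                   near: int = 255, far: int = 0) -> List[List[int]]:
    """Render a tile map ('#' = solid) as a grayscale depth map, brighter = closer."""
    depth = []
    for row in grid:
        line = []
        for ch in row:
            # Each tile becomes tile_px pixels wide ...
            line.extend([near if ch == "#" else far] * tile_px)
        for _ in range(tile_px):
            # ... and tile_px pixels tall.
            depth.append(list(line))
    return depth


level = ["....", "..##", "####"]
depth = level_to_depth(level, tile_px=2)
print(len(depth), len(depth[0]))  # 6 8
```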
Yeah, in the map editor there is, in fact, a random button that generates a level. I haven't gotten around to making sure that the random level is playable (about 1 in 4 have unreachable areas), but that wouldn't be that hard to add. (I've been focused on the creative side of making your own levels because right now that part is more fun.)
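A first pass at that playability check could be a flood fill from the spawn point. A minimal sketch, assuming the level is a grid of strings with `.` for empty space; `unreachable_cells` is hypothetical, and it's a crude proxy: a 4-way breadth-first search that ignores gravity and jump height, so it only catches fully sealed-off pockets, not ledges that are too high to reach.

```python
from collections import deque
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def unreachable_cells(grid: List[str], start: Cell) -> Set[Cell]:
    """Empty cells ('.') not connected to `start` through other empty cells."""
    rows, cols = len(grid), len(grid[0])
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen and grid[nr][nc] == ".":
                seen.add((nr, nc))
                queue.append((nr, nc))
    empty = {(r, c) for r in range(rows) for c in range(cols) if grid[r][c] == "."}
    return empty - seen


level = [
    "......",
    "..###.",
    "..#.#.",
    "..###.",
    "......",
]
print(unreachable_cells(level, (0, 0)))  # {(2, 3)}
```

The random button could then simply reroll until the returned set is empty; a real check would need to model the player's jump arc.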