Asymptotic improvements are flattening the cost curves so fast that AI regulation might become practically meaningless by the end of the year. If you want unregulated output you'll have tons of offshore models to choose from.
The risk is that the good guys end up being the only ones hampered by it. Hopefully it won't be so large a burden that the bad guys and especially the so-so guys (those with a real chance, e.g. Alibaba) get a massive leg up.
> Asymptotic improvements are flattening the cost curves so fast that AI regulation might become practically meaningless by the end of the year.
Awesome. This will mean actually good open-source models, not just API endpoints from big tech that are unusable because of dataset censorship and alignment bias (SD3, Gemini).
In other words, big tech will actually need to make good stuff to be competitive, not trash protected by a granted monopoly.
Those improvements are definitely real, but we also have pretty solidly established and confirmed scaling laws by now. Until someone utterly breaks those, big players will always have an edge, simply because they can spend more compute on training and inference. The only way to change this is a new architecture that benefits more from intelligent adjustments in a space that cannot be searched efficiently with raw compute. And even then, we are not far from the point where these models could try out those adjustments themselves. So by the time you get to tune your own AGI at home the way you could tutor a human, corporations might have millions of them improving themselves into something you could never achieve on your own.
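To put rough numbers on the compute edge, here is a toy Chinchilla-style scaling-law calculation; the coefficients are the commonly cited Hoffmann et al. (2022) fits and the parameter/token counts are made-up examples, so treat it as an illustration only.

```python
# Toy Chinchilla-style scaling law (Hoffmann et al. 2022). The coefficients are
# the commonly cited published fits; the example model sizes below are invented.
def chinchilla_loss(params: float, tokens: float) -> float:
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / params**alpha + B / tokens**beta

# A lab that can afford ~100x more compute simply sits further down the curve:
home_run = chinchilla_loss(params=1e9,  tokens=2e10)    # ~1B params, 20B tokens
lab_run  = chinchilla_loss(params=7e10, tokens=1.4e12)  # ~70B params, 1.4T tokens
print(f"home: {home_run:.3f} loss, big lab: {lab_run:.3f} loss")
```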
We're still in phase 1, where human-directed improvement has the highest potential. Papers are still getting published and the cells interlinked. (I'm not sure the scaling picture is at all clear, given that papers like this can turn up casually with 15x savings, but let's put that aside for now.)
Phase 2 begins when patents break stealth, unsettling the picture. If some patent impairs research or operations in IP-solid countries, the lower-level stuff might move to local inference, and maybe some minor Pirate Bay-style outfits.
Phase 3 begins when the costly research goes dark (well, darker.) Everyone is Apple now. The research papers are replaced by white papers, then by PR communiqués.
Phase 4 begins when the AI AI researchers take over. The old AI researchers turn into their managers.
Some of the path is compute-bound. Some of it is IP-, luck-, and genius-bound.
You can't really stop it at this point anyway, short of locking down any code that resembles AI at the processor level to only signed and approved models, making it illegal to own hardware from before the lockdown, and destroying any that's seized.
I almost commented the same thing. Framing things as "good/bad/so-so" is kind of moving the target. If we focus on who might use a model, rather than on whether the model accurately represents reality and altruistically aids humans, we will lose sight of the really valuable things in life. The reality is that I don't think people are good/so-so/bad: humans are equipped with extremely complex and diverse adaptive systems with near-limitless capabilities. Sure, I am just re-framing, but from my perspective we should not be reducing humans to "good/bad/so-so."
Not when we're talking asymptotically. The linked paper for instance claims 14- to 118-fold cost reductions. 1-2 GPU generations from now you'll train this model for $0.12.
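For a back-of-the-envelope version of that projection (all the numbers below are placeholder assumptions except the 14-118x range, and the per-generation gain is assumed, not measured):

```python
# Toy cost projection. current_cost and per_gen_gain are placeholder assumptions;
# only the 14-118x reduction range comes from the paper discussed in the thread.
current_cost = 28_000   # placeholder: roughly a PixArt-alpha-scale training budget
reduction    = 118      # upper end of the claimed 14-118x cost reduction
per_gen_gain = 2.0      # assumed perf-per-dollar improvement per GPU generation

for gens in range(3):
    projected = current_cost / (reduction * per_gen_gain**gens)
    print(f"after {gens} GPU generation(s): ~${projected:,.2f}")
```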
People are casually dropping thousands on cloud GPUs to make random fine-tunes over at r/localllama; the threshold will be met far sooner. Plus, datacenters will sell off their collections of A100s, and eventually H100s, once the cards hit EoL by their standards.
This kind of research is great for reducing training costs as well as enabling more people to experiment with training large models. Hopefully in 5-10 years we'll be able to train a model on par with SD 1.5 on consumer GPUs, since that would be great for teaching model development.
I'm pretty sure we are looking at something like 12 months. Not 5-10 years.
PixArt and this paper are good data points. Even just another 50x reduction in cost would easily put it within reach of consumer hardware, and this paper already claims over a 100x reduction.
I hope that somewhere around that period of time we will have AI-based "game" engines working at 30-40fps at 4K (of course with upscaling). I mean it might not be game engines per se, but universal, interactive pipelines for audio-visual content creation/consumption. Because right now I do not see any hope of such engines due to the number of models involved and the latencies this implies.
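To see why the latency concern bites, here is a rough frame-budget sketch; the pipeline stages and their latencies are made-up assumptions, just to show the arithmetic.

```python
# Rough frame-time budget for a hypothetical multi-model neural "game engine".
# Every stage and latency below is an assumption for illustration only.
target_fps = 30
frame_budget_ms = 1000 / target_fps   # ~33 ms per frame at 30 fps

stage_latency_ms = {                  # hypothetical pipeline stages
    "world/state model": 12.0,
    "frame generator":   18.0,
    "audio model":        4.0,
    "4K upscaler":        6.0,
}

total = sum(stage_latency_ms.values())
verdict = "fits" if total <= frame_budget_ms else "blows"
print(f"budget {frame_budget_ms:.1f} ms, pipeline {total:.1f} ms -> {verdict} the 30 fps target")
```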
I'm really hoping for this as well! I'm a big believer that neural rendering pipelines will overtake the traditional push-tris-to-the-GPU approach we've essentially been using since Descent.
Getting parity with SD 1.5 should require a similarly comprehensive data set, which seems a lot harder to source than a consumer GPU. Especially now that we've got the AI equivalent of pre-/post-nuclear steel.
Given how little artistic data humans need, there are probably breakthroughs coming that will reduce the size of the data set needed. Or make it so that a lot of the data required is more generic (like how a human artist needs vast amounts of audio-visual data from walking around every day, but maybe as little as a few megabytes to go from nothing to copying a new style and subject - then we can have a curated open source "highlights of the first 20 years of life" data set that everyone uses for basic training).
> Getting parity with SD 1.5 should require a similarly comprehensive data set, which seems a lot harder to source
Wasn't SD1.5 trained on LAION? So we know what it was and you could recreate it.
Although I thought LAION was why SD 1.5 is kinda ugly at base settings: LAION is just random images of mixed content and quality, not curated, aesthetic, high-quality images.
The LAION datasets don't contain actual images, but URLs pointing to images. Due to link rot and deliberate scraper-blocking it may be difficult to download LAION images to retrain a model to match SD 1.5.
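One way to get a feel for the rot is to sample URLs from a LAION metadata shard and see how many still resolve; the parquet path below is a placeholder, and I'm assuming the shard exposes a "URL" column as the public LAION releases do.

```python
# Rough link-rot check over a sample of LAION URLs. The parquet path is a
# placeholder; LAION metadata shards contain URL + caption columns, not images.
import pandas as pd
import requests

df = pd.read_parquet("laion-metadata-shard-00000.parquet")   # placeholder path
sample = df["URL"].dropna().sample(200, random_state=0)

alive = 0
for url in sample:
    try:
        resp = requests.head(url, timeout=5, allow_redirects=True)
        alive += resp.status_code == 200
    except requests.RequestException:
        pass  # dead host, timeout, blocked scraper, etc.

print(f"{alive}/{len(sample)} sampled URLs still resolve")
```

(Tools like img2dataset are what people actually use for the bulk downloading.)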
Reminds me of PixArt-α, which was also trained on a similarly tiny budget ($28,000). [0] How good is their result, though? Training a toy model is one thing; making something usable (let alone competitive) is another.
Edit: they do have comparisons in the paper, and PixArt-α seems to be... more coherent?
One thing I've wondered about is fine-tuning a large model from multiple LoRAs. If the model doesn't fit in your VRAM, you can train a LoRA, apply it to the model, train another LoRA from the same data, apply it, and so on. Iterative low-rank parameter updates. Would that work?
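A rough sketch of that loop with the Hugging Face peft library is below; train_one_pass and the model name are hypothetical stand-ins for your own training loop and base model, and whether stacked low-rank updates can actually approximate a full fine-tune is exactly the open question.

```python
# Sketch of iterative LoRA fine-tuning: train an adapter, merge it into the base
# weights, then train a fresh adapter on the merged model. train_one_pass and the
# model name are hypothetical placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("some-base-model")  # placeholder

for round_idx in range(3):                      # several low-rank passes over the same data
    config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
    peft_model = get_peft_model(model, config)  # wrap the (already merged) base model
    train_one_pass(peft_model)                  # hypothetical: your training loop goes here
    model = peft_model.merge_and_unload()       # fold this round's update into the weights
```

The accumulated delta can have rank up to k·r after k passes, so in principle it's more expressive than a single LoRA; whether training actually finds that extra capacity is the empirical question.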