Oh my gosh, another one-file PyTorch implementation. This is fantastic. I'd like to hope that some of my previous work (hlb-CIFAR10 and related projects, along with other influences before it like minGPT, DawnBench, etc.) has helped push the 'simple, single-file, reduced-complexity' format forward a bit. I personally think that this kind of work is critical to efficient ML research, and that it is possibly one of the most important things we can do for the field today.
Research progresses at the speed of innovation, which scales with the inverse of experiment runtime, which in turn is closely tied to the underlying Kolmogorov complexity of the code w.r.t. a research/simple-hackery-focused objective.
I really cannot stress enough how important tools like this are to research, and how much they've sped up the knowledge discovery process for me personally. Being able to quickly sketch out ideas, often in minutes, and get immediate, high-SNR results back has become an indispensable part of my research progress. While we seem to be really good at optimizing the specific details of research, and somehow have extremely information-efficient training processes, we have not applied the same logic to the research field as a whole!
Knowledge distillation and/or the MDL principle (https://en.wikipedia.org/wiki/Minimum_description_length) are, I think, extremely important for reversing a lot of the constant fluff, cruft, and overly dense thrash-and-hope-you-don't-get-scooped-by-other-researchers-on-marginal-value-topics trend that has largely been encouraged by the current paper submission/review process.
I've been wanting to get around this and move towards a better-scaling solution recently. One step I've taken is distributing my code as 1-file, self-contained, short rough gists -- 'code sketches' -- which shortens dev time and gets rough, unpolished, working code for a concept into people's hands. It seems to work pretty well so far, and I hope to keep doing it! <3 :'))))
In any case, this is extremely exciting stuff, and everyone -- please! More code like this! We're researchers working on learning from data at scale; let's be data-efficient in how we disseminate information as well! It's a dream come true to see more of this stuff coming down the pipeline. Fantastic work, and keep it coming! <3 :')))) Woop woop woop!!!!
It’s been an exciting 2023, in no small part because of watching AI research unfold at these crazy speeds. Like you’ve said, enablers like arXiv, PyTorch, GitHub, Huggingface, and terse, open-source Python code are dramatically accelerating the development of this new field.
It’s probably the fastest the human race has ever developed anything of substantial complexity!
The only other place I see this kind of velocity is SpaceX, which also launched two cutting-edge rockets this year.
Minor potential performance benefit -- it looks like you might be able to fuse the x_proj and dt_proj weights here, since x_proj has no bias. This is possibly doable at load/runtime if there are any weight-fiddling requirements; I'm guessing the single kernel + bias will still run faster in the end (not sure though! <3 :')))) ). A rough sketch of what I mean is below.
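For concreteness, here's a tiny sketch of the idea. The names and shapes here (x_proj, dt_proj, dt_rank, d_state, d_inner) are assumptions based on the usual Mamba-style layer conventions, not necessarily this repo's exact layout -- treat it as an illustration of the fusion, not the repo's code:

```python
import torch

# Hypothetical dimensions, just for the sketch.
d_inner, dt_rank, d_state = 128, 8, 16

# Assumed layer shapes: x_proj has no bias, dt_proj does.
x_proj = torch.nn.Linear(d_inner, dt_rank + 2 * d_state, bias=False)
dt_proj = torch.nn.Linear(dt_rank, d_inner, bias=True)

# Because both maps are linear and x_proj is bias-free, the delta path
#   dt_proj(x_proj(x)[..., :dt_rank])
# collapses into a single Linear: delta = x @ (W_dt @ W_x_delta).T + b_dt.
W_x_delta = x_proj.weight[:dt_rank, :]               # (dt_rank, d_inner) slice feeding dt_proj
fused = torch.nn.Linear(d_inner, d_inner, bias=True)
with torch.no_grad():
    fused.weight.copy_(dt_proj.weight @ W_x_delta)   # (d_inner, d_inner) fused weight
    fused.bias.copy_(dt_proj.bias)

# Quick numerical check that the fused projection matches the two-step path.
x = torch.randn(4, d_inner)
ref = dt_proj(x_proj(x)[..., :dt_rank])
assert torch.allclose(fused(x), ref, atol=1e-5)
```

Note the trade-off: the fused weight is d_inner x d_inner, so you give up the low-rank factorization's memory savings in exchange for one fewer matmul launch -- whether that's a net win probably depends on sizes and hardware.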
Excellent stuff. <3 :'))))