I think this type of model will have a massive impact on the software industry. 99% of programming tasks in the wild don't involve any kind of algorithmic design, but are more like implementing a CRUD pattern, writing SQL queries, etc. This kind of work is easier to automate, but it's harder to source training data for. If and when these models are applied to more mundane problems, I'd expect an immediate jump in performance and utility.
We're also in the very very early days of code generation models. Even I can see some ways to improve AlphaCode:
- the generate->cluster->test process feels like a form of manual feature engineering. This meta layer should be learned as well, possibly with RL
- programming is inherently compositional. Ideally it should perform the generate->cluster->test step for each function and hierarchically build up the whole program, instead of in a single step as it does now
- source code is really meant for humans to read. The canonical form of software is more like the object code produced by the compiler. You can probably just produce this directly
It's interesting that AI is being aggressively applied to areas where AI practitioners are domain experts. Think programming, data analysis etc.
We programmers and data scientists might find ourselves among the first half of knowledge workers to be replaced and not among the last as we previously thought.
Compilers didn't replace any jobs, they created more. Similarly, this type of AI-assisted programming will allow more people to program and make existing programmers more productive.
I was thinking over a really long time period. There are at least 20-30 more years of general-purpose programming being a highly sought-after skill. But with time, most programming is going to be done by AI that is directed by domain experts.
In my view this type of system will only be usable by Real Computer Scientists and will completely kill off the workaday hacker. Think of all the people who bitterly complain that a C++ compiler does something unexpected under the banner of UB. That crowd cannot cope with a world in which you have to exactly describe your requirements to an AI. It is also analogous to TDD, so all the TDD haters, which is the overwhelming majority of hackers, are toast.
You can write code that is valid in the sense that it compiles, but falls outside the C++ standard. It is the programmer's duty to avoid that, because the compiler usually assumes there is no UB in your code and its optimizations can do unintuitive things.
e.g.

```cpp
#include <cstdint>

void do_important_thing(int z);

int foo(int8_t x) {
    x += 120;
    return x;
}

void bar(int8_t y) {
    int z = foo(y);
    if (y > 8) {
        do_important_thing(z);
    }
}
```
`do_important_thing` may be optimized out because:
1. signed overflow is UB. The compiler then assumes that everything passed to foo is less than 8, which makes the `y > 8` branch dead code.
To be pedantic, C has no 8- or 16-bit addition operators, since everything sub-int is promoted to int before arithmetic. Therefore, the `x += 120;` line never overflows: it is actually `x = (int8_t)((int)x + 120);`, and the possible range of `(int)x + 120` is comfortably within the range of expressible ints, while the conversion back to int8_t wraps around when the value doesn't fit. So the compiler can't optimize out do_important_thing in your example.
Instead of semantically correct Python, programmers' and data scientists' jobs will be to work in semantically correct English. Fundamentally the job won't change (you'll be programming the AI rather than programming the machine directly).
> source code is really meant for humans to read. The canonical form of software is more like the object code produced by the compiler. You can probably just produce this directly
The key advantage of producing source code is that you can usually tell what the produced program does.
I think the validation phase of auto-coding full-blown apps is much more complex than AutoCode is ready for. When coding up a specific function, it's pretty easy to assess whether it maps input to output as intended. But composing functions into modules is much harder to validate, much less entire programs.
And specifying a full app to be autocoded is most certainly NOT a solved problem.
Until AutoCode can build an app that employs compound AND complex behaviors, like Angry Birds, or a browser, I'll continue to see it as little more than a write-only copy/paste/derive-driven macro generator.
Reading this I’m reminded of the debates around ORMs. At a basic level they drastically simplify your CRUD app. Until they make trivial errors no self-respecting programmer would (think N+1 queries), and then you need someone who actually understands what’s going on to fix it.
That doesn’t mean you shouldn’t ever use ORMs, or that in simple cases they aren’t “good enough”. But at some level of complexity it breaks down.
AI-assisted programming is the new leaky abstraction.