
Lots of progress, but I feel like we've been seeing diminishing returns: recent improvements seem like refinements rather than real advances. Interest in AI may drive investment and research into models that are genuine game-changers, but we aren't there yet.


You're proving GP's point about normalization of progress. It's been two years. We're still in the first iteration of applications of this new tech; the advancements haven't had time to start compounding yet. This is barely getting started.


Neither of your points is proven. Is the slowdown a real effect of hitting technological/technique/data limits, or is it just a lull in the storm?


I don't know about you, but o1-preview/o1-mini have been able to solve many moderately challenging programming tasks that would've taken me 30 minutes to an hour. No earlier model could have done that.


It's an improvement, but... I've asked it to do some really simple tasks and it'll occasionally do them in the most roundabout way you could imagine. Like sourcing a bash file that creates and reads a state file to do something for which the functionality was already built in. Say I'm a little skeptical of that answer and plug it into a new o1-preview prompt to double-check it; it starts by critiquing the bash script and its error handling instead of noticing that the functionality is baked in and plainly documented. Other errors have been more subtle.
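To make "roundabout" concrete, here's a hypothetical sketch of the kind of thing it produced: a sourced helper that persists state to a file, doing by hand what the tool already supported natively. All names and paths below are made up for illustration, not the actual script it wrote.

    # state_helper.sh -- meant to be sourced by other scripts (hypothetical).
    # Persists a single value to a state file between invocations.

    STATE_FILE="${STATE_FILE:-/tmp/mytool_state}"   # hypothetical path

    save_state() {
      # Write the given value so later runs can read it back.
      printf '%s\n' "$1" > "$STATE_FILE"
    }

    load_state() {
      # Print the previously saved value, or nothing if no state exists.
      [ -f "$STATE_FILE" ] && cat "$STATE_FILE"
    }

In this made-up example, the entire helper could be replaced by the tool's own documented option; the model reinvented it instead.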

When it works, it's pretty good, and sometimes great. But when the failure modes look like the above, I'm very wary of accepting its output.


> I've asked it to do some really simple tasks and it'll occasionally do them in the most roundabout way you could imagine.

But it still does the tasks you asked for, so that's the part that really matters.



