Anyway, I'm glad that this Google release is actually available right away! I pay for Gemini Advanced and I see "Gemini Flash 2.0" as an option in the model selector.
I've been going through Advent of Code this year, testing each problem with each model (GPT-4o, o1, o1 Pro, Claude Sonnet, Opus, Gemini Pro 1.5). Gemini has done decently, but is probably the weakest of the bunch. It failed (unexpectedly, to me) on Day 10, but when I tried Flash 2.0 it got it! So at least on that one benchmark, the new Flash 2.0 edged out Pro 1.5.
I look forward to seeing how it handles upcoming problems!
I should say: Gemini Flash didn't quite get it out of the box. It actually produced a syntax error in the for loop, so the code failed to compile, which is an unusual failure mode for these models. Maybe it was targeting a different version of Java or something (I'm also trying to learn Java with AoC this year...). But when I gave Flash 2.0 the compilation error, it did fix it.
For the more Java-proficient, can someone explain why it may have provided this code:
```
for (int[] current = queue.remove(0)) {
```
which was a compilation error for me? The corrected code it gave me afterwards was just
```
for (int[] current : queue) {
```
and with that one change the class ran and gave the right solution.
I use Claude and Gemini a lot for coding, and I've realized there is no single best model. Every model has its upsides and downsides. I was trying to get authentication working according to the newer Manifest V3 guidelines for browser extensions, and every model was terrible. It's a use case where there isn't much information or proper documentation, so every model makes things up. But this is my experience and I don't speak for everyone.
Relatedly, I'm starting to think more and more that AI is great for mediocre stuff. If you just need to build the 1000th website, it can do that. Do you want to build a new framework? Then there will probably be far fewer useful suggestions. (Still not useless, though. I do like it a lot for refactoring while building xrcf.)
EDIT: One reason that led me to think it's better for mediocre stuff was seeing the Sora model generate videos. Yes, it can create semi-novel stuff through combinations of existing stuff, but it can't stick to a coherent "vision" throughout the video. It's not like a movie by a great director like Tarantino, where every detail is right and all details point to the same vision. Instead, Sora just flails around. I see the same in software: sometimes the suggestions go toward one style, and the next moment toward another. I guess current AI just operates with far less effective context. Tarantino has been refining his style for 30 years now, always tuning his model toward his vision. AI, in comparison, seems to take everything and turn it into one mediocre blob. It's not useless, but I think it's currently good to keep in mind that you can only use it to generate mediocre stuff.
That's when having a huge context is valuable. Dump all of the new documentation into the model along with your query and the chances of success hugely increase.
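(A trivial sketch of what that looks like mechanically, assuming the docs live in a local `docs/` directory; the directory name and the example query are made up for illustration:)
```
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class PromptBuilder {
    public static void main(String[] args) throws IOException {
        String query = "How do I authenticate under Manifest V3?";
        StringBuilder prompt = new StringBuilder();

        // Concatenate every documentation file into the prompt...
        try (Stream<Path> docs = Files.walk(Path.of("docs"))) {
            docs.filter(Files::isRegularFile).forEach(p -> {
                try {
                    prompt.append(Files.readString(p)).append("\n\n");
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        // ...then append the actual question at the end.
        prompt.append("Question: ").append(query);

        System.out.println(prompt.length() + " characters of context");
    }
}
```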
This is true for all newish code bases: you need to provide the context the model needs to get the problem right. In my experience, one or two examples of the new functions or new requirements will suffice for a correction.
I can't comment on why the model gave you that code, but I can tell you why it was not correct.
`queue.remove(0)` gives you a single `int[]`, which is also what you were assigning to `current`. So logically it's one element, not something to iterate over. Syntactically, the enhanced for loop requires a colon with an array or `Iterable` on the right-hand side; with `=` the compiler instead expects a classic `for (init; condition; update)` header, hence the compilation error. If you had wanted to iterate over each item in each array, it would need to be:
```
for (int[] current : queue) {
    for (int c : current) {
        // ...do stuff...
    }
}
```
Alternatively, if you wanted to iterate over each element in the queue and treat the int array as a single element, the revised solution is the correct one.
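For what it's worth, my guess (purely speculation on my part) is that the model conflated the enhanced for loop with the common queue-drain pattern, where `remove(0)` is what you'd actually call on each iteration. A minimal sketch of both correct forms, assuming `queue` is a `List<int[]>` as in typical BFS-style AoC code:
```
import java.util.ArrayList;
import java.util.List;

public class QueueLoops {
    public static void main(String[] args) {
        List<int[]> queue = new ArrayList<>();
        queue.add(new int[] {1, 2});
        queue.add(new int[] {3, 4});

        // Enhanced for: iterate over the queue without modifying it.
        for (int[] current : queue) {
            System.out.println(current[0] + ", " + current[1]);
        }

        // Drain pattern: remove(0) belongs in a while loop, not a for header.
        while (!queue.isEmpty()) {
            int[] current = queue.remove(0);
            System.out.println(current[0] + ", " + current[1]);
        }
    }
}
```
Mixing the two would produce exactly the `for (int[] current = queue.remove(0))` hybrid you saw.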