I tried asking an electrostatics problem, which I assume is not very interesting training data for such a CS/maths-biased LLM. It's still going....
I like the tentativeness; I see a lot of: "wait", "but", "perhaps", "maybe", "this is getting too messy", "this is confusing", "that can't be right", "this is getting too tricky for me right now", "this is very difficult".
I kind of find it harder not to anthropomorphise when comparing with ChatGPT. It feels like it's trying to solve it from first principles but with the depth of high-school physics knowledge.
Of course. I make up my own test problems, but it is likely that the questions and problems I make up are not totally unique; that is, they are probably similar to what is in the training data. I usually test new models with word problems and programming problems.
For example, Mozilla compiles Firefox’s official macOS and Windows builds on Linux because cross-compilation is faster and cheaper than compiling on macOS or Windows in the cloud.
Windows does not fork processes as fast as Linux, although this is somewhat improved in the Windows Subsystem for Linux.
Here is a somewhat foolish test with a shell script that runs "true" 10 million times. (Note that "true" is a builtin in both busybox sh and dash, so this mostly measures loop overhead rather than actual forking; to fork an external process you would have to call "/bin/true" by its path.)
C:\>busybox sh
~ $ echo 'x=10000000; while [ $x -gt 0 ]; do true; x=$((x-1)); done' > timetest
~ $ time sh timetest
real 0m 46.86s
user 0m 46.76s
sys 0m 0.04s
Here is the same test with Debian's dash shell, one of the fastest:
$ cat timetest
x=10000000; while [ $x -gt 0 ]; do true; x=$((x-1)); done
$ time dash timetest
0m33.79s real 0m33.50s user 0m0.05s system
Not a great test, but there is quite a difference there.
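If you want the loop to actually fork and exec a process each iteration, you can call the binary by its path so the shell cannot use the builtin. A sketch (assuming "true" lives at /bin/true, which holds on typical Linux systems; the iteration count is kept small here, scale N up to match the original test):

```shell
# Variant of the loop above that forces a real fork+exec per iteration
# by invoking the binary by path, bypassing the shell builtin.
# N=1000 keeps the demo quick; the original test used 10 million.
N=1000
x=$N
while [ "$x" -gt 0 ]; do
    /bin/true          # external command: the shell must fork and exec
    x=$((x-1))
done
echo "spawned $N processes"
```

Timed with the same "time sh ..." invocation as above, this version should show a much larger real/sys time per iteration than the builtin loop, since every pass now pays the full process-creation cost.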