Kholin's comments

Kholin · 2024-11-28T12:12:03

To know whether an LLM has real reasoning capabilities, you must test it with math questions that have never entered its training data set. https://venturebeat.com/ai/ais-math-problem-frontiermath-ben...

DoingIsLearning · 2024-11-29T09:49:43

I tried asking it an electrostatics problem, which I assume is not very interesting training data for such a CS/maths-biased LLM. It's still going...

I like the tentativeness. I see a lot of "wait", "but", "perhaps", "maybe", "this is getting too messy", "this is confusing", "that can't be right", "this is getting too tricky for me right now", "this is very difficult".

Compared with ChatGPT, I find it harder not to anthropomorphise this one. It feels like it's trying to solve the problem from first principles, but with the depth of high-school physics knowledge.

mark_l_watson · 2024-11-28T12:53:28

Of course. I make up my own test problems, but the questions and problems I make up are probably not totally unique; they are likely similar to what is in the training data. I usually test new models with word problems and programming problems.
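One way to get questions that cannot have appeared verbatim in a training set is to generate them at test time with fresh random numbers and keep the answer for grading. Below is a minimal sketch in shell with awk (srand and rand are standard awk functions). The jar-weighing template and the gen_problem name are hypothetical, and, as the comment above says, a fixed template like this still resembles problems that are in the training data, so it only guards against memorised numbers, not memorised problem shapes.

```sh
#!/bin/sh
# Sketch: print a randomly parameterised word problem plus its answer.
# The template and the gen_problem name are hypothetical; only the
# numbers change between runs.
gen_problem() {
    # $1: seed for awk's random number generator
    awk -v seed="$1" 'BEGIN {
        srand(seed)
        # Choose the answer first so a*x + b = c always has an integer solution.
        x = int(rand() * 90) + 10    # weight of one jar, 10..99 g
        a = int(rand() * 8) + 2      # number of jars, 2..9
        b = int(rand() * 500) + 1    # packing weight, 1..500 g
        c = a * x + b
        printf "Q: A parcel contains %d identical jars and %d g of packing. ", a, b
        printf "It weighs %d g in total. How much does one jar weigh?\n", c
        printf "A: %d g\n", x
    }'
}

# Default seed is the current time, so each run gives new numbers.
gen_problem "${1:-$(date +%s)}"
```

Passing a fixed seed (for example "sh gen_problem.sh 42", file name hypothetical) reproduces the same problem with a given awk implementation, which is handy for re-running a model on an identical question.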

Kholin · on July 18, 2023

For me, Firefox performs better on Linux than on Windows.

pjmlp · on July 19, 2023

Until one needs to watch YouTube videos.

nateb2022 · on July 18, 2023

This is true of a lot of things.

cpeterso · on July 18, 2023

For example, Mozilla compiles Firefox’s official macOS and Windows builds on Linux because cross-compilation is faster and cheaper than compiling on macOS or Windows in the cloud.

chasil · on July 18, 2023

Windows does not fork processes as fast as Linux, although this is somewhat improved in the Windows Subsystem for Linux.

Here is a somewhat foolish test: a shell script that runs "true" 10 million times in a loop.

  C:\>busybox sh
  ~ $ echo 'x=10000000; while [ $x -gt 0 ]; do true; x=$((x-1)); done' > timetest
  ~ $ time sh timetest
  real    0m 46.86s
  user    0m 46.76s
  sys     0m 0.04s

Here is the same test with Debian's dash shell, one of the fastest:

  $ cat timetest
  x=10000000; while [ $x -gt 0 ]; do true; x=$((x-1)); done
  $ time dash timetest
  0m33.79s real     0m33.50s user     0m0.05s system

Not a great test, but there is quite a difference there: 46.86 s versus 33.79 s of real time, roughly 39% longer under busybox on Windows.
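Because "true" is a builtin in dash, bash and busybox sh, the loop above may never start a new process, so it mostly measures interpreter speed (note the near-zero sys times). A minimal sketch that separates the two cases: it runs "true" by name (the builtin) and by its path "/bin/true", which forces a fork plus exec on every iteration. It assumes /bin/true exists (as on Debian; the path may differ under busybox-w32) and that "date +%s" prints epoch seconds, which GNU, BSD and busybox date do although POSIX does not require it. The run_loop name and the default of 10000 iterations are illustrative.

```sh
#!/bin/sh
# Sketch: contrast the "true" builtin with the external /bin/true.
# Assumptions: /bin/true exists, and "date +%s" prints epoch seconds.
N=${1:-10000}   # iterations; illustrative default, override with $1

run_loop() {
    # $1: command to run each iteration ("true" or "/bin/true")
    cmd=$1
    x=$N
    start=$(date +%s)
    while [ "$x" -gt 0 ]; do
        "$cmd"      # a name containing a slash is run as a new process;
                    # plain "true" resolves to the shell builtin
        x=$((x - 1))
    done
    end=$(date +%s)
    echo "$cmd: $N iterations in $((end - start)) s"
}

run_loop true
run_loop /bin/true
```

Saved as, say, forktest and run with "sh forktest 10000" on each system, the gap between the two output lines approximates the cost of process creation, which is what the Windows versus Linux comparison is really about. The timings are whole seconds, so pick N large enough that the /bin/true loop takes several seconds.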