Hacker News | Kholin's comments

To know whether an LLM has real reasoning capabilities, you must test it with math questions that have never entered its training data set. https://venturebeat.com/ai/ais-math-problem-frontiermath-ben...


I tried asking an electrostatics problem, which I assume is not very interesting training data for such a CS/maths-biased LLM. It's still going....

I like the tentativeness. I see a lot of: "wait", "but", "perhaps", "maybe", "this is getting too messy", "this is confusing", "that can't be right", "this is getting too tricky for me right now", "this is very difficult".

I find it harder not to anthropomorphise it than I do with ChatGPT. It feels like it's trying to solve the problem from first principles, but with only a high-school depth of physics knowledge.


Of course. I make up my own test problems, but the questions and problems I make up are likely not totally unique; that is, they are probably similar to what is in the training data. I usually test new models with word problems and programming problems.


For me, Firefox performs better on Linux than on Windows.


Until one needs to watch YouTube videos.


This is true of a lot of things.


For example, Mozilla compiles Firefox’s official macOS and Windows builds on Linux because cross-compilation is faster and cheaper than compiling on macOS or Windows in the cloud.


Windows does not fork processes as fast as Linux, although this is improved somewhat in the Windows Subsystem for Linux.

Here is a somewhat foolish test with a shell script that forks "/bin/true" 10 million times.

  C:\>busybox sh
  ~ $ echo 'x=10000000; while [ $x -gt 0 ]; do true; x=$((x-1)); done' > timetest
  ~ $ time sh timetest
  real    0m 46.86s
  user    0m 46.76s
  sys     0m 0.04s
Here is the same test with Debian's dash shell, one of the fastest:

  $ cat timetest
  x=10000000; while [ $x -gt 0 ]; do true; x=$((x-1)); done
  $ time dash timetest
  0m33.79s real     0m33.50s user     0m0.05s system
Not a great test, but there is quite a difference there.
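
One caveat about the loop above: in dash `true` is a shell builtin, and busybox sh typically runs it as an internal applet, so the bare `true` may never create a new process. The near-zero sys times in both runs fit that reading. A minimal sketch that separates interpreter overhead from process creation is below. The function names and the TRUE_BIN variable are made up for illustration, and /bin/true is an assumed path that may differ on your system.

```shell
#!/bin/sh
# Sketch: two loops to separate interpreter overhead from process creation.
# TRUE_BIN is an assumed path; adjust it if your system keeps `true` elsewhere.
TRUE_BIN=${TRUE_BIN:-/bin/true}

# Loop using the bare command name, as in the timetest script above.
# In shells where `true` is a builtin, this creates no new processes.
loop_builtin() {
    x=$1
    while [ "$x" -gt 0 ]; do true; x=$((x-1)); done
}

# Loop calling the binary by absolute path, which makes the shell
# start a new process on every iteration.
loop_external() {
    x=$1
    while [ "$x" -gt 0 ]; do "$TRUE_BIN"; x=$((x-1)); done
}

# Report how this shell resolves `true` (builtin or external file).
type true
```

To use it, append a call such as `loop_external 100000` as the last line, save the file (say, as forktest), and time it the same way as timetest. A much smaller count than 10 million is sensible here, since each iteration of the external loop starts a real process.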

