Both CPUs use the same microarchitecture, Haswell. The Xeon has much more cache. The i5 has a higher base frequency (3.2 vs 2.6 GHz); the Xeon, however, has a higher turbo frequency (3.6 vs 3.4 GHz).
OK, I’ve installed Cygwin and GCC, then compiled and benchmarked the original code. I made the following changes in the makefile, for both the C and C++ compiler options: (1) replaced -O2 with -O3; (2) added -msse -msse2 -msse3 -mssse3 -msse4 -msse4.1.
The results on GCC/i5/Windows 10 are very consistent with the OP's results on GCC/Xeon/Linux.
Diagram: https://raw.githubusercontent.com/Const-me/matmul/master/res...
Numbers: https://github.com/Const-me/matmul/blob/master/Run/result.xl...