Now try this with hyperthreading disabled, since it's disabled on the ARM machine.
If you have a 24 core machine and you run the same task on 48 threads, you will sometimes see some performance drop compared to running it on 24 threads.
HT doesn't change this. Some workloads are not HT-friendly, and you should account for this when deciding how many threads to spawn.
They limited it to 64 cores to be fair to the ARM processor but did not limit it to 18/24 cores to be fair to the Intel processor. Why is that?
The last graph is the most enlightening if you keep in mind the number of real CPUs each processor has.
CPU architects make lots of different choices and tradeoffs - in this case Intel have chosen hyperthreaded physically larger CPUs, while the ARM chip's designers have decided that hyperthreading brings no meaningful improvement but lets them build smaller (and maybe faster) CPUs, and as a result more CPUs per die.
These are tradeoffs that we make all the time when building CPUs - none is 'best'; it's more that a group of choices is better for a particular case.
So the people benchmarking here have NOT turned off hyperthreading on the ARM chips; there is none to turn off. Instead there are more CPUs - the ARM guys have optimised their 64 core chip to be useful in their target market, which happens to be closer to what these guys' application does.
Running 64 threads on the Intel CPUs is slowing them down vs running as many threads as they have real CPUs.
And as I said, since they limited the test to 64 threads even though one of the CPUs has more than 64 vCPUs ("to be fair to the ARM processor"), the moment they saw the final graph they should have done the same thing in reverse, to be fair to the Intel processor. Otherwise it just reeks of selective application of methodology.
Of course, as you said, the real answer is they should not have limited the test to 64 threads. That doesn't match real workloads, where the number of threads would be set to the number of CPUs or vCPUs.
Instead they should have run a single-threaded test, a test with both processors at max(Intel vCPUs, ARM vCPUs) threads, one where each processor is set to its own maximum, and then a repeat using the number of real CPUs.
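To make that test matrix concrete, here is a rough sketch of how the thread counts could be derived. The function name, the dictionary keys, and the core counts in the usage line are all hypothetical placeholders for illustration, not the figures of the machines in the benchmark.

```python
def thread_counts(machines):
    """Build a per-machine list of thread counts to benchmark.

    `machines` maps a label to (physical_cores, logical_cpus).
    All labels and numbers passed in are supplied by the caller;
    nothing here is detected from real hardware.
    """
    # The largest vCPU count across all machines, used for the
    # "same thread count on both" run.
    shared_max = max(logical for _, logical in machines.values())

    plan = {}
    for name, (physical, logical) in machines.items():
        plan[name] = {
            "single": 1,               # single-threaded baseline
            "physical": physical,      # one thread per real core
            "own_max": logical,        # one thread per vCPU on this machine
            "shared_max": shared_max,  # max(vCPUs) across all machines
        }
    return plan


# Hypothetical example values: a 24-core/48-thread part and a 64-core part without SMT.
plan = thread_counts({"intel_example": (24, 48), "arm_example": (64, 64)})
for name, counts in plan.items():
    print(name, counts)
```

On a machine without hyperthreading the "physical" and "own_max" runs collapse into one, which is sort of the point: the comparison only looks lopsided when one side is forced onto a thread count it wasn't designed for.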