A benchmark must always perform the exact same computation for the same reason t...

oasisbob · on Oct 10, 2021

> all tape measures must be the exact same length.

Tape measures are useful even if they're not the same length.

Higher-end engineering tapes acknowledge that length isn't constant, and provide temperature coefficients, ranges of accuracy, and tensioning guidelines.

You can spend a lot of money on a tape measure which is more exact under more conditions. Most people don't need this.

hutrdvnj · on Oct 10, 2021

Maybe it would be possible to design the scientific computational tasks so that they are very similar in terms of stress on the CPU, but not completely identical. You could then state that the benchmark offers comparable results with a statistical error of something like 2% ± 0.5. You can also average out the error by running the benchmark multiple times before you compare the score with other devices.
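
A minimal sketch of the "run it multiple times and average" idea, assuming a hypothetical `workload` function stands in for one unit of useful computation. The names `workload` and `run_benchmark` are illustrative only. The standard error of the mean shrinks roughly as 1/sqrt(runs), which is the sense in which repeated runs average out the noise:

```python
import statistics
import time


def workload(n=20_000):
    # Hypothetical stand-in for one unit of useful work (assumption, not from the thread).
    return sum(i * i for i in range(n))


def run_benchmark(runs=10):
    """Time `workload` several times; return (mean, standard error) in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    mean = statistics.mean(samples)
    # Standard error of the mean: sample standard deviation / sqrt(number of runs).
    sem = statistics.stdev(samples) / len(samples) ** 0.5
    return mean, sem


if __name__ == "__main__":
    mean, sem = run_benchmark()
    print(f"{mean * 1e3:.3f} ms +/- {sem * 1e3:.3f} ms")
```

Reporting the score together with its standard error lets two devices be compared only when their intervals do not overlap.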

kadoban · on Oct 10, 2021

By that point, aren't you wasting as much computation time as you're trying to "save" by making it useful?

hutrdvnj · on Oct 10, 2021

I don't think so, because every single benchmark/computation would be scientifically useful, even if you run it multiple times.

jsnell · on Oct 10, 2021

Is Mersenne prime verification really scientifically useful? At this point, getting one more number verified seems about as useful as stamp collecting.

mattashii · on Oct 10, 2021

Mersenne prime verification is useful for the field of mathematics, where the results can be used to (in)validate theories and predictions about (Mersenne) prime numbers. Sure, it is a niche with little contact area with the rest of the world, but so were elliptic curves (used in cryptography), set theory, and relational algebra (both heavily used in relational DB technology).

hutrdvnj · on Oct 10, 2021

I don't see why we should be limited to Mersenne prime verification. There are many other scientific computing projects in various fields that could benefit from it: astrophysics, molecular biology, genetics, chemistry, climate study, cancer research, ...

Wikipedia lists a few https://en.wikipedia.org/wiki/List_of_distributed_computing_...

y7 · on Oct 10, 2021

You could run the exact same computation and still do something useful. The inputs do not have to be identical, but they can be coordinated (or random). Think of running a brute force search on some useful problem.

jeroenhd · on Oct 10, 2021

Inputs definitely need to be identical because different inputs may lead to different behaviours in branch predictors and memory access patterns, affecting the score.

The impact may be small, but I see no reason why the impact should be there in the first place just to satisfy some mathematicians' curiosity about special numbers.

y7 · on Oct 12, 2021

Sorry, I meant running different inputs on a program that does not branch or have an input-dependent memory access pattern. Any computation can be written in such a form, although there might be a large overhead compared to programs that do branch.
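
A rough sketch of what "does not branch or have an input-dependent memory access pattern" can look like, assuming 32-bit unsigned values. The helper names (`select_u32`, `max_u32`, `oblivious_lookup`) are made up for illustration. Python itself gives no timing guarantees, so this only shows the data flow, not a real constant-time implementation:

```python
MASK32 = 0xFFFFFFFF


def select_u32(cond, a, b):
    """Return a if cond == 1 else b, without branching on cond."""
    mask = (-cond) & MASK32  # 0xFFFFFFFF when cond is 1, 0 when cond is 0
    return (a & mask) | (b & ~mask & MASK32)


def max_u32(a, b):
    """Maximum of two values in [0, 2**32) without a data-dependent branch."""
    # a - b is negative exactly when a < b; the arithmetic shift exposes that as a bit.
    a_less_than_b = ((a - b) >> 32) & 1
    return select_u32(a_less_than_b, b, a)


def oblivious_lookup(table, index):
    """Read table[index] while touching every entry, so the access pattern is fixed."""
    result = 0
    for i, value in enumerate(table):
        x = i ^ index
        nonzero = ((x | -x) >> 32) & 1  # 1 when i != index, 0 when i == index
        result |= select_u32(nonzero ^ 1, value, 0)
    return result
```

The lookup reads the whole table to fetch one entry, which is a small instance of the overhead mentioned above.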

Jeff_Brown · on Oct 10, 2021

There are certain problems for which different inputs do not require different amounts of computation. "Add one to a number between 128 and 256" is an obvious example. The question is whether there are useful problems with that property.

jeroenhd · on Oct 10, 2021

Even such a task is subject to very specific requirements, because adding a number between 128 and 256 may be enough to take another path in the microcode depending on whether the result overflows or not. I'm not saying this happens in practice, but I wouldn't be surprised if a future generation of processors did this and made the entire benchmark invalid from that point on.

A more likely candidate for such input-independent instructions would be ones that operate on an entire bit string, like boolean operators and vector instructions. I think you'd have a tough time producing any useful output from such an algorithm, though, because you wouldn't be able to do much with conditionals while keeping the branch predictor score fair.

I don't think that there are any algorithms that could operate within a generic benchmark that could have random elements in them _and_ produce a useful result. Either the calculations are different and fair but meaningless, or they're exactly the same with the same result.

dTal · on Oct 10, 2021

> I don't think that there are any algorithms that could operate within a generic benchmark that could have random elements in them _and_ produce a useful result.

One that springs to mind is Monte-Carlo sampled raytracing. An individual ray might take more or less time to compute, but the time to compute 10 million rays will statistically be roughly constant. You could even imagine averaging a bunch of renders of the same scene from different machines to get a lower-noise result, thereby demonstrating a benchmark combined with useful work.

Statistical predictability is the key.
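
A toy simulation of that predictability, under an assumed cost model that is not from any real renderer: each ray bounces until it is absorbed, with a made-up absorption probability of 0.3 per bounce. Individual ray costs vary a lot, but the relative spread of the total cost should shrink roughly as 1/sqrt(rays):

```python
import random
import statistics


def ray_cost(rng):
    # Hypothetical cost model: count bounces until the ray is absorbed (p = 0.3 per bounce).
    bounces = 1
    while rng.random() > 0.3:
        bounces += 1
    return bounces


def relative_spread(rays_per_render, renders=20, seed=0):
    """Standard deviation of the total render cost divided by its mean."""
    rng = random.Random(seed)
    totals = [
        sum(ray_cost(rng) for _ in range(rays_per_render))
        for _ in range(renders)
    ]
    return statistics.stdev(totals) / statistics.mean(totals)


if __name__ == "__main__":
    for n in (10, 100, 10_000):
        print(n, f"{relative_spread(n):.4f}")
```

With enough rays per render, the total cost becomes stable enough to act as a score even though no two renders trace the same rays.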

(Confession - this isn't exactly theoretical. I sometimes have occasion to render light fields, and I shard the work over as many random workstations as I can get my hands on. It's always obvious which workstations are faster than others, even without making any special attempt to balance the workload. I think this is actually a workable concept.)

a1369209993 · on Oct 11, 2021

We already have an existence proof for useful tasks for which changing inputs does not invalidate benchmark results, namely cryptography, where algorithms must run in constant time to avoid timing side-channel attacks. If your hardware takes a non-constant amount of time to add 8-bit integers, your hardware is broken and should receive a benchmark score of zero. The interesting question is how much overhead would be involved in turning scientific tasks like Mersenne prime search into benchmark-friendly workloads.
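
To illustrate the constant-time style used in cryptography, here is a small sketch contrasting an early-exit comparison with one that always inspects every byte. The function names are illustrative. In pure Python this shows only the structure, not a real timing guarantee; the standard library's `hmac.compare_digest` is the function intended for this purpose in practice:

```python
import hmac


def leaky_equal(a: bytes, b: bytes) -> bool:
    # Early exit: the running time depends on the position of the first mismatch.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    # Accumulate differences over every byte; no exit that depends on the contents.
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return diff == 0


if __name__ == "__main__":
    secret = b"correct horse"
    guess = b"correct horse"
    print(leaky_equal(secret, guess))
    print(constant_time_equal(secret, guess))
    print(hmac.compare_digest(secret, guess))
```

The second version does the same amount of work for every pair of equal-length inputs, which is the property a benchmark with varying inputs would need.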

Jeff_Brown · on Oct 10, 2021

Wow. That makes it sound like even a perfectly deterministic calculation cannot be compared across machines. Some will be good at some, some good at others, and whether one machine is overall better than another depends on their intended uses.

Which now that I think about it, of course it would be like that. But still, what a headache for someone who just wants The Best One.

david-gpu · on Oct 10, 2021

Even a simple addition of two integers can take a different amount of energy to perform depending on the values involved. This in turn affects the temperature of the chip, which can cause thermal throttling. Computer architecture is more complex than most software engineers realize.