
We need not dig so deep into semantics about whether it is "real" understanding or not. It solves SuperGLUE and other 'reasoning' tasks without being trained on them, albeit at lower accuracy. The amazing part is that it can be prompted into tasks like that at all.


But what does it mean if it's beating an irrelevant benchmark for a poorly defined task? Is it really amazing to pass a test that doesn't mean anything, just because it wasn't trained to pass it? So it happens to pass the test. So what? What did we learn from that?

I believe an analogous situation would generate much less debate in software engineering: "my program passes all my unit tests, but it still crashes." Well, yes. Your program passes all your unit tests because your unit tests miss the point, not because your code works.
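The unit-test analogy can be made concrete with a minimal, hypothetical sketch (the function and test names are illustrative, not from the discussion): a test suite that exercises only the happy path passes cleanly, yet the code still crashes on an input it was never tested with.

```python
def mean(values):
    # Crashes with ZeroDivisionError on an empty list.
    return sum(values) / len(values)

def test_mean():
    # The only test case: covers the happy path and nothing else.
    assert mean([1, 2, 3]) == 2

test_mean()  # the whole suite passes...
# mean([])   # ...but this call would still crash
```

The test is not wrong, it is just measuring the wrong thing: it confirms one behavior while saying nothing about the case that actually breaks. That is the claim being made about the benchmark.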




