
We need not dig so deep into semantics about whether it is "real" understanding or not. It solves SuperGLUE and other 'reasoning' tasks without being trained on them, albeit at lower accuracy. The amazing part is that it can be prompted into tasks like that at all.


But what does it mean if it's beating an irrelevant benchmark for a poorly defined task? Is it really amazing to pass a test that doesn't mean anything, just because it wasn't trained to pass it? So it happens to pass the test. So what? What did we learn from that?

I believe an analogous situation would generate much less debate in software engineering: "my program passes all my unit tests, but it still crashes." Well, yes. Your program passes all your unit tests because your unit tests miss the point, not because your code works.
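The unit-test analogy can be made concrete with a minimal, hypothetical sketch (the function and test names are illustrative, not from the discussion): a test suite that exercises only the happy path passes cleanly, yet the code still crashes on an input it was never tested with.

```python
def mean(values):
    # Crashes with ZeroDivisionError on an empty list.
    return sum(values) / len(values)

def test_mean():
    # The only test case: covers the happy path and nothing else.
    assert mean([1, 2, 3]) == 2

test_mean()  # the whole suite passes...
# mean([])   # ...but this call would still crash
```

The test is not wrong, it is just measuring the wrong thing: it confirms one behavior while saying nothing about the case that actually breaks. That is the claim being made about the benchmark.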




