People on here were mocking me openly when I pointed out that you can't be sure LLMs (or any AIs) are actually smart unless you CAN PROVE that the question you're asking isn't in the training set (or adjacent like in this case).
So with this in mind now, let me repeat: Unless you know that the question AND/OR answer are not in the training set or adjacent, do not claim that the AI or similar black box is smart.
I ran a test yesterday on ChatGPT and co-pilot asking first if it knew of a specific paper which it confirmed and then to derive simple results from which it was completely incapable of. I know this paper is not widely referenced (ie few known results in the public domain) but has been available for over 15 years with publicly accessible code written by humans. The training set was so sparse it had no ability to "understand" or even regurgitate past the summary text which it listed almost verbatim.
So with this in mind now, let me repeat: Unless you know that the question AND/OR answer are not in the training set or adjacent, do not claim that the AI or similar black box is smart.