While true, I think this is still valid criticism considering so many people are quick to jump on the "AGI" bandwagon when discussing the current generation of LLMs.
No one's thinking a 7B-70B LLM is going to be an AGI lol. A 700B-1T LLM likely gets pretty damn close, especially with some of the newer attention concepts.
And yet GPT-4 with 1-2 trillion parameters still fails at the most basic math, sometimes even for tasks like adding up a set of ten numbers (hence the Wolfram comment). That's as clear evidence as any that intelligence is more than just language proficiency.
Prepare now. You're going to see it a lot more until there's a general understanding of how these things work. I think it's going to be a while. Even here on HN, I don't think most people understand. I know I don't.
I've seen school kids get LLMs to do their homework assignments for them. They don't care about them not doing math well. They just use them for what they're good for. Then they use other apps to paraphrase/simplify the writing so it looks more natural.
The people who actually have things to do will just get them done.
Don't spout off about something you're not knowledgeable about? Not trying to be rude, it just seems like if you don't know how these things work, you shouldn't be declaring them a failure because of a poorly conceived test.
I said I'm not convinced, as in my user experience was/is still not conducive to the hype I perceive. The prompt says 'Ask anything', so how should one know that basic arithmetic is holding it wrong? Btw, it also failed all queries in my native language after assuring me (in English) that it can parse them. Just nonsense/gibberish. Guess this is also a well-known limitation? Perhaps my expectations were too high.
Yes, but those of us who want to use an AI are waiting for somebody to hook up a calculator on the back end. We would like the AI to test its theories before it sends them back to us.
We know that LLMs are bad at math. It's a fundamental limitation of a neural network that thinks in words, not in numbers.
ChatGPT offers the Wolfram plugin to work around this issue, but it's not a bug or a fault; it's just how LLMs work.
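The "hook up a calculator on the back end" idea amounts to tool use: prompt the model to emit a structured request whenever it needs arithmetic, then have your code do the actual computation. A minimal sketch, assuming a made-up `CALC:` output convention (the marker format and function names here are hypothetical, not any real plugin API):

```python
import ast
import operator

# Operators we allow in arithmetic expressions (whitelist, no eval()).
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def safe_eval(expr: str) -> float:
    """Evaluate a plain arithmetic expression by walking its AST."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

def answer(model_output: str) -> str:
    # Pretend the model was instructed to reply "CALC: <expression>"
    # whenever arithmetic is needed; we intercept that and compute it
    # ourselves instead of trusting the LLM's token-by-token guess.
    if model_output.startswith("CALC:"):
        return str(safe_eval(model_output[len("CALC:"):].strip()))
    return model_output

print(answer("CALC: 12 + 7 * 3"))  # 33
```

Real plugin systems (like the Wolfram one) work on the same principle, just with a richer protocol: the model decides *when* to call the tool, and deterministic code produces the actual number.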