While true, I think this is still valid criticism considering so many people are quick to jump on the "AGI" bandwagon when discussing the current generation of LLMs.
No one's thinking a 7B-70B LLM is going to be an AGI lol. A 700B-1T LLM likely gets pretty damn close, especially with some of the newer attention concepts.
And yet GPT-4 with 1-2 trillion parameters still fails at the most basic math, sometimes even for tasks like adding up a set of ten numbers (hence the Wolfram comment). That's as clear evidence as any that intelligence is more than just language proficiency.
Prepare now. You're going to see it a lot more until there's a general understanding of how these things work. I think it's going to be a while. Even here on HN, I don't think most people understand. I know I don't.
I've seen school kids get LLMs to do their homework assignments for them. They don't care about them not doing math well. They just use them for what they're good for. Then they use other apps to paraphrase/simplify the writing so it looks more natural.
The people who actually have things to do will just get them done.
Don't spout off about something you're not knowledgeable about? Not trying to be rude, it just seems like if you don't know how these things work, you shouldn't be declaring them a failure because of a poorly conceived test.
I said I'm not convinced, as in my user experience was/is still not conducive to the hype I perceive. The prompt says 'Ask anything', so how should one know that basic arithmetic is holding it wrong? Btw, it also failed all queries in my native language after assuring me (in English) that it can parse them. Just nonsense/gibberish. Guess this is also a well-known limitation? Perhaps my expectations were too high.
Yes, but those of us who want to use an AI are waiting for somebody to hook up a calculator on the back end. We would like the AI to test its theories before it sends them back to us.
We know that LLMs are bad at math. It's a fundamental limitation of a neural network that thinks in words, not in numbers.
ChatGPT offers the Wolfram plugin to work around this issue, but it's not a bug or a fault; it's just how LLMs work.
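The "hook up a calculator on the back end" idea amounts to tool use: prompt the model to emit a structured request whenever it needs arithmetic, then have your code do the actual computation. A minimal sketch, assuming a made-up `CALC:` output convention (the marker format and function names here are hypothetical, not any real plugin API):

```python
import ast
import operator

# Operators we allow in arithmetic expressions (whitelist, no eval()).
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def safe_eval(expr: str) -> float:
    """Evaluate a plain arithmetic expression by walking its AST."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

def answer(model_output: str) -> str:
    # Pretend the model was instructed to reply "CALC: <expression>"
    # whenever arithmetic is needed; we intercept that and compute it
    # ourselves instead of trusting the LLM's token-by-token guess.
    if model_output.startswith("CALC:"):
        return str(safe_eval(model_output[len("CALC:"):].strip()))
    return model_output

print(answer("CALC: 12 + 7 * 3"))  # 33
```

Real plugin systems (like the Wolfram one) work on the same principle, just with a richer protocol: the model decides *when* to call the tool, and deterministic code produces the actual number.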