
> It is able to link ideas logically, defend them, adapt to the context, roleplay, and (especially the latest GPT-4) avoid contradicting itself.

Isn't this just responding to the context provided?

Like if I say "Write a Limerick about cats eating rats" isn't it just generating words that will come after that context, and correctly guessing that they'll rhyme in a certain way?

It's really cool that it can generate coherent responses, but it feels icky when people start interrogating it about things it got wrong. Aren't you just providing more context tokens for it?

Certainly that model seems to fit both the things it gets right, and the things it gets wrong. It's effectively "hallucinating" everything but sometimes that hallucination corresponds with what we consider appropriate and sometimes it doesn't.



It's all about emergent complexity. While you can reduce it to "just" statistical auto-completion of the next word, we are seeing evidence of abstraction and reasoning produced as a higher-order effect of these simple completions.

It's a bit like the Sagan quote: "If you wish to make an apple pie from scratch, you must first invent the universe".

Sometimes for GPT to "just" complete the next word in a way that humans find plausible, it must, along the way, develop a model of the world, theory of mind, abstract reasoning, etc. Because the models are opaque, we can't yet point to a certain batch of CPU cycles and say "there! it just engaged in abstract reasoning". But we can see from the output that to some extent it's happening, somehow.

We also see effects like this when looking at collective intelligence of bees and ants. While each individual insect is only performing simple actions with extremely limited cognitive processing, it can add up to highly complex and intelligent/adaptive mechanics at the level of the swarm. There are many phenomena like this in nature.


> Sometimes for GPT to "just" complete the next word in a way that humans find plausible, it must, along the way, develop a model of the world, theory of mind, abstract reasoning, etc.

I did an experiment recently where I asked ChatGPT to "tell me an idea [you] have never heard before". ChatGPT replied with what sounded like an idea for a startup: delivering farm-fresh vegetables to customers' doors. This is of course not an idea it has never heard before; it's on the internet.

If you asked a human this, they would give you an idea they had never heard before, whereas ChatGPT simply "finds" training data where someone asked a similar question, and produces the likely response, which is an idea that it has actually "heard," or seen in its training data, before. (Obviously a gross simplification of the algorithm but the point stands.)

This is a difference between ChatGPT's algorithm and human reasoning. The things that you mention, the model of the world, theory of mind, etc. are statistical illusions which have observable differences from the real thing.

Am I wrong? I'm open to persuasion.


I think it's certainly fair to say that GPT's "reasoning" is different from human reasoning. But I think the core debate we're having is whether the difference really matters in some situations.

Certainly, Midjourney's "creativity" is different from human creativity. But it is producing results that we marvel at. It's creative not because it's doing the exact same philosophical thing humans do, but because it can produce the same effect.

And I think many situations are like that. We can always say that human creativity/reasoning/x will always be different from artificial reasoning. But even today, GPT's statistical model replicates many aspects of human reasoning virtually. Is that really an illusion (implying it's fake and potentially useless), or is it just a different way of achieving a similar result?

Plus, different models will excel at different things. GPT's model will excel at synthesizing answers from far more information than a single human will ever be able to know. Does it really matter if it's not identical to human reasoning on a philosophical or biological level, if it can do things humans can't do?

At the end of the day, some of these discussions feel like bike shedding about what words like "reasoning" mean philosophically. But what will ultimately matter is how well these models perform at real world tasks, and what impact that will have on humanity. It doesn't really matter if it's virtualized reasoning or "real" human reasoning at that point.


Most arguments that AI can't really reason/think/invent essentially reduce to defining these terms as things only humans can do. Even if you had an LLM-based AGI that passes the Turing test 100% of the time, cures cancer, unites quantum physics with relativity, and so on, many of the people who say that ChatGPT can't reason will keep saying the same thing about the AGI.


I don't think there's anything wrong with people trying to see what, if anything, differentiates ChatGPT from humans. Curing cancer etc. is useful, as is ChatGPT, regardless of how it achieves these results. But how it achieves them is important to many people, including myself. If it's no different from humans, then we need to treat it like a human---well no, strike that, we need to treat it _well_ and protect it and give it rights and so on. If it's a fancy calculator, then we don't.


I don't think there's anything wrong with it either. It's an important debate. I just think the arguments usually become very circular and repetitive. If there's nothing an AI could ever do to convince you that it's thinking or reasoning, then really you should be explicit and say "I don't believe an AI can produce human thought or human reasoning" or "an AI is not a human" and nobody will disagree with you on those points.


> and nobody will disagree with you on those points

But that's the point, they do. Even on HN there are many comments saying that humans are just fancy autocomplete, i.e. there's no fundamental difference between humans and LLMs.


tines says>"Even on HN there are many comments saying that humans are just fancy autocomplete, i.e. there's no fundamental difference between humans and LLMs."<

LLMs may prove a useful analogy as to how parts of human intelligence operate, an analogy that, at the very least, should be thoroughly researched.


"there's no fundamental difference between humans and LLMs."

I think that's a straw man. No one disagrees that humans and LLMs produce cognition differently. One uses a wet, squishy brain. The other uses silicon chips. There's no disagreement here.


> One uses a wet, squishy brain. The other uses silicon chips.

Well then, that settles the debate!


My point is that's not a debate anyone is having. No one claims that ChatGPT is human! The claim is merely that ChatGPT is engaging in (non-human) forms of reasoning, abstraction, creativity, and so on, with varying levels of ability.

There's a separate debate on whether the brain produces human thoughts in a similar way to ChatGPT's non-human thought. The question here is whether brains are essentially biological LLMs, and whether GPT's current limitations relative to humans could be overcome simply by scaling up the number of GPT's parameters to match or exceed the number of neurons in the human brain. But whether or not that turns out to be the case, it would not mean that AIs are the same as humans, or use exactly the same processes to think and reason, and no one is claiming that.


The word "thought" means something. When you use it to describe ChatGPT, you have in fact argued "there's no fundamental difference between humans and LLMs."


The parent was very careful to distinguish "human thought" from "non-human thought".


> The parent was very careful to distinguish "human thought" from "non-human thought".

Yes, I noticed. Putting "non-human" in front of "thought" doesn't help.

I doubt parent uses the word "thought" to describe how a thermostat, calculator, or "Hello world" program works.

Using it to describe ChatGPT has no discernable semantic meaning other than OP believes ChatGPT works like an animal brain.


That presupposes that the only thought that exists or can exist is human thought. You can define it that way if you like, but it’s not the only definition.


I'm not saying the only thought that exists is human thought. (I believe animals can think).

I'm saying that using "thought", a word invented to describe animal behavior, to describe a large language model has no discernible meaning other than that you think it works like an animal brain.

If you think it's an open question whether it works like an animal, you should find a better word than "thought".


A CPU "runs". A disk "seeks". An OS stores data in "memory". Re-purposing terms to describe new concepts is routine in the evolution of language, and (non-human, non-biological) "thought" is a perfectly apt way to describe what we can observe in the output of massive LLMs like GPT.


A CPU "runs" but nobody debates whether a CPU has legs and leaps around.

An OS stores "memories" but nobody argues whether the OS fondly recalls the day their son was born.

When people are debating how the LLM works and you call what the LLM does "thought" it's perfectly reasonable to conclude you believe the human brain works essentially the same as an LLM.

I just asked ChatGPT 4.0; here are my question and its response. In the interest of full disclosure, I edited out the part that I thought wasn't relevant, so as not to bore:

"...if I call your processes "thoughts" is that the correct term? If not what term should I use?"

"If you want to describe the processes of an AI language model like me, it's more accurate to use terms such as "information processing," "pattern recognition," or "data-driven predictions." While these terms don't fully capture the complexity of the underlying algorithms, they are more appropriate than "thoughts," which implies consciousness and self-awareness that AI models like me do not possess.

Using the term "thoughts" might be misleading, as it could imply that the AI has a subjective experience or a sense of understanding, which is not the case. Instead, I rely on advanced algorithms to generate responses based on the patterns and associations learned from the data during my training."

So ChatGPT doesn't state you used the correct term.

I genuinely wonder if you think ChatGPT is conscious and self-aware and you used a word that implied that intentionally, or if you just like how the word "thought" sounds and are indifferent to what people think you are implying.


I don’t think it’s conscious. Though at some point in the future, it will likely be hard to say that with total certainty.

Perhaps you’re right that the term thought has too much baggage. I’m just saying that if you look at it naively, it’s engaging in forms of abstraction, reasoning, world modeling, invention, and so on that seem a lot like “thought”. If a human told you they were doing those things, you’d say they were thinking, right?

I agree it’s not thought in exactly the way that we are used to using the word, but I think it can be classified as a type of thought.


> It's creative not because it's doing the exact same philosophical thing humans do, but because it can produce the same effect.

Absolutely, and I hope none of my comments are taken in a way that disparages how amazing ChatGPT and Stable Diffusion et al. are. I'm just debating how humanlike they are.

> Is that really an illusion (implying it's fake and potentially useless)

I don't think that its being an illusion means that it's useless. Magnets look like telekinesis, but that effect being an illusion doesn't mean that magnets are useless; far from it, and once we admit that they are what they are, they become even more useful.

> Plus, different models will excel at different things. GPT's model will excel at synthesizing answers from far more information than a single human will ever be able to know. Does it really matter if it's not identical to human reasoning on a philosophical or biological level, if it can do things humans can't do?

It only matters if people are trying to say that ChatGPT is essentially human; that idea is all I was replying to. I completely agree with you here.


If it can reason, should it be held accountable for the consequences of its mistakes?

A simple tool can’t. A "mind" that is coming into our world should, right?

Just like in all the Marvel and DC stories, where superhumans are still accountable for their mistakes: their superpowers are no excuse.


Almost all people almost never have truly original ideas. When asked to "tell me an idea [you] have never heard before", they will remix stuff they have heard to get something that "feels" like it's new. In some cases they'll actually be wrong and reproduce something they heard and forgot about hearing, but remember the concept. Most of the time, the remix will be fairly superficial.

And remixing stuff it has heard before is exactly what ChatGPT is doing. What it sucks at is the "feels like it's new" part, but fundamentally it would be quite easily capable of creating output that combines concepts with maximally negative correlation; the only thing that's truly missing is the ability to interpret the prompt as an instruction to do that.


Certainly. I mean we've seen all 26 letters before-- ChatGPT is just remixing them.

How does one actually measure novelty, without having to know everything first?


The entire strength of large language models like GPT is that they do know a frighteningly good approximation of everything, in terms of having been trained on text written about it.


> And remixing stuff it has heard before is exactly what ChatGPT is doing.

Check out my "the confetti has left the cannon" example above.

https://news.ycombinator.com/item?id=35346683

Maybe still "remixing", but it sure feels like new to me.


For what it's worth, I asked ChatGPT to come up with an original idea but I generated 25 random English words and told it to use them as inspiration. This can help the output be more creative and original. Here's what it came up with:

> The BerryPulse is an innovative, eco-friendly device that captures the energy released during the natural decomposition of berries to produce heat and electricity. The device comprises a closed container, where a cluster of berries is placed in a specially designed compartment, acting as an organic fuel source. The decomposition process is intensified using a biodegradable catalyst, which accelerates the natural breakdown of the berries while ensuring minimal environmental impact.


> the model of the world, theory of mind, etc. are statistical illusions which have observable differences from the real thing.

Is that a falsifiable claim?

What would an empirical test look like that would determine whether the claim ("GPT-4 does not have a model of the world") is correct or not?


I guess ChatGPT having a model of the world is not something I'd argue against, my mistake. I do think the data it contains represents a quite nuanced model of the world. But I don't think this is a very strong claim. A geographical map has a model of the world too.


Thanks. To get to what I think is the core of your argument (?)

> ChatGPT simply "finds" training data where someone asked a similar question, and produces the likely response, which is an idea that it has actually "heard," or seen in its training data, before.

I can definitely see a scenario where we manage to build an ultra-intelligent machine that can figure out any logical puzzle we put to it, but where it still isn't "creative" in any way.

But GPT-4 seems to be not just intelligent, but also creative.

Have you seen the "confetti has left the cannon" example?

  For the curious, here was the conversation I had:
  ME: What's another saying similar to: "The cat is out of the bag" or "The genie is out of the bottle"?
  
  chatgpt: Another similar saying is "the toothpaste is out of the tube." Like the other two expressions, this phrase conveys the idea that something has been revealed or unleashed, and it cannot be undone or put back into its original state.
  
  ME: Can you invent a new phrase that has the same meaning, which is completely new?
  
  chatgpt: Certainly! Here's a new phrase with a similar meaning: "The confetti has left the cannon." This phrase suggests that once something is out in the open or has happened, it's impossible to reverse or contain it, much like confetti that has been shot out of a cannon during a celebration.
https://news.ycombinator.com/item?id=35346683


That does seem really impressive. But don't you think that it's pretty likely that this, or something phrased slightly differently, appeared in the training data?


> But don't you think that it's pretty likely that this, or something phrased slightly differently, appeared in the training data?

I don't think so, but I could be wrong. It's definitely not "likely", see the math below.

I base that on the fact that people seemed to spend quite a bit of time trying to find the phrase "the confetti has left the cannon" that GPT-4 coined. It seems Google search has no records of it before then?

I've seen many other examples where GPT-4 can translate sentences between different types of idioms, and I just can't picture all these weird examples already being present on the Internet?

Do you think GPT-4 is a stochastic parrot that just has a large database of responses?

If so, how would we test that claim? What logical and reasoning problems can we give it where it fails to answer, but a human doesn't?

My understanding is that even with an extremely limited vocabulary of 32 words, the number of possible sequences exceeds the number of atoms in the universe (10^80) once you string about 54 words together. If your vocabulary instead is 10k words, you reach 10^80 combinations after 20 words.
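
As a rough check of that arithmetic, here is a minimal Python sketch. The function name `min_length_reaching` is made up for illustration, and the 10^80 figure is simply the comment's own estimate for the number of atoms in the universe, not a precise value.

```python
# Assumption: 10^80 is the comment's estimate of atoms in the universe.
ATOMS_LOG10 = 80


def min_length_reaching(vocab_size: int, log10_target: int = ATOMS_LOG10) -> int:
    """Return the smallest n such that vocab_size ** n >= 10 ** log10_target.

    vocab_size ** n is the number of distinct word sequences of length n
    that can be formed from a vocabulary of vocab_size words.
    """
    target = 10 ** log10_target
    sequences = 1  # number of sequences of length 0
    n = 0
    while sequences < target:
        sequences *= vocab_size  # exact integer arithmetic, no overflow in Python
        n += 1
    return n


print(min_length_reaching(32))      # 54
print(min_length_reaching(10_000))  # 20
```

So with 32 words the threshold is a little past 50 words, and with 10k words it is exactly 20, since (10^4)^20 = 10^80. Either way, the space of possible sentences is far too large for a lookup table of stored responses.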

By training the LLMs on "fill in the missing word", they were forced to evolve ever more sophisticated algorithms.

If you look at the performance over the last 5 years of increasingly larger LLMs, there was a hockey-stick jump in performance 1-2 years ago. My hunch is that is when they started evolving structures to generate better responses by using logic and reasoning instead of lookup tables.


> I base that on the fact that people seemed to spend quite a bit of time trying to find the phrase "the confetti has left the cannon" that GPT-4 phrased. It seems Google search has no records of it before then?

Could it be that the expression in some form has been used in languages other than English?


Good point! I hadn't thought of that.

If that is the case, it would downgrade the achievement from "super impressive" to just "impressive".

I spent some time trying to find it in other languages, but couldn't. Doesn't prove much of course, hopefully native speakers can weigh in on this.

I did find this though:

'TIL that a young stripper named Shelly Bauman lost her leg in a freak confetti cannon accident. She sued and used the money from the settlement to open Seattle's first gay bar, which she named "Shelly's Leg."'


One interesting way I heard to get around this is by mixing human languages in the prompt which probably never appear together in any training data, and seeing that ChatGPT can still output sensible replies. That seems to imply that something unique is happening beyond token lookup; if it’s taking different languages and mapping them to the underlying information, that looks a lot more like what people call “understanding”.


Turns out good usage of "language" requires a model of the world in which that language exists. "The purple, two eyed, green, five eyed, invisible frog said moo" is a grammatically fine sentence. But logically it makes no sense, does it have two eyes or five? Is it green or purple or invisible? Frogs don't typically say moo. To have actual coherent usage of language, you need a model of the world. Not just the world, but the current domain you're using language in. "The frog brainwashed the crowd with its psychic powers" is nonsense in a biology paper, but perfectly valid inside of the cartoon Futurama.

In ChatGPT the language-model and world-model are really just the same model, which makes a lot of sense.


Very well said. We think of a word as "just" a word, a simple, primarily textual thing, but it's actually a vertex on an extremely large and complex many-dimensional graph that includes connections related to meaning, logic/reasoning, knowledge about reality, emotional sentiment, and so on. The literal textual representation of the word--the letters it consists of--are just one property among many, and probably one of the least important to producing sensible output. GPT is discovering the shape of this super-graph and learning to navigate its connections.


This is really lofty language without much evidence to back it up. It fluffs up techie people and makes them feel powerful, but it doesn't really describe large language models nor does it describe linguistic processes.


The evidence is ChatGPT's output. Unless you're saying that passing the bar exam, writing working code, etc. doesn't require abstract reasoning abilities or a model of the world?


It's a large language model. It is fed training data. It is not that impressive when it spits out stuff that looks like its training data. You are the one asserting things without evidence.


It can pass tests and exams with answers that were not included in its training corpus. For example, it passed the 2023 Uniform Bar Exam, though its training cut off in 2021. Yes, it can look at previous test questions and answers, just like human law students can. Are you therefore claiming that human law students don't engage in abstract reasoning when they take the bar exam, since they studied with tests from previous years?

It can also write code for novel use cases that have never been done before. I gave it a task like this a few days ago and it got it right on the first try. There are literally millions of empirical data points that contradict you.


It is a large language model. It manipulates text based on context and the imprint of its vast training. You are not able to articulate a theory of reasoning. You are just pointing to the output of an algorithm and saying "this must mean something!" There isn't even a working model of reasoning here, it's just a human being impressed that a tool for manipulating symbols is able to manipulate symbols after training it to manipulate symbols in the specific way that you want symbols manipulated. Where is your articulated theory of abstract reasoning?


ttpphd says >"Where is your articulated theory of abstract reasoning?"<

If he had a complete answer to your questions then he would keep his mouth shut and go directly to META and collect $2 BN USD or get a Nobel prize (or both). What you seem to want is a peer-reviewed academic paper but what we're doing here is brainstorming about what is going on in these LLMs.

He's definitely onto something here: LLM models, at the very least, appear to generate reasonable human-like statements about human concepts. ChatGPT et al are useful in the same way a human assistant is useful. Most remarkably, they appear to think like we do. We need to understand how these MOFOs work b/c in a few years they're going to be everywhere.

IIRC an old "Far Side" Gary Larson cartoon depicts two bears just outside their cave, arrows in their limbs and butts, fighting off a hungry bunch of cave men. One bear says to the other "Seems there's more and more of these every year!"

Well, unless we're careful, next time we're going to be the bears!


I don't like buying into hype mindlessly. I prefer to reason through things and apply skepticism. If people are gonna claim that a chatbot has gained sentience, I'm gonna have some tough questions.


Note I didn't say "sentience" anywhere. There's a huge difference between non-human reasoning/thinking and sentience/consciousness. I don't believe the first implies the latter... it's necessary but not at all sufficient.


It's not clear to me what point you're trying to make. Why do we need an "articulated theory of abstract reasoning" to say that passing the bar exam or writing code for novel, nontrivial tasks requires reasoning? Seems rather obvious.


You are making a claim that there is some attribute of importance. For that claim to be persuasive, it should be supported with an explanation of what that attribute is and is not, and evidence for or against the meeting of those criteria. So far all you have done is say "Look at the text it puts out, isn't that something?"

It's just empty excitement, not a well-reasoned argument.


You keep avoiding this question: does passing the bar exam and writing code for novel, nontrivial tasks require reasoning or doesn't it?

You aren't answering because saying no will sound ridiculous. We all know it requires reasoning.

As for an "attribute of importance", I guess that's subjective, but I've used ChatGPT to write code in a few minutes that would have taken me hours of research and implementation. I've shipped that code to thousands of people. That's enough for it to be important to me, even ignoring other applications, but you certainly have the right to remain unimpressed if you so choose.


For a human, it takes human reasoning. But a xerox machine can also output the correct answers given the right inputs, which is exactly what you can say about an LLM.

The "attribute of importance" I'm referring to is "rationality". You keep talking about it like it means something but you can't define it beyond "I'm pretty sure this text was made using it".

Does a tape recording of a bird song "know" how to sing like a bird?


Those aren't good analogies. An LLM isn't like a xerox machine or a tape recorder. Again, the answers to the bar exam it passed weren't in its training data. Nor was the code it wrote for me.

I'm using the common, colloquial definition of reasoning. I don't think we need an academic treatise to say that passing the bar exam (without copying the answers) or writing code for a novel task requires reasoning.

You're right that we don't fully understand how the LLM is doing this, but that doesn't mean it isn't happening.


Thank you, yes, for saying I am right in saying that the evidence is lacking, which was precisely my original point.


The evidence isn’t lacking :) We have lots of evidence. What we lack is a coherent theory that explains the evidence.


> Like if I say "Write a Limerick about cats eating rats" isn't it just generating words that will come after that context, and correctly guessing that they'll rhyme in a certain way?

Aren't you just doing that?


"Responding to the context provided" is very vague. I could argue that I'm doing exactly that right now as I'm writing this comment. It does not imply not being able to e.g. link ideas logically.

With respect to interrogating GPT if it does something wrong - the reason why people do it is because it works. With GPT-4 especially, you can often ask it to analyze its own response for correctness, and it will find the errors without you explicitly pointing them out. You can even ask it to write a new prompt for itself that would minimize the probability of such errors in the future.


There once was a Cat in New York

Who got caught for feeding some Rats ; Tremendous Work!

All the people tell me, many men, biggly men - many with tears in their eyes...

That I have done nothing legally-wise

But the truth is ; I am an enormous dork.

>>_Created by an actual Human Being with actual DNA for crime scene evidence._

-

But just when they tried to brush under a rug

To try to make the folks 'shrug'

Is the Streisand Effect as a scar

As everyone knows of payments to a Porn Star

And the nation will know you're a simple thug.


There once was a man in New York

Guilty of paying too much for pork

He thought he would never stand

on a trial from the local grand

but corruption was just part of the work.


>Like if I say "Write a Limerick about cats eating rats" isn't it just generating words that will come after that context, and correctly guessing that they'll rhyme in a certain way?

I guess ... this is what confuses me. GPT -- at least, the core functionality of GPT-based products as presented to the end user -- can't just be a language model, can it? There must be vanishingly few examples from its training text that start as "Write a Limerick", followed immediately by some limerick -- most such poems do not appear in that context at all! If it were just "generating some text that's likely to come after that in the training set", you'd probably see some continuations that look more like advice for writing Limericks.

And the training text definitely doesn't have stuff like, "As a language model, I can't provide opinions on religion" that coincides precisely with the things OpenAI doesn't want its current product version to output.

Now, you might say, "okay okay sure, they reach in and tweak it to have special logic for cases like that, but it's mostly Just A Language Model". But I don't quite buy that either -- there must be something outside the language model that is doing significant work in e.g. connecting commands with "text that is following those commands", and that seems like non-trivial work in itself, not reasonably classified as a language model.[2]

If my point isn't clear, here is the analogous point in a different context: often someone will build an AND gate out of pneumatic tubes and say, "look, I made a pneumatic computer, isn't that so trippy? This is what a computer is doing, just with electronics instead! Golly gee, it's so impressive what compressed air is [what LLMs are] capable of!"

Well, no. That thing might count as an ALU[1] (a very limited one), but if you want to get the core, impressive functionality of the things-we-call-computers, you have to include a bunch of other, nontrivial, orthogonal functionality, like a) the ability to read and execute a lot of such instructions, b) the ability to read/write from some persistent state (memory), and c) the ability to have that state reliably interact with external systems. Logic gates (d) are just one piece of that!

It seems GPT-based software is likewise solving other major problems, with LLMs just one piece, just like logic gates are just one piece of what a computer is doing.

Now, if we lived in a world where a), b), and c) were well-solved problems to the point of triviality, but d) were a frustratingly difficult problem that people tried and failed at for years, then I would feel comfortable saying, "wow, look at the power of logic gates!" because their solution was the one thing holding up functional computers. But I don't think we're in that world with respect to LLMs and "the other core functionality they're implementing".

[1] https://en.wikipedia.org/wiki/Arithmetic_logic_unit?useskin=...

[2] For example, the chaining together of calls to external services for specific types of information.


I think you're really undervaluing the capabilities of language models. I would put an AND gate and this language model at opposite ends in terms of complexity. It is not just words, it's a very broad and deep hierarchy of learned all-encompassing concepts. That's what gives it its power.



