
> What I never understand is the population of coders that don't see any value in coding agents or are aggressively against them, or people that deride LLMs as failing to be able to do X (or hallucinate etc.) and are therefore useless and everything is AI Slop, without recognizing that what we can do today is almost unrecognizable from the world of 3 years ago.

I don't recognize that because it isn't true. I try the LLMs every now and then, and they still make the same stupid hallucinations that ChatGPT did on day 1. AI hype proponents love to make claims that the tech has improved a ton, but based on my experience trying to use it those claims are completely baseless.



> I try the LLMs every now and then, and they still make the same stupid hallucinations that ChatGPT did on day 1.

One of the tests I sometimes do of LLMs is a geometry puzzle:

  You're on the equator facing south. You move forward 10,000 km along the surface of the Earth. You rotate 90° clockwise. You move another 10,000 km forward along the surface of the Earth. Rotate another 90° clockwise, then move another 10,000 km forward along the surface of the Earth.

  Where are you now, and what direction are you facing?
They all used to get this wrong all the time. Now the best ones sometimes don't. (That said, the only one to succeed just as I write this comment was DeepSeek; the first I saw succeed was one of ChatGPT's models, but that's now back to the usual error they all used to make.)

Anecdotes are of course a bad way to study this kind of thing.

Unfortunately, so are the benchmarks, because the models have quickly saturated most of them, including traditional IQ tests (on the plus side, this has demonstrated that IQ tests are definitely a learnable skill, as LLMs lose 40-50 IQ points when going from public to private IQ tests) and stuff like the maths olympiad.

Right now, AFAICT the only benchmarks still open are the METR time horizon metric, the ARC-AGI family of tests, and the "make me an SVG of ${…}" stuff inspired by Simon Willison's pelican on a bike.


Out of interest, was your intended answer "where you started, facing east"?

FWIW, Claude Opus 4.5 gets this right for me, assuming that is the intended answer. On request, it also gave me a Mathematica program which (after I fixed some trivial exceptions caused by unit errors) informs me that using the ITRF00 datum the actual answer is 0.0177593 degrees north and 0.168379 degrees west of where you started (about 11.7 miles away from the starting point), and your rotation is 89.98 degrees rather than 90.
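
A quick sanity check on that 11.7-mile figure, treating the offsets as flat distances near the equator (the ~111.32 km per degree of arc and 1.609 km per mile are the only constants I'm assuming):

  (let* ((km-per-deg 111.32)               ; ~1 degree of arc, in km
         (north (* 0.0177593 km-per-deg))  ; ~1.98 km
         (west  (* 0.168379 km-per-deg)))  ; ~18.74 km
    (/ (sqrt (+ (* north north) (* west west)))
       1.609))                             ; straight-line km -> miles
This comes out at roughly 11.7, so the figure is consistent.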

(ChatGPT 5.1 Thinking, for me, gets the wrong answer because it correctly gets near the South Pole and then follows a line of latitude 200 times round the South Pole for the second leg, which strikes me as a flatly incorrect interpretation of the words "move forward along the surface of the Earth". Was that the "usual error they all used to make"?)


> Out of interest, was your intended answer "where you started, facing east"?

Or anything close to it so long as the logic is right, yes. I care about the reasoning failure, not the small difference between the exact quarter-circumferences of these great circles and 10,000 km. (Not that it really matters, but now you've said the answer, this test becomes even less reliable than it already was.)

> FWIW, Claude Opus 4.5 gets this right for me, assuming that is the intended answer.

Like I said, now the best ones sometimes don't [always get it wrong].

For me yesterday, Claude (albeit Sonnet 4.5, because my testing is cheap) avoided the south pole issue, but then got the third leg wrong and ended up at the north pole. A while back, ChatGPT 5 (I looked the result up) got the answer right; yesterday, GPT-5-thinking-mini (auto-selected by the system) got it wrong the same way you report with the south pole, but then also got the equator wrong and ended up near the north pole.

"Never" to "unreliable success" is still an improvement.


Yeah, I'm pretty sure that's correct. Just whipped this up, using the WGS-84 datum.

  (use-modules (geo vincenty))
  
  ;; State p is (latitude longitude bearing), starting at 0°N 0°E
  ;; facing south (bearing 180). Each step turns 90° clockwise, then
  ;; solves the direct geodesic problem for 10,000,000 m (10,000 km).
  (let walk ((p '(0 0 180))
             (i 0))
    (cond ((= i 3)
           (display p)
           (newline))
          (else
            (walk (apply vincenty
                         (list (car p) (cadr p) (+ 90 (caddr p)) 10000000))
                  (+ i 1)))))
Running this yields:

  (0.01777744062090717 0.16837322410251268 179.98234155229127)
Surely the discrepancy is down to spheroid vs sphere, yeah?
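
To check the sphere side of that, here's the same walk on a perfect sphere, with the great-circle step written out by hand instead of taken from (geo vincenty). The 6371 km mean radius and the step / wrap-360 helpers are my own assumptions, and the turns are applied after each leg, following the puzzle text:

  (define pi (acos -1))
  (define (deg->rad d) (* d (/ pi 180)))
  (define (rad->deg r) (* r (/ 180 pi)))
  (define (wrap-360 x) (- x (* 360 (floor (/ x 360)))))
  
  ;; Mean Earth radius in km (assumed). A quarter great circle at this
  ;; radius is ~10,007.5 km, so each 10,000 km leg falls ~7.5 km short.
  (define earth-radius 6371.0)
  
  ;; Direct great-circle step: from (lat, lon) in degrees, travel dist
  ;; km on initial bearing brg (degrees clockwise from north). Returns
  ;; (lat lon bearing) at the destination, where bearing is the onward
  ;; direction of the geodesic there.
  (define (step lat lon brg dist)
    (let* ((phi1  (deg->rad lat))
           (lam1  (deg->rad lon))
           (theta (deg->rad brg))
           (delta (/ dist earth-radius))
           (phi2  (asin (+ (* (sin phi1) (cos delta))
                           (* (cos phi1) (sin delta) (cos theta)))))
           (lam2  (+ lam1
                     (atan (* (sin theta) (sin delta) (cos phi1))
                           (- (cos delta) (* (sin phi1) (sin phi2))))))
           ;; bearing from the endpoint back toward the start...
           (back  (atan (* (sin (- lam1 lam2)) (cos phi1))
                        (- (* (cos phi2) (sin phi1))
                           (* (sin phi2) (cos phi1) (cos (- lam1 lam2)))))))
      ;; ...flipped 180° to get the onward direction
      (list (rad->deg phi2)
            (rad->deg lam2)
            (wrap-360 (+ 180 (rad->deg back))))))
  
  ;; The puzzle as stated: face south, then move, turn, move, turn, move.
  (let walk ((p (list 0.0 0.0 180.0))
             (i 0))
    (if (= i 3)
        (begin (display p) (newline))
        (let ((q (step (car p) (cadr p) (caddr p) 10000.0)))
          (walk (if (= i 2)
                    q  ; no turn after the last leg
                    (list (car q) (cadr q) (wrap-360 (+ 90.0 (caddr q)))))
                (+ i 1)))))
If I haven't fumbled the formulas, this prints something close to (0.067 -0.067 89.93): still about 10 km from the start, facing a hair off due east. So it looks like both effects are in play: even on a sphere the 10,000 km legs don't quite close the loop, and the flattening then redistributes the miss unevenly (the meridian quarter is ~10,002 km but the equatorial quarter is ~10,019 km), which would explain the lopsided 0.0178°/0.168° offsets.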


This fascinates me. Just observing, but because it hasn't worked for you, everyone else must be lying? (I'm assuming that's what you mean by "baseless".)

How does that bridge get built? I can provide tangible real-life examples, but I've found pushback on that in other online conversations.


My boss has been passing off Claude-generated code and documentation to me all year. It is consistently garbage. It consistently hallucinates. I consistently have to rewrite most, if not all, of what I'm handed.

I do also try to use Claude Code for certain tasks. More often than not, I regret it, but I've started to zero in on the tasks it's helpful with (configuration and debugging, not so much coding).

But it's very easy then for me to hear people saying that AI gives them so much useful code, and for me to assume that they are like my boss: not examining that code carefully, or not holding their output to particularly high standards, or aren't responsible for the maintenance and thus don't need to care. That doesn't mean they're lying, but it doesn't mean they're right.


Not everyone is your boss. I have 15 years of experience coding. So when the AI hallucinates, I call that out and it improves the code it creates. If someone is passing off the AI's first pass as done, they are not using the tool correctly.


My boss has 28 years of experience coding, so that clearly isn't the deciding factor here.

Yes, I suppose it is theoretically possible that you are that much better than my boss and me at coaxing good output from an LLM, but I'm going to continue to be skeptical until I see it with my own eyes.


"Claude Code" by itself is not specific enough; which model are we talking about?


> it hasn't worked for you, everyone else must be lying?

Well, some non-zero number of you are probably very financially invested in AI, so lying is not out of the question.

Or you simply have blinders on because of your financial investments. After all, emotional investment often follows financial investment.

Or you're just not as good as you think you are. Maybe you're talking to people who are much better at building software than you are, and they find that the stuff the AI builds doesn't impress them, while you, not being as skilled, are impressed by it.

There are lots of reasons someone might disagree without thinking everyone else is lying.


I think calling it baseless to claim benefits from AI is more than disagreeing. It's claiming a rightness that is just contrarian and hyperbolic. It's really interesting to me that the skeptics are exactly the people who should be using AI. Push back on it. Tell it that the code it made was wrong.


What have you tried? How much time have you spent? Using AI is its own skill set, separate from programming.



