Weirdly enough I have the opposite experience where it will take several minutes...

0x_rs · 2025-07-17T20:58:38 1752785918

Some great advice I've found that seems to work very well: ask it to keep a succinct journal of all the issues and roadblocks found during the project development, and what was done to resolve or circumvent them. As for avoiding bloating the code base with scatterbrained changes, having a tidy architecture with good separation of concerns helps lead it toward working solutions, but you need to actively guide it. For someone that enjoys problem-solving more than actually implementing them, it's very fun.

taude · 2025-07-17T21:34:17 1752788057

to continue on this, I wouldn't let claude or any agent actually create a project structure, i'd guide it in the custom system prompt. and then in each of the folders continue to have specific prompts for what you expect the assets to be coded like, and common behavior, libraries, etc....

gonzo41 · 2025-07-18T01:49:53 1752803393

So you've invented writing out a full business logic spec again.

btw, I'm not throwing shade. I personally think upfront design through a large lumbering document is actually a good way to develop stuff. As you either do it upfront, or through endless iterations in sprints for years.

bugglebeetle · 2025-07-18T04:02:48 1752811368

Yeah, my experience of working with Claude Code is that I’m actually far more conscientious about design. After using it for a while, you get a good sense of its limits and how you need to break things down and spell things out to overcome them.

underdeserver · 2025-07-18T05:48:14 1752817694

The problem with waterfall wasn't the full business spec, it was that people wrote the spec once and didn't revise it when reality pushed back.

taude · 2025-07-21T17:52:50 1753120370

I spent 10 minutes writing out the business logic, you don't have to do it all at once. We're not talking about long complicated things here.

actinium226 · 2025-07-19T05:54:34 1752904474

> For someone that enjoys problem-solving more than actually implementing them, it's very fun.

So, is Claude just something you use for fun? Would you use it for work?

libraryofbabel · 2025-07-17T22:00:25 1752789625

Sigh. As others have commented, over and over again in the last 6 months we've seen discussions on HN with the same basic variation of "Claude Code [or whatever] is amazing" with a reply along the lines of "It doesn't work for me, it just creates a bunch of slop in my codebase."

I sympathize with both experiences and have had both. But I think we've reached the point where such posts (both positive and negative) are _completely useless_, unless they're accompanied with a careful summary of at least:

* what kind of codebase you were working on (language, tech stack, business domain, size, age, level of cleanliness, number of contributors)

* what exactly you were trying to do

* how much experience you have with the AI tool

* is your tool set up so it can get a feedback loop from changes, e.g. by running tests

* how much prompting did you give it; do you have CLAUDE.md files in your codebase

and so on.

As others pointed out, TFA also has the problem of not being specific about most of this.

We are still learning as an industry how to use these tools best. Yes, we know they work really well for some people and others have bad experiences. Let's try and move the discussion beyond that!

imiric · 2025-07-17T22:27:28 1752791248

It's telling that you ask these details from a comment describing a negative experience, yet the top-most comment full of praises and hyperbole is accepted at face value. Let's either demand these things from both sides or from neither. Just because your experience matches one side, doesn't mean that experiences different from yours should require a higher degree of scrutiny.

I actually think it's more productive to just accept how people describe their experience, without demanding some extensive list of evidence to back it up. We don't do this for any other opinion, so why does it matter in this case?

> Let's try and move the discussion beyond that!

Sharing experiences using anecdotal evidence covers most of the discussion on forums. Maybe don't try to police it, and either engage with it, or move on.

serf · 2025-07-18T00:59:35 1752800375

>Let's either demand these things from both sides or from neither. Just because your experience matches one side, doesn't mean that experiences different from yours should require a higher degree of scrutiny.

Sort of.

The people that are happy with it and praising the avenues offered by LLM/AI solutions are creating codebases that fulfill their requirements, whatever those might be.

The people that seem to be unhappy with it tend to have the universal complaints of either "it produces garbage" or "I'm slower with it."

Maybe i'm showing my age here, but I remember these same exact discussions between people that either praised or disparaged search engines. The alternative being an internet Yellowpages (which was a thing for many years.)

The ones that praised it tended to be people who were taught or otherwise figured out how to use metadata tags like date:/onsite: , whereas the ones that disparaged it tended to be the folks who would search for things like "who won the game" and then proceed to click every scam/porno link on this green Earth and then blame Google/gdg/lycos/whatever when they were exposed to whatever they clicked.

in other words : proof is kind of in the pudding.

I wouldn't care about the compiler logs from a user that ignored all syntax and grammar rules of a language after picking it up last week, either -- but it's useful for successful devs to share their experiences both good and bad.

I care more about the opinions of those that know the rules of the game -- let the actual teams behind these software deal with the user testing and feedback from people that don't want to learn conventions.

imiric · 2025-07-18T07:43:48 1752824628

> The people that are happy with it and praising the avenues offered by LLM/AI solutions are creating codebases that fulfill their requirements, whatever those might be.

Ah, but "whatever those might be" is the crucial bit.

I don't entirely disagree with what you're saying. There will always be a segment of power users who are able to leverage their knowledge about these tools to extract more value out of them than people who don't use them to their full potential. That is true for any tool, not just in software.

What you're ignoring are two other possibilities:

1. The expectation of users can be wildly different. Someone who has never programmed before, but can now create and ship a smartphone app, will see these tools as magical. Whatever issues they have will either go unnoticed, or won't matter considering the big picture. Surely their impression of AI tooling will be nothing short of positive. They might be experts at using LLMs, but not at programming.

OTOH, someone who has been programming for decades, and strives for a certain level of quality in their work, will find the experience much different. They will be able to see the flaws and limitations of these tools, and addressing them will take time and effort that they could've better spent elsewhere. As we've known since the introduction of LLMs, domain experts are the only ones who can experience these problems.

So the experience of both sides is valid, and should have equal weight in conversations. Unlike you, I do trust the opinion of domain experts over those of user experts, but that's a personal bias.

2. There are actual flaws and limitations in AI tooling. The assumption that all negative experiences are from users who are "holding it wrong", while all positive ones are from expert users, is wrong. It steers the conversation away from issues with the tech that should be discussed and addressed. And considering the industry is strongly propelled by hype and marketing right now, we need conversations grounded in reality to push back against it.

Aeolun · 2025-07-18T11:12:17 1752837137

> The assumption that all negative experiences are from users who are "holding it wrong", while all positive ones are from expert users, is wrong.

I’m not sure about that. I feel like someone experienced would realize when using the LLM is a better idea than doing it themselves, and when they just need to do it by hand.

You might work in a situation where you have to do everything by hand, but then your response would be to the effect that you can see how it’s useful to other people.

oblio · 2025-07-18T07:15:38 1752822938

> The ones that praised it tended to be people who were taught or otherwise figured out how to use metadata tags like date:/onsite: , whereas the ones that disparaged it tended to be the folks who would search for things like "who won the game" and then proceed to click every scam/porno link on this green Earth and then blame Google/gdg/lycos/whatever when they were exposed to whatever they clicked.

One big warning here: search engines only became really useful when you could search for "who won the game" and the search engine actually returned the correct thing as the top result.

We're more than a quarter of a century later and probably 99.99% of users don't know about Google's advanced search operators.

This should be a major warning for LLMs. People are people and will do people things.

libraryofbabel · 2025-07-17T22:40:25 1752792025

I should have been clearer - I'd like to see this kind of information from positive comments as well. It's just as important. If someone is having success with Claude Code while vibe-coding a toy app, I don't care. If they're having success with it on a large legacy codebase, I want them to write a blog post all about what they're doing, because that's extremely useful information.

imiric · 2025-07-18T07:14:28 1752822868

I jumped the gun a bit in my comment, since you did mention you want to see this from both sides. So it was clear, and I apologize.

The thing is that I often read this kind of response only to comments with negative experiences, while positive ones are accepted as fact. You can see this reinforced in the comments here as well. A comment section is not the right place to expand on these details, but I agree that blog posts should have them, regardless of the experience type.

gilfoy · 2025-07-17T22:55:57 1752792957

It’s telling that they didn’t specifically address it at the negative experience and you filled that in yourself

rounce · 2025-07-17T23:23:07 1752794587

It was the comment they replied to. If it was a general critique of the state of discourse around agentic tools and Claude Code in particular why not make it a top level comment?

libraryofbabel · 2025-07-18T00:36:49 1752799009

Oh, because I wanted to illustrate that the discourse is exemplified by the pair of the GP comment (vague and positive) and the parent comment (vague and negative). Therefore I replied to the negative parent comment.

leptons · 2025-07-18T03:08:00 1752808080

>But I think we've reached the point where such posts (both positive and negative) are _completely useless_, unless they're accompanied with a careful summary of at least:

They did mention "(both positive and negative)", and I didn't take their comment to be one-sided towards the AI-negative comments only.

muzani · 2025-07-18T20:57:12 1752872232

They're tools. To a fluent tool user, the negative anecdotes sound like,

"I prefer typewriters over word processors because it's easier to correct mistakes."

"I don't own any forks because knives are just better at cutting bread."

"Bidets make my pants wet, so I'll keep to toilet paper."

I think there's an urge to fix misinformation. Whereas if someone loves Excel and thinks Excel is better than Java at making apps, I have no urge to correct that. Maybe they know something about Excel that I don't.

positron26 · 2025-07-18T02:06:01 1752804361

The framing has been rather problematic. I find these differences in premises are lurking below the conversations:

- Some believe LLMs will be a winner-take-all market and reinforce divergences in economic and political power.

- Some believe LLMs have no path of evolution and have therefore already plateaued at a level too low to be sustainable with these investments in compute, which would imply it's a flash in the pan that will collapse.

- Some believe LLMs will all be hosted forever, always living in remote services because the hardware requirements will always be massive.

- Some believe LLMs will create new, worse kinds of harm without enough offsetting creation of new kinds of defense.

- Some believe LLMs and AI will only ever give low-skilled people mid-skill results and therefore work against high-skill people by diluting mid-end value without creating new high-end value for them.

We need to be more aware of how we are framing this conversation because not everyone agrees on these big premises. It very strongly affects the views that depend on them. When we don't talk about these points and just judge and reply based on whether the conclusion reinforces our premises, the conversation becomes more political.

Confirmation bias is a thing. Individual interests are a thing. Some of the outcomes, like regulation and job disruption, depend on what we generally believe. People know this and so begin replying and voting according to their interests, to convince others to aid their cause without respect for the truth. This can be counter-productive to the individual if they are wrong about the premises and end up pushing an agenda that doesn't even actually benefit them.

We can't tell people not to advance their chosen horse at every turn of a conversation, but those of us who actually care about the truth of the conversation can take some time to consider the foundations of the argument and remind ourselves to explore that and bring it to the surface.

dejavucoder · 2025-07-17T22:13:27 1752790407

Fair point.

For context, I was using Claude Code on a Ruby + Typescript large open source codebase. 50M+ tokens. They had specs and e2e tests so yeah I did have feedback when I was done with a feature - I could run specs and Claude Code could form a loop. I would usually advise it to fix specs one by one. --fail-fast to find errors fast.
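The loop described above (run the specs, stop at the first failure, hand the output back to the agent) can be sketched roughly as follows. This is only an illustration: `run_specs` is a made-up helper name, and the command lists in the comments are assumptions about a typical setup, not the commands used on the codebase in question.

```python
import subprocess


def run_specs(command, tail_lines=20):
    """Run a test command and return (passed, tail of its combined output).

    `command` is an argument list. Hypothetical examples:
      ["bundle", "exec", "rspec", "--fail-fast"]   # RSpec, stop at first failure
      ["pytest", "-x"]                             # pytest equivalent
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so failures show up in one stream
        text=True,
    )
    lines = result.stdout.splitlines()
    # Only the tail is kept: with fail-fast, the first failure is near the end,
    # and a short excerpt is cheaper to paste back into the agent's context.
    return result.returncode == 0, "\n".join(lines[-tail_lines:])
```

Keeping the excerpt short is a design choice: the point of failing fast is that the agent sees one concrete error at a time instead of a wall of output.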

Prior to Claude Code, I had been using Cursor for a year or so.

Sonnet is particularly good at NextJS and Typescript stuff. I also ran this on a medium sized Python codebase and some ML related work too (ranging from langchain to Pytorch lol)

I don't do a lot of prompting, just enough to describe my problem clearly. I try my best to identify the relevant context or direct the model to find it fast.

I made new claude.md files.

zer00eyz · 2025-07-18T06:10:51 1752819051

I spend a fair amount of time tinkering in Home Assistant. My experience with that platform and LLM's can be summed up as "this is amazing".

I also do a fair amount of data shuffling with Golang. My LLM experience there is "mixed".

Then I deal with quite a few "fringe" code bases and problem spaces. There LLM's fall flat past the stuff that is boiler plate.

"I work in construction and use a hammer" could mean framer, roofer or smashing out concrete with a sledge. I suspect that "I am a developer, I write code" plays out in much the same way, and those details dictate experience.

Just based on the volume of ruby and typescript, and the overlap of the output of these platforms your experience is going to be pretty good. I would be curious if you went and did something less mainstream, and in a less common language (say Zig) if you would have the same feelings and feedback that you do now. Based on my own experience I suspect you would not.

oblio · 2025-07-18T07:18:53 1752823133

Speaking of that observation about "fringe": this will probably, increasingly, be a factor, let's call it LLMO (optimization), where "LLM friendly" content will be pushed. So I expect secondary or fringe programming languages to become even more pushed aside, since LLMs will not be as useful.

Which is, obviously, sad. Especially since the big winner is Javascript, a language that's still subpar as far as programming languages go.

state_less · 2025-07-17T22:19:08 1752790748

Here's a few general observations.

Your LLM (CC) doesn't have your whole codebase in context, so it can run off and make changes without considering that some remote area of the codebase is (subtly?) depending on the part that claude just changed. This can be mitigated to some degree depending on the language and tests in place.

The LLM (CC) might identify a bug in the codebase, fix it, and then figure, "Well, my work here is done." and just leave it as is without considering ramifications or that the same sort of bug might be found elsewhere.

I could go on, but my point is to simply validate the issues people will be having, while also acknowledging those seeing the value of an LLM like CC. It does provide useful work (e.g. large tedious refactors, prototyping, tracking down a variety of bugs, and so on...).

simonw · 2025-07-17T22:25:08 1752791108

Right, which is why having a comprehensive test suite is such an enormous unlock for this class of technology.

If your tests are good, Claude Code can run them and use them to check it hasn't broken any distant existing behavior.

dawnerd · 2025-07-17T22:49:41 1752792581

Not always the case. It’ll just go and “fix” the tests to pass instead of fixing the core issue.

simonw · 2025-07-17T23:49:01 1752796141

That used to happen a whole lot more. Recent Claudes (3.7, 4) are less likely to do that in my experience.

If they DO do that, it's on us to tell them to undo that and fix things properly.
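One way to notice when an agent has edited tests instead of the code under test is to flag test files in the list of changed paths before accepting a change. The sketch below is an assumption-laden illustration: the directory names and filename patterns are common conventions, not anything specific to Claude Code, and the list of changed paths is assumed to come from something like `git diff --name-only`.

```python
from pathlib import PurePosixPath

# Assumed conventions; adjust to the project's own layout.
TEST_DIR_NAMES = {"test", "tests", "spec", "__tests__"}


def is_test_path(path):
    """Return True if a repo-relative path looks like a test file."""
    p = PurePosixPath(path)
    # Any parent directory named like a test directory counts.
    if any(part in TEST_DIR_NAMES for part in p.parts[:-1]):
        return True
    name = p.name
    stem = name.split(".")[0]
    return (
        stem.startswith("test_")
        or stem.endswith(("_test", "_spec"))
        or ".test." in name
        or ".spec." in name
    )


def touched_tests(changed_paths):
    """Return the changed paths that look like tests, for manual review."""
    return [p for p in changed_paths if is_test_path(p)]
```

A non-empty result is not proof of a bad change (new features legitimately add tests); it is just a prompt to read those diffs by hand.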

theshrike79 · 2025-07-18T18:20:28 1752862828

This is why you keep CLAUDE.md updated, there it’ll write down what is where and other relevant info about the project.

Then it doesn’t need to feel (or rg) through the whole codebase.

You also use plan mode to figure out the issue, write the implementation plan in a .md file. Clear context, enter act mode and tell it to follow the plan.

dejavucoder · 2025-07-18T10:45:25 1752835525

Can probably give access to tools like ast-grep to Claude. Will help it see all references. I still agree some dynamic references might still be left. Only way is to prompt well enough. Since I tested this on a Ruby on Rails codebase, I dealt with this.

QuantumGood · 2025-07-18T00:29:50 1752798590

Agree. It keeps getting closer to "I've had a negative experience with the internet ..."

actinium226 · 2025-07-19T06:17:04 1752905824

I'm not convinced that "we know they work really well for some people." So far I just see people really excited about the potential and really impressed at what it's capable of, but I think people are extrapolating poorly. It's like, yes it's impressive that it can make a video game with a few prompts, but that doesn't mean that with a few more prompts it'll turn into a AAA game.

I'm on board with some limited AI autocompletion, but so far agents just seem like gimmicks to me.

fragmede · 2025-07-19T18:29:01 1752949741

If we handwave that the popular game Wordle, which made a lot of money for its author, could have been vibecoded, at what point does the gimmick become an actual feature that people look and pay for?

actinium226 · 2025-07-19T19:28:58 1752953338

No shade at wordle, but what you're describing sounds like it would be useful for the shovelware industry and that's about it. Not exactly a great leap forward for humanity...

Although I should be fair, this can help with one-off scripts that research folks usually do, when you just need to plot some data or do some back-of-the-terminal math. That said I don't think this would be a game changer, more of an efficiency boost and a limited one at that.

fragmede · 2025-07-19T19:40:24 1752954024

What would a great leap forwards for humanity look like? Sure, making it easier to shovel out shovelware means more shovelware, but why is that a bad thing? If customers have a very specific problem that wasn't going to get solved because it was too expensive to build a custom solution, and they now get to have bespoke software to cure their ills, other than being judgemental about this hypothetical piece of software as being shovelware, why is that a bad thing?

actinium226 · 2025-07-20T01:34:09 1752975249

Here's one version of what a great leap forward could look like, but it's simply one of many: an LLM that understands the CPU it's running on and can turn prompts into assembly, taking full advantage of the hardware. Or maybe it could target a virtual CPU like Java, but the point is that if the LLM can write code, why do it in Python or C? Just let it understand the CPU and let it rip. The only reason we have C/Python/etc. in the first place is because assembly sucks for humans to work with.

As to the shovelware, if it benefits people that's great, and I think the net benefit will likely be positive, but only slightly. The point in calling it shovelware is to suggest that it's low quality, and so it could have bugs and other performance issues that add costs to using which subtract from the benefit it provides (possibly in a net positive way, but probably not as fundamentally game changing as, say, Docker).

reactordev · 2025-07-17T22:07:46 1752790066

Seconded: a summary description of your problem, codebase, and programming dialect in use should be included with any “<Model> didn’t work for me” response.

flir · 2025-07-18T11:47:58 1752839278

I find it telling that I have (mostly) good experiences with the GPT family and (mostly) bad experiences with the Claude family.

I just wish I could figure out what it tells. Their training data can't be that different. The problems I'm feeding them are the same. Many people think Claude is the more capable of the two.

It has to be how I'm presenting the problems, right? What other variable is there?

matwood · 2025-07-18T11:54:00 1752839640

If you have been using GPT for awhile it simply may know more about you.

rstuart4133 · 2025-07-17T22:56:50 1752793010

> But I think we've reached the point where such posts (both positive and negative) are _completely useless_, unless they're accompanied with a careful summary of at least ...

I use Claude many times a day, I ask it and Gemini to generate code most days. Yet I fall into the "I've never included a line of code generated by an LLM in committed code" category. I haven't got a precise answer for why that is so. All I can come up with is the code generated lacks the depth of insight needed to write a succinct, fast, clear solution to the problem someone can easily understand in 2 years' time.

Perhaps the best illustration of this is someone proudly proclaimed to me they committed 25k lines in a week, with the help of AI. In my world, this sounds like they are claiming they have a way of turning the sea into ginger beer. Gaining the depth of knowledge required to change 25k lines of well written code would take me more than a week of reading. Writing that much in a week is a fantasy. So I asked them to show me the diff.

To my surprise, a quick scan of the diff revealed what the change did. It took me about 15 minutes to understand most of it. That's the good news.

The bad news is that 25k lines added 6 fields to a database. 2/3's were unit tests, perhaps 2/3's of the remainder was comments (maybe more). The comments were glorious in their length and precision, littered with ASCII art tables showing many rows in the table.
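Taking those rough fractions at face value (they are estimates from a quick scan, so the result is only a ballpark), the arithmetic works out like this:

```python
# Rough breakdown of the 25k-line change, using the fractions quoted above.
total_lines = 25_000
test_lines = total_lines * 2 // 3        # about two thirds were unit tests
non_test = total_lines - test_lines      # what remains after the tests
comment_lines = non_test * 2 // 3        # about two thirds of that was comments
code_lines = non_test - comment_lines    # what is left as actual code

print(test_lines, comment_lines, code_lines)
```

So on these numbers, somewhere under 3k of the 25k lines would be non-test, non-comment code.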

Comments in particular are a delicate art. They are rarely maintained, so they can bit rot into downright misleading babble after a few changes. But the insight they provide into what the author was thinking, and in particular the invariants he had in mind, can save hours of divining it from the code. Ideally they concisely explain only the obscure bits you can't easily see from the code itself. Anything more becomes technical debt.

Quoting Woodrow Wilson on the amount of time he spent preparing speeches [0]:

    “That depends on the length of the speech,” answered the President. “If it is a ten-minute speech it takes me all of two weeks to prepare it; if it is a half-hour speech it takes me a week; if I can talk as long as I want to it requires no preparation at all. I am ready now.”

Which is a round about way of saying I suspect the usefulness of LLM generated code depends more on how often a human is likely to read it, than of any of the things you listed. If it is write once, and the requirement is it works for most people in the common cases, LLM generated code is probably the way to go.

I used PayPal's KYC web interface the other day. It looked beautiful, completely in line with the rest of PayPal's styling. But sadly I could not complete it because of bugs. The server refused to accept one page, it just returned to the same page with no error messages. No biggie, I phoned support (several times, because they also could not get past the same bug), and after 4 hours on the phone the job was done. I'm sure the bug will be fixed by a new contractor. He will spend a few hours on it, getting an LLM to write a new version, throwing the old code away, just as his predecessor did. He will say the LLM provided a huge productivity boost, and PayPal will be happy because he cost them so little. It will be the ideal application for an LLM - got the job done quickly, and no one will read the code again.

I later discovered there was a link on the page that allowed me to skip past the problematic page, so I could at least enter the rest of the information. It was in a thing that looked confusingly like a "menu bar" on the left, although there was no visual hint that any of the items in the menu were clickable. I clicked on most of them anyway, but they did nothing. While on hold for phone support, I started reading the HTML and found one was a link. It was a bit embarrassing to admit to the help person I hadn't clicked that one. It sped the process up somewhat. As I said, the page did look very nice to the eye, probably partially because of the lack of clutter created by visual hints on what was clickable.

[0] https://quoteinvestigator.com/2012/04/28/shorter-letter/

0x457 · 2025-07-17T22:35:34 1752791734

There are some tasks that it can fail and not, but a lot of "Claude Code [or whatever] is amazing" with a reply along the lines of "It doesn't work for me, it just creates a bunch of slop in my codebase." IMO is "i know how to use it" vs "I don't know how to use it" with a side of "I have good test coverage" vs "tests?"

taude · 2025-07-17T21:02:06 1752786126

do you create the claude.md files at several levels of your folder structure, so you can teach it how to do different things? Configuring these default system prompts is required to get it to work well.
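To make the "several levels" idea concrete, here is a small sketch that lists which instruction files sit between a repository root and a given folder, general ones first and the most specific last. It is not a description of how Claude Code itself loads these files; the function name and the default filename `CLAUDE.md` are assumptions for illustration.

```python
from pathlib import Path


def instruction_files(repo_root, target_dir, filename="CLAUDE.md"):
    """List instruction files from repo_root down to target_dir, outermost first."""
    root = Path(repo_root).resolve()
    target = Path(target_dir).resolve()
    # Raises ValueError if target_dir is not inside repo_root.
    relative = target.relative_to(root)

    # Build the chain of directories: root, root/a, root/a/b, ...
    directories = [root]
    for part in relative.parts:
        directories.append(directories[-1] / part)

    return [d / filename for d in directories if (d / filename).is_file()]
```

Listing them this way makes it easy to check that a folder-specific file actually exists where you expect the more specific guidance to live.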

I'd definitely watch Boris's intro video below [1]

[1] Boris introduction: https://www.youtube.com/watch?v=6eBSHbLKuN0

[2] summary of above video: https://www.nibzard.com/claude-code/

dawnerd · 2025-07-17T22:47:37 1752792457

By the time you do all of that you might as well just write code by hand.

serf · 2025-07-18T00:49:38 1752799778

that's really just a scale question.

Yes, I would write a 4 line bash script by myself.

But if you're trading a 200 line comprehensive claude.md document for a module that might be 20k LoC? it's a different value proposition.

actinium226 · 2025-07-19T06:10:17 1752905417

And are you willing to stand behind those 20k loc? Like, whoever you're submitting it to, you can say "this is my work, it is done to a level of quality I find acceptable"?

darkwater · 2025-07-18T07:15:28 1752822928

And how do you actually know that those 20k lines of code have no glaring bugs, or bugs that you can find yourself, or be able to understand it completely at some point?

NeutralCrane · 2025-07-18T14:25:39 1752848739

How do you know your own handwritten 20k lines of code have no bugs, or that 20k lines of code written by coworkers have no bugs?

actinium226 · 2025-07-19T06:01:44 1752904904

I'm not the person you're replying to, but I have a lot more confidence in my own 20k lines of code than an AI's. I've built up skills to write performant, readable, functional, maintainable code. I build it up slowly and I can anticipate bugs as I write. I'm not perfect, but when bugs do arise, since I've built up the code, I have some idea of where to look and where not to look in order to fix them.

As for coworkers, I would really try to get them to work in chunks smaller than 20k loc. But at some point you have an expectation that coworkers will be accountable for their area of responsibility. If there's a bug in their code, they're expected to fix it. If there's a bug in the AI's code, I'm expected to fix it....

rovr138 · 2025-07-18T08:09:22 1752826162

The way I do this is by still writing tests.

darkwater · 2025-07-18T08:10:25 1752826225

Do tests let you understand a codebase you have not written?

intrasight · 2025-07-18T12:10:02 1752840602

this desire to understand code will soon be seen as rather anachronistic. What's important is that you understand your tests. Let the AI generate the code.

The spec and the test are your human contribution.

darkwater · 2025-07-18T12:47:33 1752842853

I understand your point of view but I think it's too "optimistic", i.e. it will not happen soon, at least not outside AI maximalists.

jimbokun · 2025-07-18T18:11:10 1752862270

If the tests are written with sufficient detail that you don't need to look at the code, the implementation of the code is such a small part of the overall work that you are gaining very little in terms of overall productivity.

intrasight · 2025-07-30T23:26:26 1753917986

I agree

chasd00 · 2025-07-18T13:03:15 1752843795

you're describing TDD and it never turned into the panacea that was promised. I'm excited to try claude code, i even have a decent little personal project lined up for it but someone somewhere will always need to understand the code because tests are never 100% exhaustive and major flaws come up.

two_tasty · 2025-07-18T13:41:55 1752846115

Ah yes, can't wait to tell my auditor / regulator "I don't understand the code because Claude wrote it, but it's fine, because understanding the code is for boomers." That will get a big laugh in a deposition.

intrasight · 2025-07-18T14:55:59 1752850559

That'll be anachronistic too obviously. Your tests will be audited.

jimbokun · 2025-07-18T18:09:01 1752862141

I would say yes.

To have useful tests, you must write the APIs for the functions, and give examples of how to wire up the various constructs, and correct input/output pairs.

Implementations of those functions that pass the test now have significant constraints that mean you understand a lot about it.
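As a small illustration of that point, here is a sketch where the input/output pairs are written down first and act as the spec. The `slugify` function and its cases are invented for the example; the implementation shown is just one of many that would satisfy the pairs.

```python
import re

# The "spec": input/output pairs a human writes and understands.
CASES = [
    ("Hello World", "hello-world"),
    ("  Claude Code!  ", "claude-code"),
    ("already-slugged", "already-slugged"),
    ("", ""),
]


def slugify(text):
    """One possible implementation; any function passing CASES would do."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)


def check(fn):
    """Return the (input, expected, actual) triples where fn disagrees with CASES."""
    failures = []
    for given, expected in CASES:
        actual = fn(given)
        if actual != expected:
            failures.append((given, expected, actual))
    return failures
```

Even without reading `slugify`, the pairs tell you its signature, how it treats punctuation and whitespace, and what it does with empty input, which is the sense in which passing tests constrain what the code can be.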

theshrike79 · 2025-07-18T18:13:37 1752862417

That’s called Test Driven Development.

First you write the tests, then you write code until tests pass.

koolba · 2025-07-18T09:51:25 1752832285

They can. Particularly if you use them to validate your assumptions about the code.

makeramen · 2025-07-18T04:06:58 1752811618

You don't do it manually. You have claude do it once you’ve guided it back on track to remind itself not to do it next time.

ghuntley · 2025-07-18T02:59:16 1752807556

I think you are perhaps missing the point. Investing into these techniques [2] enables you to do unhinged things. Such as building a compiler whilst you are AFK [1].

[1] https://x.com/i/broadcasts/1OyJALVOnEzGb

[2] https://ghuntley.com/ralph

moomoo11 · 2025-07-18T04:45:44 1752813944

Sure if I want to just toy around for fun.

Those are cool, but a production system is infinitely more complex.

ghuntley · 2025-07-18T12:26:40 1752841600

What do you define as a production system? Are you aware that one can generate TLA+ specifications, then code generate from these specifications and assert that the implementation matches the TLA+ spec?

jm4 · 2025-07-17T20:15:40 1752783340

You can tell Claude to verify its work. I’m using it for data analysis tasks and I always have it check the raw data for accuracy. It was a whole different ballgame when I started doing that.

Clear instructions go a long way, asking it to review work, asking it to debug problems, etc. definitely helps.

vunderba · 2025-07-17T20:25:19 1752783919

> You can tell Claude to verify its work

Definitely - with ONE pretty big callout. This only works when a clear and quantifiable rubric for verification can be expressed. Case in point, I put Claude Code to work on a simple react website that needed a "Refresh button" and walked away. When I came back, the button was there, and it had used a combination of MCP playwright + screenshots to roughly verify it was working.

The problem was that it decided to "draw" a circular arrow refresh icon and the arrow at the end of the semicircle was facing towards the circle centroid. Anyone (even a layman) would take one look at it and realize it looked ridiculous, but Claude couldn't tell even when I took the time to manually paste a screenshot asking if it saw any issues.

And while it would be unreasonable to expect a junior engineer to hand-write the coordinates for a refresh icon in SVG, they would never even attempt that in the first place, realizing it would be far simpler to grab one from Lucide, Font Awesome, emojis, etc.

DrewADesign · 2025-07-18T04:26:22 1752812782

In general, using your own symbol forms for interactions rather than taking advantage of people’s existing mental models is a bad idea. Even straying from known libraries is shaky unless you’re a competent enough designer to understand what specific parts of a visual symbol signify that specific idea/action, and to whom. From a usability perspective, you’re much better off not using a symbol at all than using the wrong one.

yakz · 2025-07-17T20:39:14 1752784754

I second this and would add that you really need an automated way to do it. For coding, automated test suites go a long way toward catching boneheaded edits. It will understand the error messages from the failed tests and fix the mistakes more or less by itself.

But for other tasks like generating reports, I ask it to write little tools to reformat data with a schema definition, perform calculations, or do other things that are fairly easy to then double-check with tests that produce errors that it can work with. Having it "do math in its head" is just begging for disaster. But, it can easily write a tool to do it correctly.
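
A rough sketch of the kind of little tool I mean, in Python. The schema, column names, and the revenue calculation are invented for illustration. It checks each row against a declared schema and does the arithmetic in code, so a bad input fails with a specific message the agent can act on:

```python
from decimal import Decimal, InvalidOperation

# Hypothetical schema: column name -> converter for that column.
SCHEMA = {"region": str, "units": int, "unit_price": Decimal}


def parse_row(row: dict, line_no: int) -> dict:
    """Convert one raw row to typed values, or raise a descriptive error."""
    missing = SCHEMA.keys() - row.keys()
    if missing:
        raise ValueError(f"row {line_no}: missing columns {sorted(missing)}")
    parsed = {}
    for name, convert in SCHEMA.items():
        try:
            parsed[name] = convert(row[name])
        except (ValueError, InvalidOperation) as exc:
            raise ValueError(
                f"row {line_no}: bad {name!r} value {row[name]!r}"
            ) from exc
    return parsed


def revenue_by_region(rows: list[dict]) -> dict[str, Decimal]:
    """Sum units * unit_price per region using exact decimal arithmetic."""
    totals: dict[str, Decimal] = {}
    for line_no, raw in enumerate(rows, start=1):
        row = parse_row(raw, line_no)
        amount = row["units"] * row["unit_price"]
        totals[row["region"]] = totals.get(row["region"], Decimal("0")) + amount
    return totals
```

The point is that the numbers in the report come from `revenue_by_region`, not from the model's head, and a row like `{"region": "EU", "units": "three", "unit_price": "9.99"}` produces an error naming the row and column instead of a silently wrong total.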

bigiain · 2025-07-17T23:14:23 1752794063

> Clear instructions go a long way, asking it to review work, asking it to debug problems, etc. definitely helps.

That's exactly what I learned. In the early 2000s, from three expensive failed development outsourcing projects.

baka367 · 2025-07-18T03:29:01 1752809341

For me it fixed a library compatibility issue with React 19 in 10 mins and several nudges, starting from the console error and library name.

It would have been at least a half-day's worth of adventure had I done it myself (from diagnosing to fixing).

tcdent · 2025-07-17T20:18:48 1752783528

This has a lot to do with how you structure your codebase; if you have repeatable patterns that make conventions obvious, it will follow them for the most part.

When it drops in something hacky, I use that to verify the functionality is correct and then prompt a refactor to make it follow better conventions.

wyldfire · 2025-07-17T22:10:31 1752790231

I have seen both success and failure. It's definitely cool and I like to think of it as another perspective for when I get stuck or confused.

When it creates a bunch of useless junk I feel free to discard it and either try again with clearer guidelines or switch to Opus.

nzach · 2025-07-18T18:14:24 1752862464

> take several minutes to do something

The quality of the generated code is inversely proportional to the time it takes to generate it. If you let Claude Code work alone for more than 300 seconds you will receive garbage code. Take that as a hint: if it can't finish the task in this time, it means you are asking too much. Break up your feature and try with a smaller feature.
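
If you want to enforce that budget mechanically, here is a minimal sketch in Python. `run_with_budget` is a hypothetical helper, and `cmd` stands for whatever launches your agent non-interactively, which I'm deliberately not spelling out since it depends on your setup. The 300-second default is just my rule of thumb from above:

```python
import subprocess
import sys


def run_with_budget(cmd: list[str], budget_seconds: float = 300.0) -> bool:
    """Run cmd; return True only if it exits with code 0 within the budget."""
    try:
        result = subprocess.run(cmd, timeout=budget_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in command for demonstration: a Python one-liner that exits at once.
    finished = run_with_budget([sys.executable, "-c", "pass"], budget_seconds=30)
    print("finished in budget" if finished else "over budget: split the task")
```

A `False` result is the signal to stop, throw away the partial work, and ask for a smaller piece.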

Philpax · 2025-07-18T10:34:41 1752834881

> I go in and debug for a while because the app has become fubar, then finally realize it did the whole thing incorrectly and throw it all away.

This seems consistent with some of the more precocious junior engineers I've worked with (and have been, in the past.)

leptons · 2025-07-18T03:04:00 1752807840

Have you tried vibing harder?

hnaccount_rng · 2025-07-17T20:21:46 1752783706

Yeah that is kind of my experience as well. And - according to the friend who highly recommended it - I gave it a task that is "easily within its capabilities". Since I don't think I'm being gaslighted, I suspect it's me using it wrong. But I really can't figure out why. And I'm on my third attempt now..