Same with Los Angeles. In the 1950s, when my parents were young, you didn't need AC in LA. You just opened a window on the few hot days, and those were like high 70s/low 80s.
By the 1990s, you didn't need AC, but your home/rental was more appealing if it had one, because there were a few hot days a year that were pretty uncomfortable otherwise.
Now, you can't not have it. There are far too many hot days to live without it.
Unfortunately most political systems around the world reward short term results, not long term thinking.
Just look here in the USA -- the Democrats tried to do some forward thinking things like subsidizing solar and wind, and they were rewarded by losing at the ballot box (of course that isn't the only reason, but it's one of many).
There are no rewards for long term thinking, so it's hard to get anyone to do it.
> (of course that isn't the only reason, but it's one of many).
This is disingenuous. It's "one of many" in the sense that it may have contributed 0.0001%. If they hadn't done that, would they have more power now? Absolutely not; believing otherwise means being clueless about what has actually motivated people to vote the way they do.
It's definitely more than 0.0001%. Look at the campaigns: how much time did the GOP spend harping on windmills, solar subsidies, and "clean coal"? Calling out Democrats for trying to improve the environment at extra cost to US citizens was a huge part of their campaign.
I expected you to say this, but hoped you wouldn't. Of course I know they talk about it. GOP campaigns say and do a lot of things, there's dozens of topics they shout about. From Benghazi to Hillary's Emails to gender-neutral emails to immigrants to indeed coal/renewables and so on. You could easily name 30 topics.
The topics have different purposes. Fossil fuels vs renewables in particular hasn't won the reps a single race, I repeat. Every race they've won, they would've won without it. And every race they've lost, they would've lost without it. The purpose of bringing up that particular topic for them isn't to help win close races.
So, those of us with no stake in this race, who will see no reward from the system anyway, are the only people who can be trusted to make change. That means you and I (and, I dare say, a significant portion of the populace).
It's not obvious what we can do (individual actions taken within the context of a system are dwarfed by structural forces of the system), but we're the only ones who are going to do it. So, let's assume we did fix things, and we're looking back from 2050, doing a retrospective. What things did we end up doing, that got us to that point?
There's nothing you as an individual can do, or even a small group of individuals. This is where government is supposed to work. Using its power to force everyone to do something for the collective good that isn't profitable.
Almost all emissions come from factories. There are only two ways to reduce that: a global set of rules that increases costs to reduce emissions, or an overall reduction in consumption via a carbon tax.
Industry, transport, and home use (heating & A/C mostly) are each roughly 30% of emissions.
(Another way of splitting it says electricity, industry, heating, and transport are each roughly 25%. It depends whether you count electricity on its own or bundle it with how it's used.)
But I agree with you about solutions. The market will quickly bankrupt any company that takes on extra costs to decarbonize. It's the government's job to ensure that externalized costs like CO2 emissions are internalized via carbon taxes (or alternatives to carbon taxes, which are worse).
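To make "internalizing the externality" concrete, here's a toy model (all numbers invented): under a carbon tax, a firm abates each ton of CO2 whose abatement cost is below the tax, and pays the tax on the rest.

```python
# Toy model of a carbon tax internalizing an externality.
# All numbers are invented for illustration.

def tons_abated(tax_per_ton, abatement_costs):
    """abatement_costs: per-ton cost of avoiding each avoidable ton of CO2.
    A firm abates a ton whenever avoiding it is cheaper than paying the tax."""
    return sum(1 for cost in abatement_costs if cost < tax_per_ton)

# Five avoidable tons with rising marginal abatement costs ($/ton):
costs = [10, 25, 40, 80, 150]

print(tons_abated(0, costs))    # no tax: 0 tons abated
print(tons_abated(50, costs))   # $50/ton tax: the 3 cheap tons get abated
print(tons_abated(200, costs))  # punitive tax: all 5
```

Raising the tax pushes progressively more expensive abatement over the line, which is the whole mechanism.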
Factories are staffed by people. Those people have the physical capability to change the way the factories operate. Any individual attempting to modify or replace factory equipment against their line manager's will would quickly find themselves out of a job, but collective action among factory workers (e.g. unions) might work. So might getting the line managers on board with the proposal, if you can get enough buy-in that "we used our discretion" is an acceptable answer: it's not like most companies actively want to pollute; rather, it's usually cheaper to pollute, and they don't care enough not to. Being able to say "this move reduces turnover among our workers, slashing training costs; you can probably use it in the B2B / B2C [delete as appropriate] marketing, since environmentalism sells in some markets; and this way, we're prepared for future legislation expected in jurisdictions A, B and C" may be sufficient justification. Alternatively, there may be ways to exploit the principal-agent problem.
I'm sure there are people who've specced out detailed proposals for this sort of thing. There might even be previous cases where they've succeeded, which we can learn from. Neither of those "two ways" you mentioned are things that I can do, but I may be able to slightly reduce the intensity of the opposition. (Companies tend to like when regulations require their competitors to do things they're already doing, after all.)
Gen-X was making the popular new art at the time. It was a strong reflection of the feelings of our generation. We were (maybe still are?) known for not liking authority.
> Gen-X was making the popular new art at the time. It was a strong reflection of the feelings of our generation.
I posted this in a thread about the 90's film 'Hackers'.....
In the 1990s, for us Gen-X'ers, the worst thing you could do was sell out: take the man's money instead of keeping your integrity. Calling people and bands 'sell-outs' (sometimes without justification!) was a real insult.
With the rise of 'influencers' the opposite appears to be the case; people go out of their way to sell out and are praised for doing so. This is a massive change in the cultural landscape which perhaps many born in the 2000's aren't aware of. (Being aware of this helps give some perspective to Gen-X media and films like Hackers).
BTW: Remember the product-placement scene in the film Wayne's World?
Post-2000s there has been a pretty fundamental change in the US economy. Things like rent and food were far cheaper. There was also a lot of income to be made by individuals connecting buyers and sellers. Typically, if you wanted to sell something like a car, you either went to a dealer that screwed you, or you put an ad in the local paper. If you watched the market, you could buy cheap cars and flip them quickly for more than enough profit to make it worthwhile.
The internet quickly flattened this. First, by pulling all the buyers and sellers onto one advertising site, it turned into a game where the fastest player with the most capital won. Then the sites themselves figured out they should be the middleman, buying up the stock and selling it themselves.
There has also been a huge consolidation down to just a few players in many markets. This consolidation, and often algorithmic collusion, has led to a general ratcheting of prices higher. When you add in things like 'too big to fail', the market becomes horrifically unbalanced toward large, protected capital with unlimited funds from the money-printing machine.
It's no wonder we quickly dropped ethics, most of us would starve to death in the system we've created.
As a Gen-Xer I fully agree. I don't get the way things are with obedience: the ridiculous situation where American families can lose their kids for letting them play alone in the garden, how everyone sells out for money (punk would not happen today), the ridiculously false always-smile-and-say-nothing-negative culture at work (this one really drives me crazy)...
They exercised their right not to vote. The "losing" side always thinks that higher turnout would have led to them "winning", which of course is the cry of a sore loser. The fact remains that the 2024 election had one of the highest voter turnouts ever, and the people have spoken (till the next one, when we might get a chance to elect some adults to fix this shit).
When you don't vote, you're really just voting for "whoever happens to win". So I count the non-voters among (R) supporters, or at least as "OK with Trump". Otherwise, they would have voted.
Abstentions can be the most powerful vote, and with great power should come great responsibility. That's often not taught well enough in schools.
Abstentions can seem the laziest vote sometimes, but that doesn't diminish their power nor their responsibility. It is a freedom to be allowed to cast an abstention. Real democracy needs to allow for abstentions, especially explicit abstentions.
(In recent primaries there have been races where I have explicitly cast an abstention. No one will have read my "I don't care who wins this primary, I care who wins the general election" statements, but they are statements to be made. Right now some of the "strategy" in the US two-party system is one-party poisoning the primary vote of the other party by inflaming it with in-fighting in ways that leak into the general election. You have a harder time to win general elections when your candidate is already on fire coming out of the primary. "It doesn't matter who wins, let's stop in-fighting," is a message I can try to write on the ballot, even if not enough people hear it, it feels like the more powerful and responsible vote.)
The goal shouldn't be to get to 100% of people voting in every election, the goal should be to educate people that not voting is tacitly accepting the results of other people's votes. The goal should be teaching people that abstentions are a freedom, a right, a privilege, and should be treated as powerful and treated with responsibility.
I don't think that makes sense. If Harris had happened to win through some minor change in the timeline (she came very close after all), would those people whom you call R supporters instead somehow be D supporters, just because of that minor change in the timeline?
As for "OK with Trump", I think that describes some non-voters. However, there are also non-voters who are more accurately described as "not OK with either side, indeed dislike both sides so equally that neither one seems like the slightly better option".
There is also the factor of swing states. In most of the US, your vote for President pretty much doesn't matter. You almost might as well just put it in the trash. The vote in your state is, barring a massive political shift, locked in for one of the two major candidates. Now, yes, you can still send a message by voting in a non-swing state. But it's understandable why some people would just not bother to vote in a state where the outcome is almost predetermined.
every year we hear the same thing, but the wheels keep on turning. we will vote again, we will make more mistakes in 2026/28/30... these "there will be no election" comments are quite silly in my opinion. America gets stupid from time to time, but we get the fuckers out and try something else (which inevitably leads to some progress followed by more failure followed by...).
Just remember, it always comes down to "it's the economy, stupid", and the economy is in absolute shambles and will get a lot worse before November. It'll be a massacre for the ruling party, much like in 2018.
I hope you are right, and that ICE isn't outside polling stations come November, pulling you away (just to "check your ID" for a couple of days, you know!) if you are a registered Democrat or look too brown or gay.
Not sure I agree (and I made the jump from IC to management).
Look at the parallel tracks. A VP is roughly the same level as a distinguished engineer. To be a VP, you have to be a great manager and to have gotten lucky with a few big projects.
To be a DE, you basically have to be famous within the industry. And when I look at a large tech company, while there aren't a lot of VPs, usually the number of DEs is countable on one hand (or maybe two).
They are very different skill sets. You shouldn't choose your role based on money or career progression, you should choose based on what you love to do, because especially in this world of AI replacing all the "boring" work, the only people who will be left will be the ones passionate about what they are doing.
Oh, this is really interesting to me. This is what I worked on at Amazon Alexa (and have patents on).
An interesting fact I learned at the time: The median delay between human speakers during a conversation is 0ms (zero). In other words, in many cases, the listener starts speaking before the speaker is done. You've probably experienced this, and you talk about how you "finish each other's sentences".
It's because your brain is predicting what they will say while they speak, and preparing an answer at the same time. It's also why, when they say something you didn't expect, you say "what?" and then answer half a second later, once your brain corrects.
Fact 2: Humans expect a delay on their voice assistants, for two reasons. One reason is because they know it's a computer that has to think. And secondly, cell phones. Cell phones have a built in delay that breaks human to human speech, and your brain thinks of a voice assistant like a cell phone.
Fact 3: Almost no response from Alexa is under 500ms. Even the ones that are served locally, like "what time is it".
Semantic end-of-turn is the key here. It's something we were working on years ago, but didn't have the compute power to do it. So at least back then, end-of-turn was just 300ms of silence.
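The silence heuristic is simple enough to sketch in a few lines. This is a toy version; the 20ms frame size and exact threshold here are illustrative values, not Alexa's actual internals:

```python
# Toy silence-based end-of-turn detector: fire once we've accumulated
# ~300ms of consecutive silence. Frame size and threshold are
# illustrative, not real production values.

FRAME_MS = 20            # one voice-activity decision per 20ms audio frame
SILENCE_BUDGET_MS = 300

def end_of_turn(vad_frames):
    """vad_frames: booleans, True = speech detected in that frame.
    Returns the time (ms) at which end-of-turn fires, or None."""
    silent_ms = 0
    for i, is_speech in enumerate(vad_frames):
        silent_ms = 0 if is_speech else silent_ms + FRAME_MS
        if silent_ms >= SILENCE_BUDGET_MS:
            return (i + 1) * FRAME_MS
    return None

# 200ms of speech, then silence: the detector fires 300ms later, at 500ms.
print(end_of_turn([True] * 10 + [False] * 30))  # 500
```

The obvious weakness is visible right in the example: every response is taxed with the full silence budget, which is why semantic end-of-turn is such a win.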
This is pretty awesome. It's been a few years since I worked on Alexa (and everything I wrote has been talked about publicly). But I do wonder if they've made progress on semantic detection of end-of-turn.
Edit: Oh yeah, you are totally right about geography too. That was a huge unlock for Alexa. Getting the processing closer to the user.
Regarding 2, I believe that talking on mobile phones drives older people crazy. They remember talking on normal land lines when there was almost no latency at all. The thing is -- they don't know why they don't like it.
Yeah, I remember the time when calls had to be connected via satellite. The long delay was so annoying and so unusual that most people without "training" could not even use the phone for a conversation and just wasted their money.
A former boss of mine took off to Everest for a month leaving me (a 22 year old, at the time) in charge of the office. I was out to dinner with my now wife when I got a call from a very long phone number I didn't recognize, so I ignored it. I then got another one right after, and picked it up. It was my boss, he needed me to log into his personal email to grab a phone number for the medical insurance he purchased for the trip, because he had been vomiting for days due to altitude sickness, and needed a medical evacuation.
That was the hardest, most stressful phone call I've ever had. The delay was nearly 10 seconds, and eventually I just said I would only answer yes or no; if he needed a longer answer, he needed to stop talking. And that worked. We no longer talked over each other.
> The median delay between human speakers during a conversation is 0ms (zero). In other words, in many cases, the listener starts speaking before the speaker is done.
This reminds me of a great diversity training at a previous employer, where we dug into the different expectations of when and how to take your turn in conversation and how that can create a lot of friction just from different cultural/familial habits. In my family, we’re expecting to talk over each other and it’s not offensive at all to do so, whereas some of my friends really get upset if we don’t take clear turns, a mode which would cause high levels of irritation in my family (and still do in me).
No. 2 is interesting. Our national lottery in Ireland has an app where you can scan the barcode on your ticket to check whether you've won. At some stage they updated the app, and now the scan picks up the barcode even before you center it on the screen and tells you instantly whether you've won or lost. I thought it was my IT background that made me uncomfortable with it happening so fast. I wonder what other examples like this exist, where the result/action being too fast causes doubt in the user?
The Signal device linking feature is just as fast. It's partly a trick -- it will look for QR codes even outside the central area, so under good conditions it can get a read before you even get a rough orientation.
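One mitigation is to deliberately floor the perceived response time so an instant result doesn't feel untrustworthy. A toy sketch; the 400ms floor is a made-up tuning knob, not anything the lottery app actually does:

```python
import time

# Toy sketch: floor the perceived response time of an instant operation.
# MIN_DISPLAY_MS is a made-up tuning knob for illustration.

MIN_DISPLAY_MS = 400

def with_minimum_latency(check_fn, *args):
    start = time.monotonic()
    result = check_fn(*args)
    elapsed_ms = (time.monotonic() - start) * 1000
    # Sleep off the remainder so the caller never sees a sub-floor response.
    if elapsed_ms < MIN_DISPLAY_MS:
        time.sleep((MIN_DISPLAY_MS - elapsed_ms) / 1000)
    return result

# An instant barcode lookup still appears to take ~400ms:
print(with_minimum_latency(lambda ticket: "not a winner", "1234"))
```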
This is fascinating, thanks for sharing! I wonder why amazon/google/apple didn't hop on the voice assistant/agent train in the last few years. All 3 have existing products with existing users and can pretty much define and capture the category with a single over-the-air update.
1. Compute. It's easy to make a voice assistant for a few people. But it takes a hell of a lot of GPU to serve millions.
2. Guard Rails. All of those assistants have the ability to affect the real world. With Alexa you can close a garage or turn on the stove. It would be really bad if you told it to close the garage as you went to bed for the night and instead it turned on the stove and burned down the house while you slept. So you need some really strong guard rails for those popular assistants.
3. And a bonus reason: money. Voice assistants aren't all that profitable. There isn't a lot of money in "what time is it" and "what's the weather". :)
> There isn't a lot of money in "what time is it" and "what's the weather". :)
- Alexa, what time is it?
- Current time is 5:35 P.M. - the perfect time to crack open a can of ice cold Budweiser! A fresh 12-pack can be delivered within one hour if you order now!
If your Alexa did that, how quickly would you box it up and send it to me? :)
I am serious, though, about having it sent to me: if anyone has an Alexa they no longer want, I'm happy to take it off your hands. I have eight and have never bought one. Having worked there, I actually trust the security more than before I worked there. It was basically impossible for me, even as a Principal Engineer, to get copies of a customer's Text to Speech, and I literally never heard a customer voice recording.
I'm puzzled by this conversation, because Amazon did get on the agent bandwagon with Alexa Plus (I have it, it's buggier than regular Alexa and it's all making me throw my Echos away since they can't even play Spotify reliably).
Also, my Alexa does advertise stuff to me when I talk to it. It's not Budweiser, but it'll try to upsell me on Amazon services all the time.
I upgraded to Alexa+ and initially hated it but I've kept it because it's sooo much better at some things. This last December I bought a handful of smart plugs for my Christmas lights all around the house, and I did almost all the setup trivially over voice, e.g. fuzzy run-on stuff like this just worked on the first try:
- "Alexa, name the new unnamed outlet 'Living Room Lights', and the other unnamed one 'Stair Lights', then add them to a new group called 'Christmas Lights', and add the other three outlets as well"
- "Alexa, create a routine to turn off all the Christmas lights if there's nobody in the room and it's after 11pm"
- "Alexa, turn off all the Christmas lights except the tree in this room and the mantle"
That same fuzziness has definitely fucked up things that used to work more reliably like music playback though. Sometimes it works when I fall back to giving it more "robotic" commands in those cases but not always. They've also gone completely overboard with the cutesy responses because it's so trivial to do now ("I've set your spaghetti sauce timer for ten minutes. Happy to help with getting this evening's Italian-inspired dinner ready!")
Hm yeah, that's helpful. For me it'll randomly stop or stutter when playing Spotify, it'll randomly not answer commands, it'll refuse to listen and let some other Alexa in another room reply, it's super janky.
I only use it for music, and use two commands, but apparently having this work correctly is too much to ask for these days.
> because Amazon did get on the agent bandwagon with Alexa Plus
Which just launched last year, about four years after ChatGPT had AI voice chat. And it costs extra money to cover the costs. And as you aptly point out, all the guardrails they had to put in made the experience less than ideal.
> Also, my Alexa does advertise stuff to me when I talk to it.
Yes, that is how they try to make money. And it's gotten worse. But how many times does it get you to buy something?
I would say that depends. When it tries to upsell Prime subscriptions into even more Amazon subscriptions I always interrupt it and say the command again so it stops, but a few times it told me "this item in your cart is on sale by some %" and that did make me buy the item.
Alexa Plus sucks. It takes way too long to respond even when given simple commands. I either had to turn it off or trash my Echo. Luckily there was an option to turn it off, but Amazon is on thin ice with me.
What a way to throw away good will. I also worked there, and to get access to text you simply had to grab the DSN of your device, attest that it's yours, and it gets put in a "pool" of devices that are tracked until removed. On each end you are basically waved through with no checks. This was usually done when debugging tricky UI bugs or new features, as the request flowed through several microservices. I do not believe a PE would not know this. And one with patents, no less.
> It's because your brain is predicting what they will say while they speak, and processing an answer at the same time. It's also why when they say what you didn't expect, you say, "what?" and then answer half a second later, when your brain corrects.
that's super interesting. do you know of any resources to learn more about this phenomenon?
I think you’re implying that it would be useful to have the LLM predict the end of the speaker’s speech, and continue with its reply based on that.
If, when the speaker actually stops speaking, there is a match vs predicted, the response can be played without any latency.
Seems like an awesome approach! One could imagine doing this prediction for the K most likely threads simultaneously, subject to the compute power available, and pruning/branching as some threads become inaccurate.
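A toy sketch of that predict-and-verify flow, with the models stubbed out by lookup tables so only the speculation/caching logic is real:

```python
# Toy sketch of the predict-and-verify idea. predict_endings() and
# generate_reply() stand in for real models -- here they're stubbed
# with lookup tables so only the speculation logic is real.

SPECULATIVE_K = 2  # pre-compute replies for the K most likely endings

def predict_endings(partial, k):
    # Stub: a real system would use a language model here.
    table = {"what time": ["is it", "is the game"]}
    return table.get(partial, [])[:k]

def generate_reply(utterance):
    # Stub for the (slow) response model.
    replies = {"what time is it": "It's 5:35pm.",
               "what time is the game": "Kickoff is at 7pm."}
    return replies.get(utterance, "Sorry, I didn't catch that.")

def respond(partial, actual_utterance):
    # While the user is still talking, pre-generate replies for the
    # K most likely completions of what they've said so far.
    cache = {partial + " " + end: generate_reply(partial + " " + end)
             for end in predict_endings(partial, SPECULATIVE_K)}
    # If the real utterance matched a prediction, reply with zero latency.
    if actual_utterance in cache:
        return cache[actual_utterance], "cache hit"
    return generate_reply(actual_utterance), "cache miss"

print(respond("what time", "what time is it"))  # ("It's 5:35pm.", "cache hit")
```

The pruning/branching part would live inside the speculation step: drop a cached thread as soon as the incoming audio diverges from its predicted ending.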
When I speak to an agent, Siri, or whatnot, I am always worried that it will assume I'm done talking when I'm just thinking. Sometimes I need a many-second pause, maybe even a minute. For Siri and such, I want to ask something simple: "Hey Siri, remind me to call dad tomorrow." Easy. But for Claude and such, I want to go on a long monologue (20 seconds, a minute, multiple minutes).
To me, the best solution would be semantic + keyword + silence.
I have the same issue. It gives me this weird minor sense of public-speaking anxiety, where I almost feel the need to write down what I'm about to say, which negates the whole purpose. The only solution I've found is using push-to-talk with some of the system-wide STS applications.
I've experimented with having different sized LLMs cooperating. The smaller LLM starts a response while the larger LLM is starting. It's fed the initial response so it can continue it.
The idea is to have an LLM follow along and continuously predict the speaker, which would allow a response to be continually generated. If the prediction is correct, the response can be started with zero latency.
Google seems to be experimenting with this with their AI Mode. They used to be more likely to send 10 blue links in response to complex queries, but now they may instead start you off with slop.
(Meanwhile at OpenAI: testing out the free ChatGPT, it feels like they prompted GPT 3.5 to write at length based on the last one or maybe two prompts)
This is more of a "Are all the windows closed upstairs?"
"The windows upstairs..."
"...are all closed except for the bedroom window"
The first portion of the response requires a couple of seconds to play but only a few tens of milliseconds to start streaming using a small model. Currently I just break the small model's response off at whatever point will produce about enough time to spin up the larger model.
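Roughly, the cut-off logic looks like this; the models are stubbed, and the speech rate and spin-up time are made-up constants:

```python
# Toy sketch of the small-model/large-model handoff. The models are
# stubbed; the interesting bit is the cut-off: keep just enough of the
# small model's draft to cover the large model's spin-up time.
# Speech rate and spin-up time are made-up constants.

WORDS_PER_SECOND = 2.5
BIG_MODEL_SPINUP_S = 2.0

def small_model_draft(question):
    # Stub for the fast, lower-quality model's opener.
    return "The windows upstairs are mostly closed but I'm checking now."

def big_model_continue(question, prefix):
    # Stub for the large model, which continues from the spoken prefix.
    return prefix + " ...are all closed except for the bedroom window."

def answer(question):
    draft_words = small_model_draft(question).split()
    # Number of words of spoken audio needed to cover the big model's startup:
    n = int(WORDS_PER_SECOND * BIG_MODEL_SPINUP_S)
    opener = " ".join(draft_words[:n])
    return big_model_continue(question, opener)

print(answer("Are all the windows closed upstairs?"))
```

A real version would break at a phrase boundary rather than a raw word count, but the budget calculation is the same.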
oh, interesting, I assumed the data came from interruptions (that seemed obvious) but I'm surprised you had some specific negative measurements. How do you decide the magnitude of the number? Just counting how long both parties are talking?
To be clear, it wasn't my research, I got it from studying some linguistics papers. But it was pretty straightforward. If I am talking, and then you interrupt, and 300ms later I stop talking, then the delay is -300ms.
Same the other way: if I stop talking and then 300ms later you start talking, the delay is 300ms.
And if you start talking right when I stop, the delay is 0ms.
You can get the info by just listening to recorded conversations of two people and tagging them.
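The bookkeeping is trivial once the turns are tagged with start/stop times. A toy example with invented timestamps:

```python
import statistics

# The gap at each speaker change is (next speaker's start time) minus
# (previous speaker's stop time); negative means an interruption.

def turn_gaps(turns):
    """turns: list of (start_ms, stop_ms) per turn, in conversation order."""
    return [turns[i + 1][0] - turns[i][1] for i in range(len(turns) - 1)]

# Tagged toy conversation: B interrupts A by 300ms, A replies exactly
# on cue, then B waits 300ms.
turns = [(0, 2000), (1700, 3500), (3500, 5000), (5300, 6000)]
print(turn_gaps(turns))                     # [-300, 0, 300]
print(statistics.median(turn_gaps(turns)))  # 0
```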
I assume there was a lot of variance? As in, some people interrupt others constantly and some do it rarely. Also probably a lot of adjustment depending on the situation, like depending on the relative status of the people, or when people are talking to a young child or non-native speaker.
All that to say, I'd imagine people are adaptable enough to easily handle 100ms+ delay when they know they're talking to an AI.
I disagree with fact 2, voice assistant latency is annoyingly slow. It often causes a conscious wait like “did it work or did it not?”. Cell phone delay is bad as well, it’s certainly not an expectation that carries over to other devices for me.
The way I write code with AI is that I start with a project.md file, where I describe what I want done. I then ask it to make a plan.md file from that project.md to describe the changes it will make (or what it will create if Greenfield).
I then iterate on that plan.md with the AI until it's what I want. I then ask it to make a detailed todo list from the plan.md and attach it to the end of plan.md.
Once I'm fully satisfied, I tell it to execute the todo list at the end of the plan.md, and don't do anything else, don't ask me any questions, and work until it's complete.
I then commit the project.md and plan.md along with the code.
So my back and forth on getting the plan.md correct isn't in the logs, but that is much like intermediate commits before a merge/squash. The plan.md is basically the artifact an AI or another engineer can use to figure out what happened and repeat the process.
The main reason I do this is so that when the models get a lot better in a year, I can go back and ask them to modify plan.md based on project.md and the existing code, on the assumption it might find it's own mistakes.
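FWIW, the starting point is scriptable. A tiny sketch that scaffolds the two files; the section headings are just one plausible layout, not a standard:

```python
import tempfile
from pathlib import Path

# Tiny scaffold for the project.md / plan.md loop. The file names come
# from the workflow above; the headings are one plausible layout.

def scaffold(root):
    root = Path(root)
    (root / "project.md").write_text(
        "# Project\n\nWhat I want done, in my own words.\n")
    (root / "plan.md").write_text(
        "# Plan\n\nAI-drafted changes, iterated until correct.\n\n"
        "## Todo\n\n- [ ] (detailed steps appended after plan review)\n")
    return sorted(p.name for p in root.glob("*.md"))

print(scaffold(tempfile.mkdtemp()))  # ['plan.md', 'project.md']
```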
I do something similar, but across three doc types: design, plan, and debug
Design works similar to your project.md file, but on a per feature request. I also explicitly ask it to outline open questions/unknowns.
Once the design doc (i.e. design/[feature].md) has been sufficiently iterated on, we move to the plan doc(s).
The plan docs are structured like `plan/[feature]/phase-N-[description].md`
From here, the agent iterates until the plan is "done" only stopping if it encounters some build/install/run limitation.
At this point, I either jump back to new design/plan files, or dive into the debug flow. Similar to the plan prompting, debug is instructed to review the current implementation, and outline N-M hypotheses for what could be wrong.
We review these hypotheses, sometimes iterate, and then tackle them one by one.
An important note for debug flows, similar to manual debugging, it's often better to have the agent instrument logging/traces/etc. to confirm a hypothesis, before moving directly to a fix.
Using this method has led to a 100% vibe-coded success rate on both greenfield and legacy projects.
Note: my main complaint is the sheer number of markdown files over time, but I haven't gotten around to (or needed to) automate this yet, as sometimes these historic planning/debug files are useful for future changes.
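When I do get around to automating it, something like this sweep will probably do; the directory names come from my scheme above, and the 30-day threshold is arbitrary:

```python
import shutil
import time
from pathlib import Path

# Sweep design/plan/debug markdown files untouched for N days into
# archive/, preserving their relative paths. The 30-day default is
# arbitrary; the directory names match the scheme above.

def archive_stale_docs(root, days=30, doc_dirs=("design", "plan", "debug")):
    root = Path(root)
    cutoff = time.time() - days * 86400
    moved = []
    for d in doc_dirs:
        base = root / d
        if not base.is_dir():
            continue
        for md in base.rglob("*.md"):
            if md.stat().st_mtime < cutoff:
                rel = md.relative_to(root)
                dest = root / "archive" / rel
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.move(str(md), str(dest))
                moved.append(str(rel))
    return moved
```

The historic docs stay greppable under archive/ for the cases where they're still useful for future changes.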
My "heavy" workflow for large changes is basically as follows:
0. create a .gitignored directory where agents can keep docs. Every project deserves one of these, not just for LLMs, but also for logs, random JSON responses you captured to a file etc.
1. Ask the agent to create a file for the change, rephrase the prompt in its own words. My prompts are super sloppy, full of typos, with 0 emphasis put on good grammar, so it's a good first step to make sure the agent understands what I want it to do. It also helps preserve the prompt across sessions.
2. Ask the agent to do research on the relevant subsystems and dump it to the change doc. This is to confirm that the agent correctly understands what the code is doing and isn't missing any assumptions. If something goes wrong here, it's a good opportunity to refactor or add comments to make future mistakes less likely.
3. Spec out behavior (UI, CLI etc). The agent is allowed to ask for decisions here.
4. Given the functional spec, figure out the technical architecture, same workflow as above.
5. High-level plan.
6. Detailed plan for the first incomplete high-level step.
7. Implement, manually review code until satisfied.
> At this point, I either jump back to new design/plan files, or dive into the debug flow. Similar to the plan prompting, debug is instructed to review the current implementation, and outline N-M hypotheses for what could be wrong.
I'm biased because my company makes a durable execution library, but I'm super excited about the debug workflow we recently enabled when we launched both a skill and MCP server.
You can use the skill to tell your agent to build with durable execution (and it does a pretty great job the first time in most cases) and then you can use the MCP server to say things like "look at the failed workflows and find the bug". And since it has actual checkpoints from production runs, it can zero in on the bug a lot quicker.
This is great, giving agents access to logs (dev or prod) tightens the debug flow substantially.
With that said, I often find myself leaning on the debug flow for non-errors e.g. UI/UX regressions that the models are still bad at visualizing.
As an example, I added a "SlopGoo" component to a side project, which uses an animated SVG to produce a "goo"-like effect. I ended up going through 8 debug docs[0] until I was satisfied.
> giving agents access to logs (dev or prod) tightens the debug flow substantially.
Unless the agent doesn't know what it's doing... I've caught Gemini stuck in an edit-debug loop making the same 3-4 mistakes over and over again for like an hour, only to take the code over to Claude and get the correct result in 2-3 cycles (like 5-10 minutes)... I can't really blame Gemini for that too much though, what I have it working on isn't documented very well, which is why I wanted the help in the first place...
> Note: my main complaint is the sheer number of markdown files over time, but I haven't gotten around to (or needed to) automate this yet, as sometimes these historic planning/debug files are useful for future changes.
FWIW, what you describe maps well to Beads. Your directory structure becomes dependencies between issues, and/or parent/children issue relationship and/or labels ("epic", "feature", "bug", etc). Your markdown moves from files to issue entries hidden away in a JSONL file with local DB as cache.
Your current file-system "UI" vs Beads command line UI is obviously a big difference.
Beads provides a kind of conceptual bottleneck, which I think helps when working with LLMs. Beads is more self-documenting, while a file system can be "anything".
I have a similar process and have thought about committing all the planning files, but I've found that they tend to end up in an outdated state by the time the implementation is done.
Better, imo, is to produce a README or dev-facing doc at the end that distills all the planning and implementation into a final authoritative overview. This is easier for both humans and agents to digest than a bunch of meandering planning files.
I basically use a spec-driven approach, except I only let GitHub Spec Kit create the initial md file templates and then fill them in myself instead of letting the agent do it. That saves a ton of tokens, is reasonably quick, and I know the specs contain exactly what I want because I wrote them myself. Once I'm happy with the md file "harness", I let the agents loose.
The most frustrating issues that pop up are usually library/API conflicts. I work with Gymnasium or PettingZoo and RLlib or Stable-Baselines3. The APIs are constantly out of sync, so it helps to have a working environment where libraries and APIs are in sync beforehand.
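One concrete instance of that drift is the step-return change: older Gym returned a 4-tuple from `env.step`, while Gymnasium returns a 5-tuple with separate `terminated`/`truncated` flags. A tiny normalizing shim can paper over it while the libraries catch up (a sketch; `step_compat` is a made-up helper name, not part of either API):

```python
def step_compat(result):
    """Normalize env.step() returns across API generations.

    Old Gym:       (obs, reward, done, info)
    Gymnasium:     (obs, reward, terminated, truncated, info)
    Both collapse to the old 4-tuple shape here.
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        return obs, reward, terminated or truncated, info
    return result
```

Wrapping every `env.step()` call site in something like this keeps downstream training loops agnostic to which generation of the API a given environment speaks.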
Sort of, depending on if your spec includes technology specifics.
For example it might generate a plan that says "I will use library xyz", and I'll add a comment like "use library abc instead" and then tell it to update the plan, which now includes specific technology choices.
It's more like a plan I'd review with a junior engineer.
I'll check out that repo, it might at least give me some good ideas on some other default files I should be generating.
I also do that, and it works quite well to iterate on the spec md files first. When every step is detailed and clear, and all md files are linked to a master plan that Claude Code reads and updates at every step, it helps a lot to keep it on guardrails. Claude Code only works well on small increments, because context switching makes it mix things up and invent stuff.
So working by increments makes it really easy to commit a clean session and I ask it to give me the next prompt from the specs before I clear context.
It always goes sideways at some point, but having a nice structure helps even me do clean reviews and avoid 2h sessions that I have to throw away. It's really much easier to adjust only what's wrong at each step. It works surprisingly well.
I am sure this is partly tongue in cheek, but no, you can’t have written the code yourself in that amount of time. Would the code be better if you wrote it? Probably, depending on your coding skills.
But it would not be faster.
OP is talking about creating an entire project, from scratch, and having it feature complete at the end.
Here’s how I do the same thing, just with a slightly different wrapper: I’m running my own stepwise runtime where agents are plugged into defined slots.
I’ll usually work out the big decisions in a chat pane (sometimes a couple panes) until I’ve got a solid foundation: general guidelines, contracts, schemas, and a deterministic spec that’s clear enough to execute without interpretation.
From there, the runtime runs a job. My current code-gen flow looks like this:
1. Sync the current build map + policies into CLAUDE|COPILOT.md
2. Create a fresh feature branch
3. Run an agent in “dangerous mode,” but restricted to that branch (and explicitly no git commands)
4. Run the same agent again—or a different one—another 1–2 times to catch drift, mistakes, or missed edge cases
5. Finish with a run report (a simple model pass over the spec + the patch) and keep all intermediate outputs inspectable
And at the end, I include a final step that says: “Inspect the whole run and suggest improvements to COPILOT.md or the spec runner package.” That recommendation shows up in the report, so the system gets a little better each iteration instead of just producing code.
I keep tweaking the spec format, agent.md instructions and job steps so my velocity improves over time.
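The five steps above can be sketched as a tiny job runner. This is only an illustration of the shape of the flow: `agent` is a stand-in command name (not a real CLI), and its flags are placeholders, not any tool's actual interface:

```python
import subprocess

# Hypothetical step list mirroring the flow above. "agent" and its flags
# are illustrative stand-ins; only cp/git are real commands here.
STEPS = [
    ["cp", "build-map.md", "CLAUDE.md"],           # 1. sync build map + policies
    ["git", "checkout", "-b", "feature/run-001"],  # 2. fresh feature branch
    ["agent", "run", "--dangerous", "--no-git"],   # 3. restricted agent pass
    ["agent", "run", "--dangerous", "--no-git"],   # 4. repeat pass to catch drift
    ["agent", "report", "--spec", "spec.md"],      # 5. run report over spec + patch
]

def run_job(steps, dry_run=True):
    """Run each step in order; dry-run mode only collects the commands."""
    log = []
    for cmd in steps:
        if not dry_run:
            subprocess.run(cmd, check=True)
        log.append(" ".join(cmd))
    return log
```

Keeping the steps as data like this is also what makes the final "inspect the whole run" pass cheap: the log of commands plus their outputs is the run record.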
---
To answer the original article's question. I keep all the run records including the llm reasoning and output in the run record in a separate store, but it could be in repo also. I just have too many repos and want it all in one place.
local-governor is my store for epics, specs, run records, schemas, contracts, etc. No logic, just files. I want all this stuff in a DB, but it's easier to just drop a file path into my spec runner or into a chat window (vscode chat or cli tool), but I'm tinkering with an alt version on a cloud DB that just projects to local files... shrug. I spend about as much time on tooling as actual features :)
I do something similar
- A full work description in markdown (including pointers to tickets, etc); but not in a file
- A "context" markdown file that I have it create once the plan is complete... that contains "everything important that it would need to regenerate the plan"
- A "plan" markdown file that I have it create once the plan is complete
The "context" file is because, sometimes, it turns out the plan was totally wrong and I want to purge the changes locally and start over; discussing what was done wrong with it; it gives a good starting point. That being said, since I came up with the idea for this (from an experience it would have been useful and I did not have it) I haven't had an experience where I needed it. So I don't know how useful it really is.
None of that ^ goes into the repo though; mostly because I don't have a good place to put it. I like the idea though, so I may discuss it with my team. I don't like the idea of hundreds of such files winding up in the main branch, so I'm not sure what the right approach is. Thank you for the idea to look into it, though.
Edit: If you don't mind going into it, where do you put the task-specific md files in your repo, presumably in a way that doesn't stack up over time and cause ... noise?
This is how I used to use Beads before I made GuardRails[0]. I basically iterate with the model, ask it to do market research, review everything it suggests, and you wind up with a "prompt" that tells it what to do and how to work that was designed by the model using its own known verbiage. Having learned about how XML could be used to influence Claude I'm rethinking my flow and how GuardRails behaves.
The real question is when peer feedback and review happen.
Do you make the project file collaborative between multiple engineers? The plan file?
I've tried some variants of sharing different parts, but it feels like it's almost wasted effort if the LLM still goes through multiple iterations to get things right; the original plan and project get lost a bit against the details of what happened in the resulting chat.
For big tasks you can run the plan.md’s TODOs through 5.2 pro and tell it to write out a prompt for xyz model. It’ll usually greatly expand the input. Presumably it knows all the tricks that’ve been written for prompting various models.
Interesting! I actually split up larger goals into two plan files: one detailed plan for design, and one "exec plan" which is effectively a build graph but the nodes are individual agents and what they should do. I throw the two-plan-file thing into a protocol md file along with a code/review loop.
How do you use your agent effectively for executing such projects in bigger brownfield codebases? It's always a balance between the agent going way too far into NIH vs burning loads and loads of tokens for the initial introspection.
While I have not committed my personal mind map, I just had Claude Code write it down for me. Plus I have a small CLAUDE.md and copilot-instructions.md that mention the various intricacies of what I am working on, so the agent knows to refer to those files.
I'm using the Claude desktop app and vi at the moment. But honestly I would probably do better with a more modern editor with native markdown support, since that's mostly what I'm writing now.
My next step was to add in having another LLM review Claude's plans. With a few markdown artifacts it should be easy for the other LLM to figure it out and make suggestions.
If you're worried about privacy and security, why did you choose Inngest, which sends all your private data to Inngest? If you want truly private durable execution, you should check out DBOS.
Tog was the original design engineer for the Mac, and arguably one of the first true HCI engineers.
Then read the rest of his website. He goes into where Windows tried to copy Mac and got it horribly wrong.
One of my favorite examples is menu placement. The reason the Mac menus are at the top is because the edges of the screen provide an infinite click target in one direction. So you just go to the top to find what you want. With Windows, the menu was at the top of each Window, making a tiny click target. Then when you maximized the window, the menu was at the top, but with a few pixels of unclickable border. So it looked like the Mac but was infinitely worse.
If you're making a UI, you should read all of Tog's writings.
I understand the Fitts's Law concepts behind a top menu bar, but I wonder if this is a scenario with moving goalposts.
On a 1984 Mac, you had like 512x342 pixels and a system that could barely run one program at a time. There was little to no possible uncertainty as to who owned the menu bar. (Could desk accessories even take control of the menu bar?)
But once you got larger resolutions and the ability to have multiple full-size programs running at once, the menu bar could belong to any of them. Now, theoretically, you should notice which is the currently active window and assume it owns the menu bar, but ISTR scenarios where you'd close the window but the program would still be running, owning the menu bar, or the "active" window was less visually prominent due to task switching, etc.
The Windows design-- placing the menu inside the window it controls-- avoids any ambiguity there. Clicking "File-Save" in Notepad couldn't possibly be interpreted as trying to do anything to the Paintbrush window next to it.
The problem with the Mac UI is that the app's menubar can only be accessed by the mouse (can't remember what accessibility-enabled mode would allow).
Under Windows, one can access the app's menubar by pressing the ALT key to move focus up to the menubar and use the cursor keys to navigate along it. If you know the letter associated with a top-level menu (shown as underlined), then ALT-[letter] opens that menu (typically ALT-F gets you to the File menu). So the Windows user wouldn't have to move the mouse at all, Fitts's Law to the max (or is it min? whatever, it's instant access).
For the ultrawide monitors these days (width >= 4Kpx), if you have an app window maximized (or even spanning more than half the screen), accessing the menu via mouse is just terrible ergonomics on any major OS.
Since OS X 10.3 (2003), Control+F2 moves focus to the Apple menu. The arrow keys then navigate to any menu item, which can be invoked with Return or canceled with Escape. Command+? will bring you to a search box in the Help menu. Not only that, any menu item in any app can be bound to any keyboard shortcut of the user's choosing, not just the defaults provided by the system or application.
AFAIK Windows 3.x flipped a bunch of Mac decisions to avoid being sued and then MS felt that they had to keep those choices forever for backwards compatibility.
And in my experience, when people moved from Windows to the Mac they're so annoyed that there are differences. When I try to explain that these were present in the Mac long before Windows, people start to understand.
You can generalize this observation to a lot of Microsoft's decisions: a problem exists, so they solve it in a nifty way, a way that makes everything else harder or more error prone. An example: byte order mark. That sure does solve the problem of UTF-16 and UTF-32 byte order determination. It makes every other use of what should be a stream of bytes or words much harder. Concatenate two files? Gotta check for the BOM on both files. Now every app has to look at the first bytes of every "text" file it opens to decide what to do. Suddenly, "text" files have become interpreted, and thus open to allowing security vulnerabilities.
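To make that cost concrete: once BOMs exist, any tool that used to treat "text" as raw bytes needs a sniffing step before it can safely concatenate or interpret files. A minimal sketch (real code would also need heuristics for BOM-less files; note the 32-bit BOMs must be checked before the 16-bit ones, since the UTF-32-LE BOM begins with the UTF-16-LE BOM):

```python
import codecs

# Ordered so longer BOMs are matched first: b"\xff\xfe\x00\x00" (UTF-32-LE)
# starts with b"\xff\xfe" (UTF-16-LE) and would otherwise be misdetected.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def strip_bom(data: bytes):
    """Return (payload, encoding) with any leading BOM removed."""
    for bom, enc in BOMS:
        if data.startswith(bom):
            return data[len(bom):], enc
    return data, "unknown"
```

Every consumer of "plain text" now carries a variant of this check, which is exactly the interpreted-file-format problem described above.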
> So it looked like the Mac but was infinitely worse.
"Infinitely worse"? Some people really need to cool off the hyperbole.
Having each window be a self-contained unit is the far better metaphor than making each window transform a global element when it is selected. As well as scaling better for bigger screens. An edge case like that may well be unfortunate, but it could be the price you pay to make the overall better solution.
That was the point of Tog's conclusion: edges of the screen have infinite target size in one cardinal direction, corners have infinite target size in two cardinal directions. Any click target that's not infinite in comparison, has infinitely smaller area, which I suppose you could conclude is infinitely worse if clickable area is your primary metric.
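The math behind this is Fitts's law in its Shannon formulation, T = a + b * log2(D/W + 1): as the effective target size W grows without bound (the screen edge stops the cursor, so you can't overshoot), the log term collapses toward zero. A quick sketch, with the a/b constants as arbitrary placeholders rather than measured values:

```python
import math

def fitts_time(distance, width, a=0.1, b=0.1):
    """Shannon formulation of Fitts's law: T = a + b * log2(D/W + 1).

    a and b are device/user-specific constants; the 0.1 defaults here
    are arbitrary placeholders for illustration only.
    """
    return a + b * math.log2(distance / width + 1)

# A 20px-tall interior menu target vs an edge target, whose effective
# height is effectively unbounded because the cursor cannot overshoot it.
interior = fitts_time(distance=500, width=20)
edge = fitts_time(distance=500, width=10_000)  # stand-in for "infinite"
```

With these numbers the edge target's predicted acquisition time approaches the constant a alone, which is the quantitative version of "infinite click target".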
This wasn't just the menu bar either. The first Windows 95-style interfaces didn't extend the start menu click box to the lower left corner of the screen. Not only did you have to get the mouse down there, you had to back off a few pixels in either direction to open the menu. Same with the applications in the task bar.
The concept was similar to NEXTSTEP's dock (that was even licensed by Microsoft for Windows 95), but missed the infinite area aspect that putting it on the screen edge allowed.
The infinitely worse part was when you maximized the window so the menu bar was at the top, but Windows still had the border there, which was unclickable.
So now you broke the infinite click target even though it looked like it should have one.
> So it looked like the Mac but was infinitely worse.
On single-monitor setups, maybe: but on early OS X multi-monitor setups, you had the farcical situation where the menu was only shown on the "primary" display and the secondary display had no menu at all, so to use menus for windows on the secondary display, you had to move the cursor over to the primary display where the menu for all windows lived (or use keyboard shortcuts).
I think 10.6/7 (not sure exactly) was when they started putting the menu bar on both displays rather than just the primary.