simonw · 2026-01-10T22:48:56 1768085336

Looks like it's no network activity for 30 seconds.

simonw · 2026-01-10T22:04:13 1768082653

You can define them in a structured way that's not tied to a specific programming language. Imagine a test suite that's entirely YAML inputs and outputs, or JSON, or even CSV.

The key idea is to have one test suite/specification that multiple implementations in different languages can share.

roxolotl · 2026-01-10T22:33:00 1768084380

What is the advantage of that over programming languages though? At some point you’re just creating a new specification language which needs to be learned. If an LLM can go from English spec to Python unit tests, why not just start with, or at least distribute, Python unit tests? A programming language will allow you to be significantly more correct and consistent than English.

simonw · 2026-01-10T23:43:28 1768088608

Because if the tests are in Python the LLM still has to convert them from Python to Ruby or whatever, which leaves room for mistakes to creep in.

If the tests are in YAML it doesn't need to convert them at all. It can write a new test harness in the new language and run against those existing, deterministic tests.

roxolotl · 2026-01-11T00:01:57 1768089717

My point is that to create a specification you need to use a formal language of some kind. In this example they created a new YAML-based specification language. Why do that rather than use a well-documented existing formal language the LLM knows well, like Python? The translation is either YAML -> new language or Python -> new language. The translation is happening in both cases.

The advantage I can think of is that it might be more human-readable, but Python is damn close to pseudocode. It’ll likely always be a bit annoying to write because it has to be a formal language.

simonw · 2026-01-11T01:51:05 1768096265

There's no translation from YAML to a different language.

The YAML describes the tests - like this file here: https://github.com/dbreunig/whenwords/blob/main/tests.yaml

Snippet:

  - name: "5 hours ago"
    input: { timestamp: 1704049200, reference: 1704067200 }
    output: "5 hours ago"

  - name: "21 hours ago"
    input: { timestamp: 1703991600, reference: 1704067200 }
    output: "21 hours ago"

When told "use red/green TDD to write code for this in Ruby", a coding agent like Claude Code will write a test harness in Ruby that loops through all of those YAML tests, run it and watch it fail, then write just enough Ruby that the tests pass.
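
For illustration, here is a minimal sketch of that kind of harness, written in Python rather than Ruby. It makes a few assumptions: the cases are embedded as JSON instead of YAML so the sketch runs on the standard library alone, `timeago` is a hypothetical stand-in that only handles whole hours (not the whenwords implementation), and the keys under `input` are assumed to map directly onto keyword arguments.

```python
import json

# Language-neutral test cases. The thread's example uses YAML; JSON is used
# here only so the sketch needs nothing beyond the standard library.
CASES_JSON = """
[
  {"name": "5 hours ago",
   "input": {"timestamp": 1704049200, "reference": 1704067200},
   "output": "5 hours ago"},
  {"name": "21 hours ago",
   "input": {"timestamp": 1703991600, "reference": 1704067200},
   "output": "21 hours ago"}
]
"""


def timeago(timestamp, reference):
    """Hypothetical stand-in: only handles whole hours in the past."""
    hours = (reference - timestamp) // 3600
    return f"{hours} hours ago"


def run_cases(cases, func):
    """Call func with each case's input and collect any mismatches."""
    failures = []
    for case in cases:
        actual = func(**case["input"])
        if actual != case["output"]:
            failures.append((case["name"], case["output"], actual))
    return failures


if __name__ == "__main__":
    failures = run_cases(json.loads(CASES_JSON), timeago)
    for name, expected, actual in failures:
        print(f"FAIL {name}: expected {expected!r}, got {actual!r}")
    print(f"{len(failures)} failure(s)")
```

The cases themselves are never rewritten. Only `run_cases` and the function under test would need to exist in each target language.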

roxolotl · 2026-01-11T03:57:18 1768103838

Yea I guess we're having a definitional disagreement here. To be clear I think this is a good idea and the work you've done using tests from projects to have agents translate libraries is awesome.

But to me clearly that YAML snippet you provided is a specification which needs to be translated to Ruby as much as Python would. If the equivalent Python is:

  def test_timeago_5_hours_ago(self):
      self.assertEqual(timeago(1704049200, 1704067200), "5 hours ago")

  def test_timeago_21_hours_ago(self):
      self.assertEqual(timeago(1703991600, 1704067200), "21 hours ago")

The YAML is no more clear than the Python, nor closer to Ruby. Honestly I think it's less clear as a human reading it because it's hard to tell which function is being tested in context of a specific test case. I guess it's possible Claude is better at working with the YAML than the Python but that would be a coincidence I think.

simonw · 2026-01-10T20:17:02 1768076222

I've been exploring this pattern recently too. Giving current coding agents an existing conformance or test suite and telling them to keep writing code until the tests pass is astonishingly effective.

I've now got a JavaScript interpreter and a WebAssembly runtime written in Python, built by Claude Code for web, run from my phone.

simonw · 2026-01-10T17:10:25 1768065025

I'm not crazy about the way this installs Deno as `/usr/local/bin/deno` (on Linux systems at least). I was hoping it would leave that executable tucked away in Python site-packages somewhere out of the way.

I also ran into some weird issues where sometimes the binary isn't executable and you have to chmod +x it - including in GitHub Actions workflows. I had to work around it like this: https://github.com/simonw/denobox/blob/8076ddfd78ee8faa6f1cd...

    - name: Run tests
      run: |
        chmod +x $(python -c "import deno; print(deno.find_deno_bin())")
        python -m pytest
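
The same workaround could presumably be applied from Python instead of a shell step. This is a rough sketch, not what denobox does: `ensure_executable` is a hypothetical helper, and it assumes a POSIX filesystem where the execute bits are what's missing.

```python
import os
import stat


def ensure_executable(path):
    """Add execute bits to path if the current user cannot execute it."""
    if not os.access(path, os.X_OK):
        mode = os.stat(path).st_mode
        # Equivalent to `chmod +x` with a permissive umask: set u+x, g+x, o+x.
        os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


# Intended usage, with the function shown in the workflow snippet above:
#   import deno
#   ensure_executable(deno.find_deno_bin())
```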

zahlman · 2026-01-10T23:16:37 1768086997

This shouldn't be possible as long as you get a wheel (try configuring the installer to require wheels). The download script in the source distribution can do what it wants, of course, and I agree that this isn't the greatest behaviour.

My guess is that they do this in order to put the binary in a specific location of a container, as part of their own build process. The ecosystem doesn't distinguish between "source distributions" intended to build on a user's vs. developer's machine.

simonw · 2026-01-10T16:37:56 1768063076

Here's the key idea:

> Creating any one single behavior in a computer system is almost always trivial for the experienced engineer. When the experienced engineer on your team says that something can’t be done easily, they almost always mean is that the thing can’t be done easily in a way that is acceptable to the health of the product. Junior engineers tend not to have to consider this constraint.

I completely agree. One of the things that makes a senior engineer senior is the ability to design and implement code with the health of the overall system in mind.

This is really hard, especially since you simultaneously have to resist the temptation to build abstractions for a future that may not come to pass - sticking to the YAGNI principle https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it

simonw · 2026-01-10T14:01:18 1768053678

OK this is cool:

  uvx deno --version

One-liner to run Deno without a separate step to install it first.

The wheel comes in five flavors: https://pypi.org/project/deno/#files - Windows x86, manylinux x86 and ARM64, macOS x86 and ARM64.

That's a lot of machines that can now get a working Deno directly from PyPI.

zahlman · 2026-01-10T16:38:59 1768063139

There's also a "source" distribution that installs by downloading from the main project's GitHub release page. But that's presumably limited to the same platform support.

The yt-dlp project also raised concerns that the manylinux wheel incorrectly advertises older glibc support.

... but I always bristle a bit at the "one-liner to run without installing" description. Sure, the ergonomics are great, but you do still have to download the whole thing, and it does create a temporary installation that is hard-linked from a cache folder that is basically itself an installation.

simonw · 2026-01-10T17:08:39 1768064919

Sure, but as an end-user you don't have to think about installation at all. That's a huge win - mainly because it eliminates the "Did I install this already? Where did I put it? What's the command for doing that again?" mental overhead.

simonw · 2026-01-10T13:14:19 1768050859

They've got good at C now. I can't speak for ASM.

Here's a C session that I found quite eye-opening the other day: https://gisthost.github.io/?1bf98596a83ff29b15a2f4790d71c41d...

simonw · 2026-01-10T13:11:58 1768050718

> AI coders keep saying they review all the code they push

Those tides have shifted over the past 6 weeks. I'm increasingly seeing serious, experienced engineers who are using AI to write code and are not reviewing every line of code that they push, because they've developed a level of trust in the output of Opus 4.5 that line-by-line reviews no longer feel necessary.

(I'm hesitant to admit it but I'm starting to join their ranks.)

simonw · 2026-01-10T04:45:08 1768020308

Yeah that's about right.

It's a fast starting and fast pausing persistent VM, with a ton of built in developer tools (including a preconfigured Claude Code) and an extra JSON API for executing commands within it so you can treat it as a sandbox.

You may find my writeup here useful: https://simonwillison.net/2026/Jan/9/sprites-dev/

simonw · 2026-01-10T00:05:42 1768003542

I'm really excited about https://sprites.dev/ - it hits two of my favourite problems at once:

1. Developer environment sandboxes. This is a cheap and convenient way to run Claude Code / Codex CLI / etc in YOLO mode in a persistent sandboxed VM with a restricted blast radius if something goes wrong.

2. Sandbox API. Fly now have a product that lets me make a simple JSON API call to run untrusted code in a new sandbox. There's even snapshotting support so I can roll back to a known state after running that code.

I wrote a bunch more about this here: https://simonwillison.net/2026/Jan/9/sprites-dev/

dang · 2026-01-10T22:25:10 1768083910

I know you know this, as you posted it, but readers might want to look at this related thread:

Fly's Sprites.dev addresses dev environment sandboxes and API sandboxes together - https://news.ycombinator.com/item?id=46561089 - Jan 2026 (10 comments)

realty_geek · 2026-01-10T15:59:07 1768060747

I have found container-use to be super useful for this.

https://container-use.com/quickstart

BTW Simon, I was super happy when I heard on Theo's podcast that he will be encouraging you to monetise your work more. I'm super appreciative of your work and I'm pretty convinced that the more you profit from it, the better the universe will be!!!