What is the advantage of that over programming languages though? At some point y...

simonw · 2026-01-10T23:43:28 1768088608

Because if the tests are in Python the LLM still has to convert them from Python to Ruby or whatever, which leaves room for mistakes to creep in.

If the tests are in YAML it doesn't need to convert them at all. It can write a new test harness in the new language and run against those existing, deterministic tests.

roxolotl · 2026-01-11T00:01:57 1768089717

My point is that to create a specification you need to use a formal language of some kind. In this example they created a new YAML-based specification language. Why do that instead of using a well-documented existing formal language that the LLM knows well, like Python? The translation is either YAML -> new language or Python -> new language. The translation is happening in both cases.

The advantage I can think of is that it might be more human readable, but Python is damn close to pseudocode. It’ll likely always be a bit annoying to write because it has to be a formal language.

simonw · 2026-01-11T01:51:05 1768096265

There's no translation from YAML to a different language.

The YAML describes the tests - like this file here: https://github.com/dbreunig/whenwords/blob/main/tests.yaml

Snippet:

  - name: "5 hours ago"
    input: { timestamp: 1704049200, reference: 1704067200 }
    output: "5 hours ago"

  - name: "21 hours ago"
    input: { timestamp: 1703991600, reference: 1704067200 }
    output: "21 hours ago"

When told "use red/green TDD to write code for this in Ruby", a coding agent like Claude Code will write a test harness in Ruby that loops through all of those YAML tests, run it and watch it fail, then write just enough Ruby that the tests pass.
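As a rough sketch of what such a harness looks like (in Python here only because it's the other language in this thread; a Ruby one would have the same shape): `cases` stands in for what a YAML parser would return for the snippet above, and the `timeago` signature and the `run_cases` helper are my own assumptions for illustration, not anything taken from the whenwords repo.

```python
# Hypothetical stand-in for the parsed YAML: a YAML loader would return
# a list of dicts shaped like this for the two cases in the snippet.
cases = [
    {"name": "5 hours ago",
     "input": {"timestamp": 1704049200, "reference": 1704067200},
     "output": "5 hours ago"},
    {"name": "21 hours ago",
     "input": {"timestamp": 1703991600, "reference": 1704067200},
     "output": "21 hours ago"},
]


def timeago(timestamp, reference):
    # Assumed signature. Deliberately minimal: "just enough" code to make
    # the two whole-hour cases above pass, nothing more.
    hours = (reference - timestamp) // 3600
    return f"{hours} hours ago"


def run_cases(cases, func):
    """Run func against every case; return (name, expected, actual) per failure."""
    failures = []
    for case in cases:
        actual = func(**case["input"])
        if actual != case["output"]:
            failures.append((case["name"], case["output"], actual))
    return failures


failures = run_cases(cases, timeago)
print("all passed" if not failures else failures)
```

The harness only knows how to feed `input` to a function and compare the result to `output`. The test cases themselves are never rewritten, only read.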

roxolotl · 2026-01-11T03:57:18 1768103838

Yeah, I guess we're having a definitional disagreement here. To be clear, I think this is a good idea, and the work you've done using tests from projects to have agents translate libraries is awesome.

But to me, that YAML snippet you provided is clearly a specification, which needs to be translated to Ruby just as much as Python would. If the equivalent Python is:

def test_timeago_5_hours_ago(self):
  self.assertEqual(timeago(1704049200, 1704067200), "5 hours ago")

def test_timeago_21_hours_ago(self):
  self.assertEqual(timeago(1703991600, 1704067200), "21 hours ago")

The YAML is no clearer than the Python, nor closer to Ruby. Honestly, I think it's less clear to a human reader because it's hard to tell which function is being tested in the context of a specific test case. I guess it's possible Claude is better at working with the YAML than the Python, but that would be a coincidence, I think.