Thanks to LLMs, we are quite close to achieving this. I can write code in Python (sometimes even plain English), and GPT can convert it to Go or even Haskell if I like. The conversion is accurate 95% of the time on the first attempt in my use cases, and I expect this to improve further with more powerful models in the near future.
You can take it a step further: LLMs can already "execute" arbitrary non-existent languages, with non-existent data. Here are a couple of examples using a tool I wrote[1]:
% echo "nums 1 10 | filter even | to_words | map uppercase" | refab imagine
TWO
FOUR
SIX
EIGHT
TEN
% echo "with file '/tmp/top-ten-most-populous-cities.txt' do; cities = read; cities.each { |city| (city.name, city.utc_offset) }" | refab imagine
Tokyo, 9
Delhi, 5.5
Shanghai, 8
São Paulo, -3
Mumbai, 5.5
Mexico City, -6
Beijing, 8
Osaka, 9
Cairo, 2
New York, -5
For what it's worth, the tool isn't specialized for this; 'imagine' is just one of many prompts it can execute.
Of course, the execution is nondeterministic and at the moment only works for simple things, but you can imagine that as LLMs get more capable and more integrated with tools, this will matter less and less.
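For comparison, here is roughly what the first imagined pipeline means when written out deterministically in Python. This is only a sketch: the `WORDS` table and the `pipeline` function are my own illustrative names, and the inclusive reading of `nums 1 10` is an assumption taken from the output shown above, since the imagined language defines none of this.

```python
# Deterministic reading of: nums 1 10 | filter even | to_words | map uppercase
# WORDS is an assumed lookup table; the imagined language never defines to_words.
WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
         6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten"}


def pipeline(start: int, stop: int) -> list[str]:
    nums = range(start, stop + 1)              # nums 1 10 (assumed inclusive)
    evens = (n for n in nums if n % 2 == 0)    # filter even
    words = (WORDS[n] for n in evens)          # to_words
    return [w.upper() for w in words]          # map uppercase


if __name__ == "__main__":
    print("\n".join(pipeline(1, 10)))
```

The difference from the LLM version is that every stage here had to be defined up front, whereas the model infers a plausible meaning for each stage name.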
I'm a strong advocate for writing tests first, maintaining a robust QA process, and ensuring all merge requests undergo peer review. Since I started using GPT-4 for coding, the quality of my merge requests has remained the same. The difference is that I can now produce about twice as many merge requests, as GPT-4 handles the boilerplate writing and the Stack Overflow searches, and helps me get going when I hit a mental block or am missing an idea.
By what metric do you measure quality? Plenty of GPT users were dumping low-quality contributions prior to GPT; now their contributions are the same idiotic trash but appear more competent…
This is how I feel about cars that can drive themselves 99.9% of the time. The remaining 0.1%, the most difficult of corner cases, are to be handled by the human who has no experience driving even in good conditions.
I largely agree, but I don't think the current experience is the right one.
I recently started writing a game in Godot. I don't know GDScript, and in trying to learn it I've found I don't like it very much. I turned to aider.chat to see if I could describe the functions, data structures, and systems I wanted and have it write them. I also tried writing in a more familiar language (...one with braces...) and having it translate those files.
It does pretty well, but it doesn't feel like software engineering. It's too hands-off and doesn't activate the same neurons. All the problem-solving and puzzle-solving is gone, and the successes are quite boring, and the failure modes are more irritating even if they're necessarily quicker to solve.
It's a weird experience. I'm moving so, so much faster than I would have on my own, but I don't enjoy it. It feels like cheating - I'm not actually ashamed of what I'm doing but I also won't take credit for writing the code.
However, what I'm getting at is this: If I could write the code in a syntax or even language that I prefer and have copilot or whatever translate it in near-real-time (without active prompting), that would be the best of both worlds. I'd still be a little sad at myself if I didn't learn the new language, but I also think this method would facilitate learning better than what I'm doing with aider (because I could see what my code turns into as I'm writing it, and learn that "translation").
I can confirm that it is a suitable use case for GPTs. I do GPT-assisted programming language design and experimentation. In some cases, GPT-4 can even generate a basic interpreter that allows me to test my new language.
Here is an example of GPT's output for Python with braces, generated after spending just 10 seconds on the prompt:
def preprocess_braces(code: str) -> str:
    lines = code.split('\n')
    processed_lines = []
    indent_level = 0
    indent_str = '    '  # 4 spaces for indentation

    for line in lines:
        stripped_line = line.strip()

        # Check for opening brace
        if stripped_line.endswith('{'):
            processed_lines.append(indent_str * indent_level + stripped_line[:-1].strip() + ':')
            indent_level += 1
        # Check for closing brace
        elif stripped_line == '}':
            indent_level -= 1
        else:
            processed_lines.append(indent_str * indent_level + stripped_line)

    return '\n'.join(processed_lines)

# Example usage:
code_with_braces = """
def example_function() {
    if True {
        print("Hello, world!")
    }
    for i in range(5) {
        print(i)
    }
}
"""

processed_code = preprocess_braces(code_with_braces)
exec(processed_code)  # Runs the transformed code (this defines example_function; it does not call it)
print("Processed Code:\n", processed_code)
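A quick way to see where this line-based approach stops working is to feed it inputs where a trailing `{` is not a block opener. The sketch below repeats the function in condensed form so it runs on its own; the `compiles` helper and the three sample inputs are my own illustrative additions, not part of the GPT output.

```python
def preprocess_braces(code: str) -> str:
    """Condensed copy of the approach above: a trailing '{' opens a block, a lone '}' closes it."""
    out, level = [], 0
    for line in code.split("\n"):
        s = line.strip()
        if s.endswith("{"):
            out.append("    " * level + s[:-1].strip() + ":")
            level += 1
        elif s == "}":
            level -= 1
        else:
            out.append("    " * level + s)
    return "\n".join(out)


def compiles(src: str) -> bool:
    """Illustrative helper: True if Python accepts the source (IndentationError is a SyntaxError)."""
    try:
        compile(src, "<braces>", "exec")
        return True
    except SyntaxError:
        return False


ok = preprocess_braces("if True {\nprint('hi')\n}")      # ordinary block
dict_literal = preprocess_braces("x = {\n'a': 1\n}")     # '{' opens a dict, not a block
comment = preprocess_braces("# note {\nprint('hi')")     # '{' sits inside a comment

print(compiles(ok), compiles(dict_literal), compiles(comment))
```

The ordinary block converts cleanly, while the dict literal and the comment both produce code that no longer compiles, because the preprocessor looks only at the last character of each line and has no notion of tokens.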
The model doesn't "understand" anything, but if you, the programmer, understand it well enough, that can be enough to direct it to find a solution.
You can do a lot with ChatGPT or Claude if wrong code is easy to spot (which will obviously depend on what you're working on). If you can easily spot mistakes, these tools can often come up with a fix once you point them out. I've had some real success converting small-scale production C++ code into Python using Claude. Stuff that isn't really deep or complicated, but it's still faster and less annoying using an LLM to assist. I am sure there are large domains where it's crap, but for relatively simple stuff (CRUD logic, simple file parsing) it does remarkably well.
> The model doesn't "understand" anything, but if you, the programmer, understands it well enough, that can be enough to direct it to find a solution.
This is exactly my point.
Whether a programmer authors source code exclusively from past experience, or uses a statistical code generator (LLM) and then applies their learned ability, is orthogonal to my original premise.
Without understanding, statistical code generation is little more than a popularity contest.
Code is not short for encoding a solution. Code is any kind of program whether correct or not.
You can have a solution without understanding it. You can for instance know the rough shape a solution should have and try to guess at the details. And get it correct some of the time.
And current models do have some form of understanding, although it is sometimes incomplete. They are clearly able to solve many problems after all.
Just this morning I asked GPT-4o to write some code for me. The code was correct, except for one stupid mistake GPT-4o made which caused a compilation error. I just fixed that myself, but I think it is likely that if I had given GPT-4o the error message, it could have fixed it too. And I was thinking that if I set up a chain-of-thought agent with function-calling, GPT-4o probably could have discovered and fixed the compilation error itself without my involvement. Provide it with unit tests and it may even get the code to pass them (even if it takes a few iterations), which could address issues like the braces in strings and comments issue you mention, assuming the unit tests cover them. And if they don’t, and you notice an issue via code inspection or exploratory testing, GPT-4o (in my experience) often does a decent job of “here is the code and here is a description of the bug, modify the code to fix it”. Of course, sometimes chain-of-thought agents get stuck and fail to progress, but something that quickly gives you the right answer 80% of the time can be a big productivity boost.
In my case earlier today, it helped that it was a relatively simple function and I gave it a rather detailed natural language spec of what I wanted it to do. I totally could have written it all myself, but writing a natural language spec and getting GPT-4o to translate it to code is (depending on my mood) less mental effort than just writing the code directly.
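The compile-and-retry idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular agent framework works: `ask_model` is a hypothetical callable standing in for a real LLM call, `fake_model` is a stub so the sketch runs offline, and Python's built-in `compile` stands in for whatever compiler produced the original error.

```python
from typing import Callable, Optional


def first_compile_error(source: str) -> Optional[str]:
    """Return a short description of the first syntax error, or None if the source compiles."""
    try:
        compile(source, "<generated>", "exec")
        return None
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"


def repair_loop(source: str,
                ask_model: Callable[[str, str], str],
                max_rounds: int = 3) -> tuple[str, bool]:
    """Feed compile errors back to the model until the code compiles or the rounds run out."""
    for _ in range(max_rounds):
        error = first_compile_error(source)
        if error is None:
            return source, True
        # Hypothetical model call: takes the code and the error, returns revised code.
        source = ask_model(source, error)
    return source, first_compile_error(source) is None


# Stub standing in for a real model: it "fixes" exactly one mistake,
# a missing colon after the def line.
def fake_model(source: str, error: str) -> str:
    return source.replace("def add(a, b)\n", "def add(a, b):\n")


fixed, ok = repair_loop("def add(a, b)\n    return a + b\n", fake_model)
print(ok)
print(fixed)
```

The `max_rounds` cap matters because, as noted above, agents sometimes get stuck; the loop reports failure instead of retrying forever. Running unit tests instead of `compile` would follow the same shape, with the test failure output passed back as the error.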