
> But any sandbox we design won’t be good enough.

This isn't really true.

To bring back your analogy of searching the space of all programs - suppose we wanted to simulate every Turing machine of at most N states, for some sufficiently large N. One of these Turing machines is going to encode an AGI. Nonetheless, it is still just an encoded Turing machine being simulated by a Turing machine simulator. No matter what great intelligence is encoded by those states, it's still a state machine following the same rules as every other Turing machine in existence: a fixed set of states following a fixed set of rules on an unbounded-length tape of memory.
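To make that concrete, here is a minimal sketch of such a simulator (the machine and its transition table are a made-up toy example): the "intelligence" of the simulated machine lives entirely in the `transitions` data, while the only code that ever executes is the fixed simulator loop.

```python
from collections import defaultdict

def simulate(transitions, start, halt, max_steps):
    """Run a Turing machine given as data:
    (state, symbol) -> (new_state, written_symbol, move)."""
    tape = defaultdict(int)            # unbounded tape, blank = 0
    state, head = start, 0
    for _ in range(max_steps):
        if state == halt:
            break
        state, tape[head], move = transitions[(state, tape[head])]
        head += 1 if move == 'R' else -1
    return state, dict(tape)

# Toy machine: write two 1s moving right, then halt.
tm = {
    ('A', 0): ('B', 1, 'R'),
    ('B', 0): ('H', 1, 'R'),
}
final_state, tape = simulate(tm, start='A', halt='H', max_steps=100)
```

However clever the encoded machine, nothing it "does" ever runs outside that loop; it can only rewrite symbols on a tape the simulator owns.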

The same is true for an AGI in a computer program in a modern computer. Complex modern neural nets are still just programs that take a giant array of numbers and feed it through a network of transformations repeatedly, yielding a new array of numbers. No matter what brilliant information that array of numbers encodes, the array of numbers is still just a bunch of bits somewhere in RAM. It's data, not execution. It can't invoke the IPC stack unless *we* write functionality that interprets the data and executes actions based on it.
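A minimal sketch of that point (toy weights, not a real model): the forward pass below is the only code that executes, and the weights, however much "intelligence" they might encode, are inert numbers flowing through it.

```python
def forward(x, layers):
    """Feed a vector through a stack of (weight_matrix, bias) layers.
    Everything here is plain arithmetic on lists of floats."""
    for W, b in layers:
        # matrix-vector product plus bias
        x = [sum(w * xi for w, xi in zip(row, x)) + bi
             for row, bi in zip(W, b)]
        # ReLU nonlinearity: still just arithmetic on numbers
        x = [max(0.0, v) for v in x]
    return x

# Toy network: 2 -> 2 -> 1.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.1]),
    ([[2.0, 0.0]], [0.0]),
]
out = forward([1.0, 2.0], layers)
```

The output array can only affect the world through whatever code *we* choose to attach to it.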

It's like the brain in a vat[0] thought experiment. A super-intelligent brain in a vat can't escape the lab unless the lab researchers give the brain control over something physical. A super-intelligent brain can't even know that it's a brain in a vat unless it is told. Why then would you expect a digital brain in a simulated environment to even know what it is, much less how to perfectly exploit a bug in the simulator (a bug which might provably not exist[1]) to send a copy of its source code to another computer, replicating and distributing itself?

The simple answer is: Sandboxing an AI is significantly easier than you seem to think. When an AI takes over the world, it's going to be because someone handed the AI root access to a big tech data center so it can optimize operations; not because the AI discovered the perfect sequence of zero-day bugs to overcome all of humanity's best cybersecurity efforts.

[0] https://en.wikipedia.org/wiki/Brain_in_a_vat

[1] https://en.wikipedia.org/wiki/Formal_verification



Yes, programs are Turing machines that use memory. I’m a little insulted. No method of containment is good enough for something like that. But we agree that sandboxing as a basis to claim that AGI won’t be a danger is totally unsound.



