Yes, that python patch is only halfway to a proper indirect threaded dispatch, it still got a lookup table. See the OCaml bytecode VM for a really high performance example of this approach.
> a more complex VM possibly operating on an abstract syntax tree
We're not talking about JITs here. Of course an tree-based representation is better for JITs, and if using a flat VM you may have to promote it to a higher level AST (or at least an SSA-based) form anyway.
But for an immediate execution, a flat stream of instructions (or, better, a threaded code) is much more efficient than any kind of tree. Yes, I'm aware of things like SISC, but I'm yet to see a proof that their approach is more efficient than a flat bytecode, even when JVM overhead is taken into consideration.
> a more complex VM possibly operating on an abstract syntax tree
Don't do it.