> I don't follow you, what do computed gotos have to do with threaded code?
1. Build a jump table.
2. Read the bytecode.
3. Replace every opcode in the bytecode stream with the corresponding value from the jump table (i.e., build threaded code from your bytecode).
4. In a loop, take the next value at PC and jump to it: it's already an address. There are no table lookups at run time (only one pass, when you load the bytecode), so it's much faster than any switch can be.
EDIT: you probably did not notice that for most targets jumptbl_base = 0, so that indirection is removed.
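A minimal sketch of the four steps above, assuming GCC or Clang (it relies on the labels-as-values / computed goto extension) and a made-up four-opcode stack machine. `OP_PUSH`, `OP_ADD`, `OP_MUL`, `OP_HALT`, the `cell` type and `run_threaded` are all hypothetical names. Each threaded cell holds the full handler address, which corresponds to the jumptbl_base = 0 case:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical opcodes for a toy stack machine. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT, OP_COUNT };

/* One cell of threaded code: a handler address or an inline operand. */
typedef union { void *addr; long imm; } cell;

/* Requires GCC/Clang: uses &&label and goto *ptr. */
long run_threaded(const uint8_t *code, size_t len)
{
    /* 1. Jump table: opcode -> handler address. */
    static void *const jumptbl[OP_COUNT] = {
        &&op_push, &&op_add, &&op_mul, &&op_halt
    };

    /* 2-3. Translate once: replace each opcode with its handler address. */
    cell *thread = malloc(len * sizeof *thread);
    if (!thread)
        return -1; /* sketch: no real error reporting */
    for (size_t i = 0; i < len; i++) {
        uint8_t op = code[i];
        assert(op < OP_COUNT);
        thread[i].addr = jumptbl[op];
        if (op == OP_PUSH) { /* copy the inline operand through */
            assert(i + 1 < len);
            i++;
            thread[i].imm = code[i];
        }
    }

    long stack[64];
    long *sp = stack; /* points one past the top element */
    const cell *pc = thread;
    long result;

    /* 4. Dispatch: each cell already holds the address to jump to. */
#define NEXT goto *(pc++)->addr
    NEXT;
op_push: *sp++ = (pc++)->imm; NEXT;
op_add:  sp--; sp[-1] += sp[0]; NEXT;
op_mul:  sp--; sp[-1] *= sp[0]; NEXT;
op_halt: result = sp[-1];
#undef NEXT
    free(thread);
    return result;
}
```

For example, the bytecode `{ OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT }` computes (2 + 3) * 4 and returns 20. The translation loop runs once per program; after that, every dispatch is a single `goto *` through the address stored in the stream.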
Oh, I think I get it, thanks (although I still don't understand what "threaded" means in this context).
The only downside I can see is that you'd significantly increase the size of the code. On a 64-bit architecture you trade each byte for a 64-bit address, effectively multiplying the footprint in the data cache by eight. A 256-entry LUT, on the other hand, will fit snugly in cache, and the lookup shouldn't be very costly.
Also, if I understood you correctly, what you're proposing doesn't have much to do with the "computed gotos" extension.
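For comparison, a sketch of the compact alternative described above: the bytecode stays one byte per opcode, and each dispatch indexes a small table of label addresses (at most 256 entries for byte-sized opcodes). It uses the same GCC/Clang computed-goto extension; the opcodes and `run_lut` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy stack machine. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT, OP_COUNT };

/* Token-threaded dispatch: one table lookup per instruction. */
long run_lut(const uint8_t *pc)
{
    /* Small table of handler addresses, indexed by opcode byte. */
    static void *const jumptbl[OP_COUNT] = {
        &&op_push, &&op_add, &&op_mul, &&op_halt
    };

    long stack[64];
    long *sp = stack; /* points one past the top element */

    /* Look up the next handler in the table, then jump to it. */
#define NEXT do { assert(*pc < OP_COUNT); goto *jumptbl[*pc++]; } while (0)
    NEXT;
op_push: *sp++ = *pc++; NEXT; /* operand is the next byte */
op_add:  sp--; sp[-1] += sp[0]; NEXT;
op_mul:  sp--; sp[-1] *= sp[0]; NEXT;
op_halt: return sp[-1];
#undef NEXT
}
```

The trade-off is one extra load per instruction (`jumptbl[*pc]`) in exchange for an instruction stream that is eight times smaller than one made of 64-bit addresses.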
> The only downside I can see is that you'd significantly increase the size of the code
This is why I add jumptbl_base on 64-bit platforms. Pointers are still stored as 32-bit values, just with an added offset (and a tiny overhead).
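A sketch of the 32-bit variant, assuming GCC: instead of full pointers, each threaded cell stores an `int` offset from a base label, and dispatch adds it back with `goto *(&&op_push + offset)`. This follows the label-difference idiom shown in the GCC manual's "Labels as Values" section; here `&&op_push` plays the role of jumptbl_base, and all names are hypothetical. Note that arithmetic on a `void *` is itself a GCC extension:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical opcodes for a toy stack machine. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT, OP_COUNT };

long run_threaded32(const uint8_t *code, size_t len)
{
    /* Handler offsets relative to the base label &&op_push. */
    static const int offs[OP_COUNT] = {
        &&op_push - &&op_push, &&op_add - &&op_push,
        &&op_mul - &&op_push,  &&op_halt - &&op_push
    };

    /* Translate once: each opcode becomes a 32-bit offset. */
    int *thread = malloc(len * sizeof *thread);
    if (!thread)
        return -1; /* sketch: no real error reporting */
    for (size_t i = 0; i < len; i++) {
        uint8_t op = code[i];
        assert(op < OP_COUNT);
        thread[i] = offs[op];
        if (op == OP_PUSH) { /* copy the inline operand through */
            assert(i + 1 < len);
            i++;
            thread[i] = code[i];
        }
    }

    long stack[64];
    long *sp = stack; /* points one past the top element */
    const int *pc = thread;
    long result;

    /* Dispatch: base address plus the stored offset. */
#define NEXT goto *(&&op_push + *pc++)
    NEXT;
op_push: *sp++ = *pc++; NEXT;
op_add:  sp--; sp[-1] += sp[0]; NEXT;
op_mul:  sp--; sp[-1] *= sp[0]; NEXT;
op_halt: result = sp[-1];
#undef NEXT
    free(thread);
    return result;
}
```

On a 64-bit target this halves the threaded stream compared with storing full pointers, at the cost of one addition per dispatch.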
> doesn't have much to do with the "computed gotos" extension
You cannot implement indirect threaded code without a computed goto. You can, of course, implement direct threaded code (generating jump or call instructions directly), but that is a totally different thing.
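For contrast, a sketch of what is commonly used in standard C when the computed-goto extension is unavailable: call threading, where each cell holds a function pointer and a loop calls them in turn. This is a different technique from the ones discussed above, and all names are hypothetical; it only illustrates that, without `goto *`, each dispatch becomes a call and a return instead of a jump:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical opcodes for a toy stack machine. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT, OP_COUNT };

typedef struct vm vm;
typedef void (*handler)(vm *);
typedef union { handler fn; long imm; } cell; /* handler or inline operand */

/* Interpreter state shared by all handlers. */
struct vm {
    const cell *pc;
    long stack[64];
    long *sp; /* points one past the top element */
    int running;
};

static void op_push(vm *m) { *m->sp++ = (m->pc++)->imm; }
static void op_add(vm *m)  { m->sp--; m->sp[-1] += m->sp[0]; }
static void op_mul(vm *m)  { m->sp--; m->sp[-1] *= m->sp[0]; }
static void op_halt(vm *m) { m->running = 0; }

long run_call_threaded(const uint8_t *code, size_t len)
{
    static const handler table[OP_COUNT] = { op_push, op_add, op_mul, op_halt };

    /* Translate once: each opcode becomes a function pointer. */
    cell *thread = malloc(len * sizeof *thread);
    if (!thread)
        return -1; /* sketch: no real error reporting */
    for (size_t i = 0; i < len; i++) {
        assert(code[i] < OP_COUNT);
        thread[i].fn = table[code[i]];
        if (code[i] == OP_PUSH) { /* copy the inline operand through */
            assert(i + 1 < len);
            i++;
            thread[i].imm = code[i];
        }
    }

    vm m = { .pc = thread, .running = 1 };
    m.sp = m.stack;
    while (m.running)
        (m.pc++)->fn(&m); /* one indirect call per instruction */

    long result = m.sp[-1];
    free(thread);
    return result;
}
```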
Why do you want any inlining? It's threaded code. You know the address of each instruction handler, so you can replace each opcode with that address and eliminate the switch and any table lookups altogether.