Basically, you're going to try to emulate other instructions that you don't have with this one instruction, and that's not going to perform very well because now, instead of many optimized instructions, you have strings of this one instruction in its place. And I don't see any way to parallelize this: you're doing the same thing you always were, just with a bunch more code.
The first number is zero, right? Then the first letter of the alphabet is, well, it's hard to show because it doesn't print. It's just an empty set, nothing, a stop bit. We actually do use only 1 bit in digital CPUs, but a weird mix of analogue in broadband transmission. I wonder why CPUs don't use ternary or whatever. But then I also wonder why asynchronous CPUs didn't take off, so don't mind me, I'm just bored.
No, modern GPUs run many of the same instructions as CPUs. They have branches and everything. Their main limitation is that groups of threads are bundled together (called warps) and share a program counter, so lots of branching can result in a lot of wasted work if the threads disagree on which branch to take. That, and there's a huge penalty you pay for moving data across the bus to GPU RAM.
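To put a rough number on the wasted work, here is a toy model in Python. It is only a sketch: `simulate_warp` is a made-up helper, every branch side is assumed to cost one step, and the warp is assumed to run each side that any thread needs while masking off the threads that did not take it.

```python
def simulate_warp(conditions):
    """Return (issued_slots, useful_slots) for one warp running an if/else.

    `conditions` holds one boolean per thread: True means the thread takes
    the `if` side, False means it takes the `else` side.
    Toy model: each side costs exactly one step (an assumption).
    """
    takes_if = sum(1 for c in conditions if c)
    takes_else = len(conditions) - takes_if
    # The whole warp steps through every side that at least one thread needs;
    # threads on the other side sit idle (masked off) during that pass.
    passes = (1 if takes_if else 0) + (1 if takes_else else 0)
    issued = passes * len(conditions)
    useful = takes_if + takes_else
    return issued, useful


# All 32 threads agree: nothing is wasted.
print(simulate_warp([True] * 32))         # (32, 32)
# Threads alternate: both sides run, half of the issued slots do no work.
print(simulate_warp([True, False] * 16))  # (64, 32)
```

Real hardware is more complicated than this, but the shape of the cost is the same: disagreement inside a warp turns one pass into two.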
Few problems are easily parallelizable. That said, that's not even the issue here. Specialized instructions may be emulated by movs, but the speed loss could never be recouped, even by massive parallelization.
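For a feel of what that emulation looks like, here is a sketch of the store-then-load trick usually described for mov-only equality tests, modeled in Python. The name `mov_equal` and the `size` bound are my own assumptions for illustration; a list stands in for memory.

```python
def mov_equal(x, y, size=256):
    """Return 1 if x == y, else 0, using only stores and loads.

    `size` is an assumed upper bound on x and y (hypothetical parameter).
    No comparison or branch is used; the answer falls out of addressing.
    """
    mem = [0] * size  # scratch memory standing in for RAM
    mem[x] = 0        # mov [x], 0
    mem[y] = 1        # mov [y], 1  (overwrites the 0 only if y == x)
    return mem[x]     # mov result, [x]


print(mov_equal(3, 3))  # 1
print(mov_equal(3, 4))  # 0
```

A native compare is one register-to-register instruction. This version needs three memory accesses plus scratch space, and the three steps depend on each other, so extra cores don't help.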
The problem is probably the address space that movs use, instead of specialized registers with optimized pipelining. But internally, many instructions might actually come down to conditional moves. I guess that's either after the microcode is decoded or, if I guessed wrong about that, register-transfer logic still pretty much sounds like it was based on, well, transfers.
You can perform multiplication by repeated addition, but that is a very inefficient way to multiply. It's the same thing here, where you can replace other instructions with MOV, but the replacement is much slower than the original.
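A minimal sketch of the analogy in Python, counting how many additions the slow way takes (`multiply_by_addition` is just an illustrative name, and it assumes `b` is a non-negative integer):

```python
def multiply_by_addition(a, b):
    """Multiply a by a non-negative integer b using repeated addition.

    Returns (product, additions) so the cost is visible.
    """
    product = 0
    additions = 0
    for _ in range(b):
        product += a
        additions += 1
    return product, additions


print(multiply_by_addition(7, 1000))  # (7000, 1000)
print(7 * 1000)                       # 7000, from a single multiply
```

Same answer, but the work grows with the size of `b` instead of staying at one operation.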
What makes you think this would be easier to parallelize than a traditional application? Just because there is only one kind of instruction used doesn't mean they don't still have to come in the right order!
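Here is a small Python sketch of why order still matters. `run_movs` is a hypothetical helper that executes a list of `(dst, src)` moves over named cells; the cell names are made up for the example.

```python
def run_movs(program, memory):
    """Execute (dst, src) moves in order on a copy of `memory`."""
    mem = dict(memory)
    for dst, src in program:
        mem[dst] = mem[src]  # mov dst, src
    return mem


start = {"a": 1, "b": 2, "tmp": 0}

# A swap written as three movs: each one reads what the previous one wrote.
swap = [("tmp", "a"), ("a", "b"), ("b", "tmp")]
# The same three movs with the first two exchanged.
reordered = [("a", "b"), ("tmp", "a"), ("b", "tmp")]

print(run_movs(swap, start))       # {'a': 2, 'b': 1, 'tmp': 1}
print(run_movs(reordered, start))  # {'a': 2, 'b': 2, 'tmp': 2}
```

The instructions are identical in kind, but the second ordering loses the original value of `a`. Those read-after-write dependencies are what stop you from just running the movs side by side.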
I don't see any material online indicating that programs written for one-instruction-set computers are more parallelizable than programs written for traditional computers. In fact, here is someone claiming the opposite:
> The disadvantage of an MISC is that instructions tend to have more sequential dependencies, reducing overall instruction-level parallelism.
If mov is Turing-complete, it seems like there'd be a big win here... You could parallelize this massively.
Edit: can someone please explain why this is being downvoted? It's a legit question.