Why haven't Single Instruction-Set architectures taken off? If mov is Turing Com...

saagarjha · on Jan 24, 2019

They’re impractical and massively inefficient?

_ofdw · on Jan 24, 2019

Okay?

I'm asking because I don't know; can you elaborate on why that is?

Aren't most modern GPUs similar in that they're designed to just shit out triangles as fast as possible, massively parallel?

saagarjha · on Jan 24, 2019

Basically, you're going to try to emulate other instructions that you don't have with this one instruction, and that's not going to perform very well because now, instead of many optimized instructions, you have strings of this one instruction in its place. And I don't see any way to parallelize this: you're doing the same thing you always were, just with a bunch more code.
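To make that concrete, here is a minimal sketch using subleq ("subtract and branch if less than or equal to zero"), a commonly cited one-instruction machine, rather than mov itself. The memory layout, the halt-on-negative-address convention, and the names `run_subleq` and `add_via_subleq` are assumptions made up for this illustration. It shows a single ADD costing three instructions plus a scratch cell.

```python
def run_subleq(mem, pc=0, max_steps=10_000):
    """Interpret subleq: mem[b] -= mem[a]; jump to c if the result is <= 0.

    Halting on a negative program counter is a convention assumed here.
    Returns the number of instructions executed.
    """
    steps = 0
    while pc >= 0 and steps < max_steps:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
        steps += 1
    return steps


def add_via_subleq(x, y):
    """Compute x + y using only subleq; returns (sum, instructions executed)."""
    A, B, Z = 9, 10, 11  # data cells: operand, accumulator, scratch (starts at 0)
    mem = [
        A, Z, 3,   # Z = -x        (falls through or jumps to address 3 either way)
        Z, B, 6,   # B = y - (-x)  (falls through or jumps to address 6 either way)
        Z, Z, -1,  # Z = 0, which is <= 0, so jump to -1 and halt
        x, y, 0,
    ]
    steps = run_subleq(mem)
    return mem[B], steps


if __name__ == "__main__":
    print(add_via_subleq(7, 5))
```

Every one of those three steps also reads and writes memory, and each depends on the result of the one before it, so there is nothing obvious to run side by side.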

md5person · on Jan 24, 2019

> "can you elaborate on why that is?"

Might as well suggest we use a single letter instead of 26.

posterboy · on Jan 26, 2019

     .....   ..  ..
    ..   ..  .. ..
    ..   ..  ....
    ..   ..  .. ..
     .....   ..  ..

The first number is zero, right? Then the first letter of the alphabet is, well, it's hard to show because it doesn't print. It's just an empty set, nothing, a stop bit. We actually do use only 1 bit in digital CPUs, but a weird mix of analogue in broadband transmission. I wonder why CPUs don't use ternary or whatever. But I also wonder why asynchronous CPUs didn't take off, so don't mind me, just being bored.

Sohcahtoa82 · on Jan 24, 2019

.... --- .-- .- -... --- ..- - - .-- ---?

munk-a · on Jan 24, 2019

............... .............. .....

............. ......... ....... ........ ....................

....................... ............... ................. ...........

chongli · on Jan 24, 2019

No, modern GPUs run many of the same instructions as CPUs. They have branches and everything. Their main limitation is that threads are bundled together into groups (called warps) that share a program counter, so lots of branching can result in a lot of wasted work if the threads disagree on which branch to take. That, and there's a huge penalty you pay for moving data across the bus to GPU RAM.
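A rough way to see the cost of threads disagreeing is a toy model, sketched below. It assumes a warp executes each side of a branch in turn with the non-participating lanes idle, and it ignores everything else a real GPU does. The function name `warp_branch_cost` and the cycle counts are hypothetical, chosen only for illustration.

```python
def warp_branch_cost(takes_branch, then_cycles, else_cycles):
    """Toy model of one if/else executed by a warp sharing a program counter.

    takes_branch: one bool per lane (True = lane wants the 'then' side).
    Returns (cycles spent by the warp, fraction of lane-cycles doing useful work).
    """
    any_then = any(takes_branch)
    any_else = not all(takes_branch)
    # The warp must walk every side that at least one lane needs.
    cycles = (then_cycles if any_then else 0) + (else_cycles if any_else else 0)
    # Each lane only does useful work on the side it actually wanted.
    useful = sum(then_cycles if t else else_cycles for t in takes_branch)
    total_slots = cycles * len(takes_branch)
    if total_slots == 0:
        return cycles, 1.0
    return cycles, useful / total_slots


if __name__ == "__main__":
    uniform = [True] * 32
    divergent = [True] * 16 + [False] * 16
    print(warp_branch_cost(uniform, 10, 10))
    print(warp_branch_cost(divergent, 10, 10))
```

In this model a warp whose lanes all agree pays for one side of the branch, while a warp that splits evenly pays for both and leaves half its lanes idle the whole time.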

stefs · on Jan 24, 2019

few problems are easily parallelizable. that said, that's not even the issue here. specialized instructions may be emulated by movs, but the speed loss could never be recouped even by massive parallelization.

posterboy · on Jan 26, 2019

The problem is probably the address space that movs use, instead of specialized registers with optimized pipelining. But internally, many instructions might actually come down to conditional moves. I guess that's either after the microcode is decoded, or if I guessed wrong about that, then Register Transfer Logic still pretty much sounds like it was based on, well, transfers.

CyberDildonics · on Jan 24, 2019

Why would it make sense to use a single instruction?

(Also no, GPUs are not the same at all)

kej · on Jan 24, 2019

You can perform multiplication by repeated addition, but that is a very inefficient way to multiply. It's the same thing here, where you can replace other instructions with MOV, but the replacement is much slower than the original.
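As a small sketch of that analogy (the function name is made up, and it assumes a non-negative integer multiplier), counting the additions makes the gap visible: the work grows with the size of the multiplier, whereas a hardware multiply is a single instruction.

```python
def multiply_by_repeated_addition(a, b):
    """Multiply a by a non-negative integer b using only addition.

    Returns (product, number of additions performed).
    """
    if b < 0:
        raise ValueError("this sketch assumes a non-negative multiplier")
    product = 0
    additions = 0
    for _ in range(b):
        product += a
        additions += 1
    return product, additions


if __name__ == "__main__":
    # One native multiply versus b separate additions.
    print(6 * 7)
    print(multiply_by_repeated_addition(6, 7))
```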

shawnz · on Jan 24, 2019

What makes you think this would be easier to parallelize than a traditional application? Just because there is only one kind of instruction used doesn't mean they don't still have to come in the right order!

stcredzero · on Jan 24, 2019

> What makes you think this would be easier to parallelize than a traditional application?

Exactly such an idea was proposed for parallel signal processing quite a while back, actually. Look up One Instruction Set Computer.

shawnz · on Jan 24, 2019

I don't see any material online indicating that programs written for one-instruction-set computers are more parallelizable than programs written for traditional computers. In fact, here is someone claiming the opposite:

> The disadvantage of an MISC is that instructions tend to have more sequential dependencies, reducing overall instruction-level parallelism.

https://ipfs.io/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1m...

stcredzero · on Jan 24, 2019

Also note that the idea didn't take off.

stcredzero · on Jan 24, 2019

> If mov is Turing Complete it seems like there'd be a big win here... You could parallelize this massively.

Actually, One Instruction Set Computers were proposed for this very application! (Also mentioned elsewhere, but it is applicable in a serious way here.)

https://en.wikipedia.org/wiki/One_instruction_set_computer