I hadn't realised people would be so interested! I'll try and answer some of the questions that have been asked.
I just use a single type of MOSFET, the 2N7000.
I did look at BJTs (in fact the whole thing started because I wanted to teach myself about "proper" transistors) but working out what resistor values to use to cope with varying fanouts/fanins etc. was too scary for me. One of the fundamental drivers when making design choices was that I really, really want it to work when it's built; it'll be heartbreaking if/when it doesn't, so I try to favour robustness.
The single MOSFET is also, I'm currently guessing, the cause of the slow speed. If you look at how the professionals (chip makers) do gates, they tend to have two versions of the logic, one driving the positive case and the other driving the negative, using both P and N MOSFETs. So the output is actively driven either way. My circuits only actively drive in one direction and rely on a pullup resistor (the 10ks) to generate a 1 output. That halves the number of transistors I need, but cripples the speed. I need to do some experiments to prove/disprove that.
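A rough feel for why the pullup-only output is slow (the capacitance figure here is an illustrative guess, not a measured value from the project): the 0->1 transition has to charge the node's stray capacitance through the 10k resistor, giving an RC rise time.

```python
# Why a resistor pull-up is slow: a 1->0 transition is driven hard by the
# MOSFET, but a 0->1 transition must charge the node capacitance through
# the 10k pull-up resistor.
import math

R_PULLUP = 10e3   # ohms, the 10k pull-ups mentioned above
C_NODE = 100e-12  # farads, a guessed stray/cable capacitance per node

def rise_time_to_fraction(r, c, fraction=0.9):
    """Time for an RC charge to reach `fraction` of the supply voltage."""
    return -r * c * math.log(1.0 - fraction)

t_rise = rise_time_to_fraction(R_PULLUP, C_NODE)
print(f"0->1 rise to 90%: {t_rise * 1e6:.2f} us")  # ~2.30 us for these guesses
```

With those guessed numbers the rise time lands in the microsecond range, which is at least consistent with the 1us gate propagation time mentioned elsewhere in the thread.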
I am planning on putting the design files on the website.
And I'm in Cambridge UK as opposed to Cambridge MA.
This would be perfect for the computer history museum [1].
For a long time (perhaps they still do?) they had a working version of the Babbage engine, a six-ton mechanical calculator designed circa the 1850s to evaluate polynomials [2]. I can't overstate how cool it is; if you're in the neighborhood, they used to do daily demonstrations. Here's one such video [3]
A small hand-built computer on breadboards you can walk into would seem to be right up their alley.
I have a unicycle! But never mastered it. The furthest I've managed was a metre or so, and most of that was falling. But if I put the processor on casters it might support me as I pushed it along on the unicycle. Got to be worth a try...
I did do a lot of simulation of the logic. I built models of the hardware in software (C++).
But I didn't do any analog simulation, which was a mistake. I was aware of SPICE and had a look around but the tools I found seemed expensive. It was only after I'd started that someone pointed me at some free versions, LTSPICE in particular.
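For anyone curious what a logic-level model looks like, here is a minimal sketch of the idea in Python (the author's actual models were C++, and the gates and circuit below are illustrative, not taken from the real design):

```python
# Gate-level simulation: model each gate as a function of 0/1 inputs,
# compose them the way the boards wire them together, then exhaustively
# check the composite circuit against its truth table.

def nand(a, b):
    return 0 if (a and b) else 1

def xor_from_nands(a, b):
    # XOR built from four NAND gates, as it might be on a board
    n1 = nand(a, b)
    return nand(nand(a, n1), nand(b, n1))

def half_adder(a, b):
    """Returns (sum, carry) built entirely from NAND gates."""
    return xor_from_nands(a, b), 1 - nand(a, b)

# Exhaustively verify the model before committing to solder
for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        assert (s, c) == (a ^ b, a & b)
print("half adder model matches truth table")
```

The appeal of this kind of model is that exhaustive checking is cheap in software and heartbreakingly expensive once the transistors are soldered down.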
Why is analog simulation needed? Is it mostly for timing issues? Did you have to use any passive components in your design (like resistors or capacitors)?
A couple of reasons. Firstly, its dynamic behaviour: if I'd known about the 1us gate propagation time earlier I might have tweaked the componentry a bit. The second reason is that it might have picked up some of my mistakes, such as transposing the outputs or missing off some pullup resistors.
The cost is a rather embarrassing £20k.
The biggest cost is the circuit boards, about £7-8k now I think.
The simple hardware such as the aluminium, nuts and bolts etc was a few thousand.
Cables and connectors were thousands. I've had to get 300m of ribbon cable so far and it's not going to be enough! There's about a thousand connections.
The proper electronic components (transistors/resistors/LEDs) were not as dominant as you might expect.
There seemed to be just an endless amount of stuff that was needed.
I think relays are a lot slower in general. They are more binary as well, being either ON or OFF, whereas a transistor is actually amplifying a signal based on its input (so it's possible to have a transistor output a range of voltages; here the transistor pulls the output low when on, and the pullup resistor drags it high when off). I'm not sure how that affects things in total, but I think using transistors lets you vary the voltage needed to drive the system without switching out all the transistors.
I never really thought of using relays, just not into them. There's a chap called Harry Porter, mentioned below, who has built a processor out of relays which looks very impressive.
My starting point was that I was trying to learn about transistors and was looking for ideas of something to build.
Not at the moment, I'll try and remember to have a look at that next time I test a module. Currently everything is shuffled into the corners so I can start the construction of the decoder frame.
Fantastic work, this is an amazing project. I think myself and every other CS / electrical eng student has dreamed of this at one time or another. It's a gigantic undertaking though and it's exciting to see it done.
The implications for teaching are great too. Having this physical reference would really cut the learning curve in computer architecture.
I know in our Computer Architecture class we designed a MIPS-like processor. I don't remember what software we used, but we ran it in simulation and imaged it on to an FPGA board at the end of class. We learned how to build all the components from smaller (maybe slightly higher than transistor level logic) parts. It was a great learning experience.
Then we built a micro controller mimicking some simplified early ARM designs in year 2 (ARM was very widely used at the school, I guess the fact that Steve Furber works there played a role). Starting with ALU, decoding, etc. Was pretty awesome :)
In year 3, we built a simple VGA chip and uploaded it to an FPGA with a monitor connected to it. It could only draw rectangles, lines and circles but seeing it actually working was totally amazing. Definitely the best project while at the school.
We were using Verilog and Xilinx toolchain, and that happened at the University of Manchester.
Yeah, it's funny how alumni play a role in the tech that gets used. I know that there was a lot of Xilinx in use on campus. I didn't do anything with FPGAs outside of the one class, so I wondered how much of that was based on technical merit, etc. vs. the fact that one of the Xilinx co-founders was a Rose-Hulman alum.
That class and Xilinx still haunt me. My team failed that project. Two of my three teammates couldn't grok Xilinx and gave up. That class catalyzed my decision to transfer out of Rose-Hulman.
>> Having this physical reference would really cut the learning curve in computer architecture.
I don't know, I think having a language that supports mixing high and low levels of description to describe a processor, plus the tools to simulate and play with it, could offer much faster learning, since you could focus on the concepts.
Verilog and VHDL are substantial learning curves in themselves. We could really do with a newbie-friendly alternative; I don't know whether http://www.myhdl.org/ might be it.
Verilog and VHDL already simplify hardware design a lot. Their steep learning curve comes from the nature of hardware design itself.
Most people gain a lot of confidence in software development and try to design hardware the way they would program a system. And then they complain that Verilog and VHDL are too complex.
Verilog in particular leads you into that trap, though. Because you can write conventional sequential-execution programs in it, and usually have to when writing testbenches. The nomenclature of "process" and "task" imply they behave like software - and they do, in the simulator. Then there are the hoops you sometimes have to jump through in order to get the synthesis to behave as you want: "reg" is not always a D-type flip flop, and is mandatory in some places where it doesn't synthesise to one.
The challenge of Verilog & VHDL is not learning the language, it's learning the paradigm of hardware & HDL. Everything is parallel, nothing is sequential unless otherwise specified.
In themselves they are very simple, basic languages.
To be fair, it's Hardware Description Language, not Hardware Synthesis Language. HDLs are mostly tools to support V&V, it's just that synthesis is the most convenient way to ensure that an implementation is consistent with the HDL description.
Yeah, I believe originally all the HDLs were intended for simulation of designs which were then implemented by hand. Neither Verilog nor VHDL was designed for synthesis, which is probably why it's so quirky.
Sort of kicking it old school. Timing will be fun: there is a notion of 'lumped' versus non-lumped circuits (and this is especially true of logic) where you want the entire circuit to fit within one wavelength of the fastest clock (where that is determined by the speed of light), otherwise you end up with really hard-to-debug timing skew problems. At 9 meters your CPU will be challenged to run at > 33 kHz, but you will be able to do better than that with good layout and floor planning.
Quite the project, I look forward to the final report.
Agreed, amazing project. One of his notes further in is an early performance test on the ALU. He mentions that the fact he's implemented a simple ripple adder design seems to indicate a maximum clock speed of no more than 25kHz.
With extremely long unshielded parallel cables you're going to get tons of crosstalk and other emissions that will cause bit flips / errors.
In addition, at 33 MHz the distance a signal travels (using the speed of light instead of calculating velocity factors) would be:

speed of light / 33 MHz ≈ 9.08 meters

per cycle, which doesn't take into account the capacitive load of the wire and registers at the other end (which will act as a big filter). You need to meet some sort of timing constraints in order to account for clock sync issues etc...
Modern CPUs can run faster since the capacitive load of a transistor is extremely small (since they're nanometers in size), plus the loop sizes of the interconnects are extremely small, lowering EMI emissions.
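The arithmetic above can be sanity-checked directly (idealised speed of light, ignoring velocity factor and capacitive loading, as the comment notes):

```python
# How far does a signal travel in one clock cycle, at lightspeed?
C = 299_792_458  # m/s, speed of light in vacuum

def metres_per_cycle(freq_hz):
    return C / freq_hz

print(f"{metres_per_cycle(33e6):.2f} m per cycle at 33 MHz")    # ~9.08 m
print(f"{metres_per_cycle(330e3):.0f} m per cycle at 330 kHz")  # ~908 m
```

At the corrected 330 kHz estimate mentioned in the reply below, the 9-metre machine sits comfortably inside one wavelength; at 33 MHz it does not.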
I didn't count the zeros accurately as I was typing on the train; the actual number I had estimated was about 330kHz. As others have mentioned there are several factors at play here, but three are particularly troublesome: the speed of propagation of a signal through a copper trace, the network response of long traces to a step function such as a digital signal transition, and the loss of signal integrity due to radiation from an unshielded trace over long distances.
The first one is pretty easy to understand: the output of a gate changes state, and some number of nanoseconds later that output has propagated to the end of the trace. If you're going to do something like "load register" from a bus, when you latch the register flip flops you need to make sure that the input signals are stable beforehand (called the 'setup' time) and that after the clock fires to latch the signals they stay stable long enough to be reliably transferred into the flip flops (the 'hold' time). These values vary for different logic families, and within families vary by temperature and voltage, so a typical timing analysis will include all four 'corner' cases of (high/low voltage, high/low temperature). The whole system can only go as fast as its slowest "setup + hold" time period.

And for more complex logic that can "stack". In the case of the ripple adder, each adder has a setup and hold time before its output is "accurate", and then it feeds into the next bit which has a setup and hold time as well, so your total time in that adder is going to be the sum of all those setup and hold times. And before you start the add process you need to have the addends loaded into their registers (another setup and hold), and when the add completes the result has to end up in the destination register (another setup and hold).

Now a typical micro-architecture will have each one of those on an internal clock, so the input clock is divided by "n" (where n is 2, or 4, or 8) and on each phase the next thing happens: phase 1 things latch into the operand registers, phase 2 they are transferred to the inputs of the ALU, phase 3 they are latched into the destination register.
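The ripple-adder stacking described above can be put into a toy formula (all delay numbers below are made up for illustration, not taken from the project or any datasheet):

```python
# Worst-case ripple-carry timing: the carry must propagate through every
# bit in sequence, so the critical path grows linearly with word width,
# and the clock period must cover it plus register overhead.

def max_clock_hz(bits, carry_delay_s, setup_hold_s):
    critical_path = bits * carry_delay_s + setup_hold_s
    return 1.0 / critical_path

# e.g. 16 bits at a guessed 2 us per carry stage, 5 us register overhead
f = max_clock_hz(16, 2e-6, 5e-6)
print(f"max clock ~ {f / 1e3:.1f} kHz")  # ~27.0 kHz
```

With plausible per-gate delays in the microsecond range, the result lands near the ~25kHz figure the builder estimated for his ALU, which is the point: the adder, not the wiring, sets the first speed limit.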
The second one takes into account that the logic is actually analog (there is really no such thing as "digital" logic), so when a gate changes state, what it really does is start driving (or sinking) a current into a trace to push (or pull) that trace towards a ground state or a voltage potential state. Most logic will have a "range" of what it considers one or the other. The waveform on an oscilloscope will show a ramp based on the LC "network" the gate driver sees on that trace. If it is unterminated (which means the trace is not resistance controlled) the signal may rise rapidly or slowly, it may "ring" a bit or not at all, and then settle down to the final value. On signal lines that are terminated there are a pair of resistors which cause the entire trace to appear as an RLC 'tank' tuned to the nominal operating frequency of the clock, making for nice predictable corners. But the cost of that predictability is that the signals take time to change state, so propagation time is slower still.
Finally (and this is the one where you get to put your AM radio next to a computer and listen to the "music" it is creating), each of those unshielded traces is essentially a tiny antenna. And the longer the trace, the more likely it is to radiate rather than propagate the signal. There are particularly challenging issues around even fractions of a wavelength. You can ignore transmission line effects for short distances, you can shield longer traces (put a ground trace on either side) to mitigate losses, and you can reduce the operating frequency.
There are a couple of discussions about this in Horowitz & Hill's "The Art of Electronics", the DEC book "A guide to DEC hardware", and in a number of digital design textbooks (I don't know if current ones talk about it but the ones from the 80s did).
It is one of the more interesting aspects of large systems digital design.
Thanks for the nice explanation. It makes one appreciate the complexity of designing a processor with 8B transistors which must operate error-free at 1GHz.
Compared to that, designing a human brain to operate at 100Hz and not requiring much precision, is a piece of cake!
Error-corrected, matched-pair CAN bus maxes out at 500 kHz; this design has nothing like matched termination and has varying cable lengths, so it is obviously going to be much worse. If you're curious about how much worse, start here:
https://en.wikipedia.org/wiki/Reflections_of_signals_on_cond...
I think it was just lowering the number by a few orders of magnitude to compensate for those effects and logic propagation delay (e.g. I think he said he's using a ripple adder, which has a critical path dependent on the bit length), not a precise calculation. My gut instinct says it's probably the right order of magnitude though.
You could probably run the long stretches at a much higher clock rate or make the design run on asynchronous clock domains if you use good serialize/deserialize units.
Not nearly as large or ambitious, but if you're into the "solder hundreds of parts together to get something you could buy for a dollar" aspect, the "Transistor Clock" is a pretty cool kit. I'm a lot better at soldering now, too.
It looks like they built the working circuit first, and then just added a load of diodes that don't do anything. I would imagine you could leave it running while adding components so if you made a mistake you immediately know, and can just undo it.
From the page: Every single part that composes the clock has its purpose. If you would decide to take out a single part of the circuit the clock won't operate properly anymore.
You can visit it in the museum. Forgot the name, has a plane on the roof.
He gave a talk when I was there, and afterwards we were allowed to add some numbers on the computer. You can see the memory banks blinking and hear the relays clicking.
An interesting fact is that it is a 32 bit floating point machine. Though when I was there he had some issues with synchronising the clocks, so it only did integers then.
A word of warning: you could easily spend days exploring that museum. Fortunately, the computers are close to the entrance, so you probably won't make the mistake of discovering them at the last minute.
One of my college profs worked with Zuse. In class I said, "Wait, so you worked for the..." and then stopped realizing what I was about to say. He was Hungarian-American, and at the time very old. What I was going to say wasn't fair.
On the other hand, the class he was teaching was my assembly language class. And he knew the material inside and out.
> While Zuse never became a member of the Nazi Party, he is not known to have expressed any doubts or qualms about working for the Nazi war effort. Much later, he suggested that in modern times, the best scientists and engineers usually have to choose between either doing their work for more or less questionable business and military interests in a Faustian bargain, or not pursuing their line of work at all.[28]
I'm not sure I understand what you were going to say. I was assuming "Wait, so you worked for the inventor of the computer?", but then you went on about race and I didn't get it anymore.
That's not really a question. He absolutely did work for the Nazis. The OP's point was that there wouldn't be much value in simply pointing that out (it's not like it would be news to Prof. Zuse).
Yes. And realizing mid-sentence that he would have been a very young Hungarian at the time, and that it would have been a complicated question, and probably not fair to him.
TTL logic? I see 10K and 470 ohm resistors in the pix (or the color codes are messed up by the camera), and 5V supplies, and not a diode in sight, and the layout doesn't look RTL to me. This makes the project interesting because "most" retro builders do DTL. So that's cool. Takes a thundering lot more transistors (admittedly no diodes) and the noise budget and fanout aren't as easy, but it'll work... I suppose from a parts-minimization standpoint you could use the inherent diode in a bipolar transistor to build DTL using only transistors.
AFAIK (and I've studied this for a while) no one has built anything substantial at the transistor level using a CMOS architecture. Virtually everyone does DTL, there's plenty of relay-based work, this guy is the only discrete TTL family I'm aware of, and I've seen a little RTL out there. That would be interesting.
I always thought the totem-pole output stage of a TTL gate would be "too hard" compared to the other logic families, so I've got to hand it to this guy, impressive.
A better "vital statistics" comparison would be the DTL-logic straight-8, the original DEC PDP-8, which would vaguely fit on a desk and used about 1500 transistors and 10K or so diodes. It looks like this:
In my infinite spare time I'm going to build a straight-8 using all SMD components (so my cards will be half business card sized rather than half a sheet of paper sized). I'm sure I'll get to that right about 2080 or so at this rate. The advantage of cloning an existing architecture is vast piles of software ready to use.
The disadvantage of using modern high beta, high Ft transistors instead of 60s era transistors is I'm likely to build some great VHF/UHF oscillators instead of logic gates. OP seems to have gotten past this problem, or hasn't run into it yet.
WRT the moving parts and "make it like a book" comments: the last thing you want with 50K or so wire jumpers is movement. Even if every bend only breaks 0.01% of wires per "fold", that's going to multiply up into pain if you swing 10K wires open and closed 1K times. Ribbon cable and standard connectors could help.
I wonder how suitable commercially available discrete MOSFETs are for building logic? They're all made for power applications. Of course, it's not like discrete BJTs are "suitable" either.
Reminds me a bit of the BMOW (Big Mess O' Wires) (http://www.bigmessowires.com/2009/02/02/wire-wrap-photos/), which was/is a functional desktop computer made of hand-wrapped wire on a 12x7 wire-wrap board. Of course the Mega-processor is on a much larger scale, but in terms of unique hand-built computers I think they're comparable.
For those who want to learn how CPUs and computers work, Nand2Tetris is a great book that takes you from logic gates (NAND) to making a CPU that plays Tetris.
I just finished the course on Coursera: https://www.coursera.org/course/nand2tetris1. Either the course or the book is a very good way to start learning for those who are interested in building something like this. I highly recommend it.
I want this guy to get some help and have some team build a JVM on his machine.
So I could play Minecraft at about 1 FPH (frame per hour) and load that simulation of a 70s-era CPU made from redstone. Google "minecraft redstone processor" and watch the first YouTube video.
How interesting! Just look at the estimated size of the complete megaprocessor. It wouldn't fit in any room at a regular home!
It's a really cool exercise, however, I imagine that James must spend about 99% of the time soldering, so it's more like an artisanal process than an engineering one. An amazing feat nonetheless.
Not in a straight line, but if you lined the walls of the room it should be able to fit in at least one room of a house. A 15x10ft room would probably have enough wall space.
I don't understand why separate frames cannot be connected into a "book". It would take less space and as far as I can tell, each frame is quite thin anyway?
He stated it is for educational purposes, so you will not be looking at all frames at once anyway. It would definitely be convenient to have easy access to all of them without manipulating the frames, but it seems an unnecessary requirement.
Wow! In almost the same vein: http://www.homebrewcpu.com. It doesn't use discrete transistors, but starts with 7400-series TTL chips. 'Slightly' easier :)
edit: and this one comes with a complete s/w "stack" as well, including an ANSI C compiler, a multi-user port of Minix-2, a TCP/IP stack...
I'm fascinated by projects like this. I'd never have the time or skills to pull it off myself, sadly. No one has yet mentioned the similar relay based computer by Harry Porter, which I'm sure has been mentioned on HN before, but not yet in this discussion.
When I saw the post, I immediately remembered this!
I saw it the first time when I was about 14 y/o, it inspired me to learn a bit about logic and EE, such as how can we build an AND block of relays, and how these blocks are connected together. Eh, good memories, thanks for sharing :)
Man, I was scrolling and scrolling, hoping this was from a few years ago and the whole thing was finished. Ah well, I will surely be glad to keep up with this project, because it is truly amazing. Simply waiting to have my mind blown by this is blowing my mind.
I had a boss once who recycled a bunch of arrays from an old phone switch and built a fully working CPU system with 12-bit word lengths and 1024 words of storage. Took days to run some things, but such things are a true joy to have on in the background during coding sessions.
I love this. Understanding a CPU gets more complex with each cache layer and each instruction pipeline, so building a model of the “pure, unadulterated principle” does make sense.
I have to confess that power efficiency wasn't/isn't top of my list. As it happens the parts you suggest don't feature on the Farnell website, which is where I did my selection. I use the L-1334SRT. I can't now remember quite how I chose it, but I'd have been looking for some combination of brightness and cheapness and availability. I would be budgeting 10mA per LED as that's the kind of number I was used to; if it didn't want more than that, then that was all I asked on the power side of things.
There are a couple limitations, depending on if you mean "take a CPU and scale the components up" or if you mean "add more and more to a CPU until it becomes larger".
The main limitations are heat, frequency, reliability, and thermal stresses.
Heat is mainly an issue if you're adding more and more to a CPU. With just scaling a CPU up, the increasing size of everything counteracts the longer distances. But if you are trying to add more to a CPU, the amount of power dissipation will continue to increase until eventually you can't get the heat out from the center of the CPU.
Ditto, the larger the CPU is the more you will have problems with thermal stresses. Once you get into macroscopic stuff, that is. At current CPU sizes you actually seem to end up with more problems with thermal stresses the smaller you get, due to wires scaling "funny" (edge effects becoming more dominant the smaller you get).
Ditto: if you're trying to add more and more to a CPU, eventually you'll hit a point where you can't add more because you'll spend more time replacing bad components than actually running. Some of the early vacuum tube computers actually hit this limitation, and it is only through heroic "throw more money at it" solutions that modern CPUs aren't limited by this. I mean: you have a billion transistors on a chip. And get decent yields.
Frequency is the big limitation. CPUs propagate signals at ~0.1c already - which means if you scale them more than 10x the size (or ~2x for PCB-style traces) you cannot maintain the same frequency. Remember: your frequency dictates how long the longest wire through the CPU that gets updated in a single clock cycle is. If your CPU is larger than c/f in diameter, you cannot send a signal to the other side of the CPU and back in one clock cycle - at which point it may as well be multiple CPUs.
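The c/f bound can be sketched numerically (the ~0.1c signal speed is the comment's estimate, not a measured figure):

```python
# Largest CPU diameter such that a signal can cross it and return
# within one clock cycle, given a propagation speed and frequency.
C = 299_792_458  # m/s, speed of light in vacuum

def max_diameter_m(freq_hz, signal_speed=C, round_trip=True):
    d = signal_speed / freq_hz
    return d / 2 if round_trip else d

# At 3 GHz, with on-chip signals at ~0.1c as estimated above:
print(f"{max_diameter_m(3e9, signal_speed=0.1 * C) * 1000:.1f} mm")  # ~5.0 mm
```

That single-digit-millimetre figure is roughly the size of a real CPU die, which is why scaling a chip up more than ~10x at the same frequency stops being one CPU.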
The bigger you make it the slower it'll end up working, basically. The time for the signal to propagate through all those discrete transistors will be quite big and that puts an upper bound to your clock frequency.
Pretty awesome. However, I think that this is one spot where VR will become fantastically useful - suddenly we will be capable of becoming a few nanometers tall, and have the ability to visualize electricity flowing through ICs.
Thank you for saving me the job of linking to TNMoC, they have identical replicas, and occasionally originals, of exactly this kind of project i.e. build a macro-processor. Though most of them predate even transistors.
They have one that stores info in a ten-state neon tube which looks as cool as you might hope [see edit].
Turing didn't even have that; they used acoustic delay lines, which could "store" a state for as long as it took a pressure wave to propagate through a long tube.
To "write", you do (or do not) send a pulse in the appropriate millionth-of-a-second time slice. To "read" some time later (of the order of a thousandth of a second), you listen at the other end in the appropriate millionth-of-a-second time slice for the slightly-degraded pulse. To store for longer than a thousandth of a second, you re-send any detected pulses, and remember which pulses are which.
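The storage capacity of such a line follows directly from that timing (figures below are illustrative, not the actual ACE or EDSAC parameters):

```python
# Bits stored in an acoustic delay line = transit time x pulse rate:
# every pulse "in flight" in the tube is one stored bit.
SPEED_IN_MERCURY = 1450.0  # m/s, approximate speed of sound in mercury

def bits_stored(tube_length_m, pulse_rate_hz, speed=SPEED_IN_MERCURY):
    transit_time = tube_length_m / speed  # seconds for a pulse to cross
    return int(transit_time * pulse_rate_hz)

# A 1.5 m tube clocked at 1 MHz (the "millionth-of-a-second time slice"):
print(bits_stored(1.5, 1e6), "bits in flight")  # 1034 bits
```

The slower the medium, the more bits fit in a given tube, which is exactly why a slow-propagating, cheap liquid was attractive.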
The tube can be filled with anything cheap enough that propagates signals as slowly as possible without degrading them too much. Fun fact: when building the ACE, Turing "rather hankered after using gin, which would come cheaper than mercury". Mercury was eventually chosen, hence they're often known as mercury delay lines.
[edit]
The machine is called the Harwell Dekatron or WITCH. And it looks and sounds as impressive as its name. See a program being run on it at the link below. Demo starts at 13:00.
The spinning lights are the transistor equivalents. Each one can be read to determine which of its ten possible positions the light is in, and advanced one step.
Thank you so much for this, I've wanted to see something like this for a long time. I was hoping Steve Wozniak would reveal something like this in an unreleased book on how he made the Mac.
Actually, it's much easier. Once you narrow down the buggy behavior sufficiently in software, you can literally go to the hardware with a voltmeter and measure exactly what is vs what should be, identify the offending part directly, and replace it using little more than fingers and a soldering iron. That vs "narrowing" it to a 100+ pin component that does a significant fraction of everything, with replacement being little more than throwing the whole unit away (you did want the newer version anyway, right? good excuse to get it now).
Back in the 1970s, I worked on the original (or nearly so?) TOW missile system at Emerson Electric in St. Louis. The electronics were composed of a number of encapsulated blocks of (what could be) crumbly black material. These cubes could be thought of as integrated circuits but were actually composed of discrete transistors, resistors, capacitors, etc. inside.
Not as ambitious but I often dealt with building computer systems, including the processor, from the TTL chip level back then. They were still sophisticated, along the lines of what would be called RISC architecture.
You really, really knew how computers and their processors worked back then and God how I miss it.