Microprocessor Design/Wire Wrap

Historically, most of the early CPUs were built by attaching integrated circuits (ICs) to circuit boards and wiring them up.

Nowadays, it's much faster to design and implement a new CPU in an FPGA, and the result will probably run faster and use less power than anything spread out over multiple ICs.

However, some people still design and build CPUs the old-fashioned way. Such a CPU is sometimes called a "home brew CPU" or a "home built CPU".

Some people feel that physically constructing a CPU in this way helps students "touch the magic"^[1]: because it lets them probe the inner workings of the CPU, it helps them learn and understand the underlying electronics and hardware.

Overview

An example of a homebrew PC

A homebrew CPU is a central processing unit constructed using a number of simple integrated circuits, usually from the 7400 Series. When planning such a CPU, the designer must not only consider the hardware of the device but also the instructions the CPU will have, how they will operate, the bit patterns for each one, and their mnemonics. Before the existence of computer-based circuit simulation, many commercial processors from manufacturers such as Motorola were first constructed and tested using discrete logic. Those commercial processors include the Motorola 6800,^[2] the Motorola 6809,^[3] and the Hewlett-Packard PA-RISC TS1.^[4]
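To make the instruction-set side of that planning concrete, here is a minimal Python sketch of an assembler for a hypothetical 8-bit instruction format: a 3-bit opcode in bits 7..5 and a 5-bit operand in bits 4..0. The mnemonics and their bit patterns are invented for illustration and do not come from any of the CPUs mentioned here.

```python
# Hypothetical instruction set: every mnemonic and bit pattern here is an
# illustrative assumption, not taken from any real CPU.
OPCODES = {
    "LDA": 0b000,  # load accumulator from memory[operand]
    "STA": 0b001,  # store accumulator to memory[operand]
    "ADD": 0b010,  # accumulator += memory[operand]
    "SUB": 0b011,  # accumulator -= memory[operand]
    "JMP": 0b100,  # PC = operand
    "JZ":  0b101,  # PC = operand if accumulator == 0
    "OUT": 0b110,  # copy accumulator to an output port
    "HLT": 0b111,  # stop the clock
}


def assemble(line: str) -> int:
    """Translate one line such as 'ADD 11' into an 8-bit word.

    The opcode occupies bits 7..5 and the operand bits 4..0,
    so operands must fit in 5 bits (0..31).
    """
    parts = line.split()
    mnemonic = parts[0].upper()
    operand = int(parts[1]) if len(parts) > 1 else 0
    if mnemonic not in OPCODES:
        raise ValueError(f"unknown mnemonic {mnemonic!r}")
    if not 0 <= operand < 32:
        raise ValueError("operand must fit in 5 bits")
    return (OPCODES[mnemonic] << 5) | operand


def disassemble(word: int) -> str:
    """Turn an 8-bit word back into 'MNEMONIC operand'."""
    names = {code: name for name, code in OPCODES.items()}
    return f"{names[word >> 5]} {word & 0b11111}"


for line in ["LDA 10", "ADD 11", "STA 12", "HLT"]:
    word = assemble(line)
    print(f"{word:08b}  {disassemble(word)}")
```

Writing the table down this early forces the bit-pattern decisions that the control logic will later have to decode.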

Although no limit exists on data bus sizes when constructing such a CPU, the number of components required to complete a design grows as the bus gets wider: registers, bus buffers, and ALU slices all scale roughly in proportion to the width. Common physical data bus sizes are 1, 4, 8, and 16 bits. Incomplete design documents exist for a 40-bit CPU.^[5]

A microcoded CPU may be able to present a significantly different instruction set to the application programmer than seems to be directly supported by the hardware used to implement it. For example, the 68000 presented a 32-bit instruction set to the application programmer (a 32-bit "add" was a single instruction) even though internally it was implemented with 16-bit ALUs. Similarly, the Zilog Z80, one of the most commonly used CPU families of all time,^[6] presented an 8-bit instruction set to the application programmer even though internally it was implemented with a single 4-bit ALU.^[7]
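The idea of presenting a wider word than the ALU handles can be sketched in Python: a 16-bit add performed as four passes through a 4-bit adder, least-significant nibble first, with the carry held between passes. The function names are hypothetical, and this is not a model of the Z80's or 68000's actual internals.

```python
def alu4_add(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One pass through a hypothetical 4-bit adder: returns (sum, carry_out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def add16_with_4bit_alu(a: int, b: int) -> tuple[int, int]:
    """Add two 16-bit words using only the 4-bit adder above.

    Each loop iteration stands for one ALU pass (one clock cycle); the
    carry variable plays the role of a carry flip-flop between passes.
    """
    result = 0
    carry = 0
    for nibble in range(4):
        shift = 4 * nibble
        partial, carry = alu4_add(a >> shift, b >> shift, carry)
        result |= partial << shift
    return result, carry


print(hex(add16_with_4bit_alu(0x12F8, 0x0F0F)[0]))
```

The price of the narrow ALU is four cycles per add instead of one, plus the sequencing logic that steps through the nibbles.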

Likewise, serial computers do their calculations one bit per clock cycle, yet present an instruction set that deals with much wider words: often 12 bits (PDP-8/S, PDP-14), 24 bits (D-17B), or even wider, such as 39 bits (Elliott 803).
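A bit-serial adder needs only one full adder and one carry flip-flop, no matter how wide the word is. The following Python sketch (the function name and default width are illustrative assumptions) adds two words one bit per simulated clock cycle, least significant bit first:

```python
def serial_add(a: int, b: int, width: int = 12) -> int:
    """Add two `width`-bit words one bit per simulated clock cycle.

    Only a single full adder and one carry flip-flop are modeled; the
    operand bits are presented to it LSB first, as in a serial machine.
    """
    carry = 0                                   # the carry flip-flop
    result = 0
    for cycle in range(width):
        bit_a = (a >> cycle) & 1
        bit_b = (b >> cycle) & 1
        sum_bit = bit_a ^ bit_b ^ carry                       # full-adder sum
        carry = (bit_a & bit_b) | (carry & (bit_a ^ bit_b))   # full-adder carry
        result |= sum_bit << cycle              # shift the sum bit into the result
    return result                               # carry out of the top bit is dropped


print(oct(serial_add(0o1234, 0o0765)))
```

With `width=39` the same single full adder handles 39-bit words; only the number of clock cycles grows.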

Notable Homebrew CPUs

The Magic-1 is a CPU with an 8-bit data bus and 16-bit address bus running at about 4.09 MHz.

The Mark I FORTH also has an 8-bit data bus and 16-bit address bus, but runs at 1 MHz.

The V1648CPU is a CPU with a 16-bit data bus and 48-bit address bus that is currently being designed.

APOLLO181 is a homemade didactic 4-bit processor made of TTL logic and bipolar memories, based on the Bugbook® I and II chips, in particular the 74181 (by Gianluca.G, Italy, May 2012).

Parts

...

chips

bus

Practically all CPU designs include several 3-state buses -- an "address bus", a "data bus", and various internal buses.

A 3-state bus is functionally the same as a multiplexer. However, there is no physical part you can point to and say "that is the multiplexer" in a 3-state bus; it's a pattern of activity shared among many parts. The only reason to use a 3-state bus is that it can require fewer chips, or fewer and shorter wires, than an equivalent multiplexer arrangement. When you want to select between very few pieces of data that are close together, and most of that data is stored on a chip that only has 2-state outputs, it may require fewer chips and less wiring to use actual multiplexer chips. When you want to select between many pieces of data (one of many registers, or one of many memory chips, etc.), or many of the chips holding that data already have 3-state outputs, it usually requires fewer chips to use a 3-state bus (even counting the "extra" 3-state buffer between the bus and each thing that doesn't already have 3-state outputs).
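The equivalence between a 3-state bus and a multiplexer can be illustrated with a small Python model. Each attached driver is represented as an (output-enabled, value) pair; the function names and the choice to raise an error on contention are assumptions of this sketch, not properties of any particular chip.

```python
from typing import Optional, Sequence


def resolve_bus(drivers: Sequence[tuple[bool, int]]) -> Optional[int]:
    """Return the value on a 3-state bus.

    Disabled drivers are in the high-impedance state and contribute nothing.
    """
    active = [value for enabled, value in drivers if enabled]
    if not active:
        return None                    # nobody driving: the bus floats
    if len(active) > 1:
        raise RuntimeError("bus contention: more than one driver enabled")
    return active[0]


def mux(inputs: Sequence[int], select: int) -> int:
    """The equivalent multiplexer: pick one input by number."""
    return inputs[select]


registers = [0x1234, 0xBEEF, 0x00FF, 0xCAFE]
select = 2
# A decoder enables exactly one driver, so the bus acts like a multiplexer.
drivers = [(i == select, value) for i, value in enumerate(registers)]
assert resolve_bus(drivers) == mux(registers, select)
```

In hardware the "select" is spread across one enable line per driver, which is why the decoder (or demultiplexer) that drives those enables must never assert two at once.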

A typical register file connected to a 3-state 16-bit bus on a TTL CPU includes:

octal 2-state output registers (such as 74x273), 2 chips per 16-bit register
octal 3-state non-inverting buffers (such as 74x241), 2 chips per 16-bit register per bus
a demultiplexer with N inputs (driven by microcode) and 2^N output wires that select the 3-state buffers of one of up to 2^N possible things that can drive the bus, 1 chip per bus.
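For rough planning, the chip count of such a register file can be estimated with a few lines of Python. This sketch assumes 8-bit-wide register and buffer chips like those listed above, and one decoder chip per bus that is wide enough to select every register; the function name is hypothetical.

```python
def register_file_chips(num_registers: int, num_buses: int, width: int = 16) -> int:
    """Estimate chips for registers + 3-state buffers + one decoder per bus.

    Assumes octal (8-bit) register and buffer chips, and that one decoder
    chip per bus can select among all the registers.
    """
    chips_per_word = -(-width // 8)                   # ceil(width / 8)
    registers = num_registers * chips_per_word
    buffers = num_registers * chips_per_word * num_buses
    decoders = num_buses
    return registers + buffers + decoders


print(register_file_chips(8, 1))  # 16 + 16 + 1 = 33
print(register_file_chips(8, 2))  # 16 + 32 + 2 = 50
```

The buffer term grows with both the number of registers and the number of buses, which is one reason designers try to keep the number of buses each register can drive small.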

Later we discuss shortcuts that may require fewer chips.

74181

Like many historically important commercial computers, many home-brew CPUs use some version of the 74181, the first complete ALU on a single chip.^[8] (Versions of the 74181 include the 74F181, the 40181[citation needed], the 74AS181, the 74LS181, the 74HCT181, etc.) The 74181 is a 4-bit-wide ALU that can perform all the traditional add / subtract / decrement operations with or without carry, as well as AND / NAND, OR / NOR, XOR, and shift left (computed as A plus A).
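A behavioral Python model of a generic 4-bit ALU slice shows what "4 bits wide, with carry" means in practice. The operation names are hypothetical: the real 74181 selects its functions with four S inputs, a mode input M, and a carry input, and this sketch does not reproduce that function table.

```python
def alu4(op: str, a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """Return (4-bit result, carry_out) for a generic 4-bit ALU slice.

    Arithmetic results above 15 set carry_out. For "sub", carry_out = 1
    means "no borrow"; carry_in is only used by "add" in this sketch.
    """
    a &= 0xF
    b &= 0xF
    arithmetic = {
        "add": a + b + carry_in,
        "sub": a + (~b & 0xF) + 1,   # a - b as a + (not b) + 1
        "dec": a + 0xF,              # adding 1111 adds -1 in 4-bit two's complement
        "shl": a + a,                # shift left, computed as A plus A
    }
    logic = {
        "and": a & b,
        "nand": ~(a & b),
        "or": a | b,
        "nor": ~(a | b),
        "xor": a ^ b,
    }
    if op in arithmetic:
        total = arithmetic[op]
        return total & 0xF, (total >> 4) & 1
    if op in logic:
        return logic[op] & 0xF, 0    # logic operations produce no carry here
    raise ValueError(f"unsupported operation {op!r}")


print(alu4("sub", 5, 3))
```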

A typical home-brew CPU uses 4 of these 74181 chips to build an ALU that can handle 16 bits at once, like the Data General SuperNova.^[9] The simplest home-brew CPUs have only one ALU, which at different times is used to increment the program counter, do arithmetic on data, do logic operations on data, and calculate addresses from base+offset.

Some people who build TTL CPUs attempt to "save chips" by building that one ALU narrower than the largest word size (which is often 16 bits in TTL computers). For example, the earliest Data General Nova computers used a single 74181 and processed all data 4 bits at a time.^[9] Unfortunately, this adds complexity elsewhere, and may actually increase the total number of chips needed.^[10]^[11]^[12]

The simplest 16-bit TTL ALU wires the carry-out of each 74181 chip to the carry-in of the next, creating a ripple-carry adder.
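Why ripple carry is slow can be seen with a bit-level Python sketch that also reports the longest distance any carry had to travel. Each position in that chain costs roughly one carry-stage delay in hardware; the function name and the chain-counting bookkeeping are illustrative additions.

```python
def ripple_add(a: int, b: int, width: int = 16) -> tuple[int, int]:
    """Bit-level ripple-carry addition.

    Returns (sum, longest_chain): longest_chain counts how many
    consecutive bit positions passed a carry along to the next position.
    """
    carry = 0
    result = 0
    chain = longest = 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))   # carry into the next bit position
        chain = chain + 1 if carry else 0
        longest = max(longest, chain)
    return result & ((1 << width) - 1), longest


print(ripple_add(0xFFFF, 1))
```

Adding 1 to 0xFFFF forces the carry through all 16 positions, which is the worst case the clock period must allow for.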

Historically, some version of the 74182 look-ahead carry generator was used to speed up "add" and "subtract" so that they run at about the same speed as the other ALU operations.
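The look-ahead idea can be sketched in Python: each 4-bit slice reports a group generate G and group propagate P (as the 74181 does for a 74182), and the carry into every slice is then computed directly from those signals as a two-level sum of products instead of rippling. Signals are modeled active-high for readability, whereas the real chips use active-low levels on some of these pins; all names are illustrative.

```python
def group_gp(a: int, b: int) -> tuple[int, int]:
    """Group generate G and group propagate P for one 4-bit slice."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]   # bit i generates a carry
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]   # bit i passes a carry on
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P


def cla_add16(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """16-bit add from four 4-bit slices plus a look-ahead carry unit."""
    nibbles = [((a >> 4 * k) & 0xF, (b >> 4 * k) & 0xF) for k in range(4)]
    G, P = zip(*(group_gp(x, y) for x, y in nibbles))
    # Every carry depends only on G, P and c0: no carry waits on another carry.
    c1 = G[0] | (P[0] & c0)
    c2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    c3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    c4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))
    result = 0
    for k, ((x, y), cin) in enumerate(zip(nibbles, (c0, c1, c2, c3))):
        result |= ((x + y + cin) & 0xF) << (4 * k)
    return result, c4


print(cla_add16(0x1234, 0x0FCD))
```

Here `c4` stands in for the carry out of the whole 16-bit adder.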

Historically, some people who built TTL CPUs put two or more independent ALU blocks in a single CPU -- a general-purpose ALU for data calculations, a PC incrementer, an index register incrementer/decrementer, a base+offset address adder, etc.

We discuss ripple-carry adders, look-ahead carry generators, and their effects on other parts of a CPU at Microprocessor Design/Add and Subtract Blocks.

alternatives to 74181

Some people find that '181 chips are becoming hard to find.

Quite a few people building "TTL CPUs" use GAL chips (which can be erased and reprogrammed).^[13] A single GAL20V8 chip can replace a 74181 chip.^[14] Often another GAL chip can replace 2 or 3 other TTL chips.

Other people building "TTL CPUs" find it more magical to build a programmable machine entirely out of discrete non-programmable chips. Are there any reasonable alternatives to the '181 for building an ALU out of discrete chips? The Magic-1 uses several 74F381 ALUs and a 74F382 ALU;^[15] is any variant of the '381 and '382 chips easier to find than a '181? ... the 74HC283, 74HCT283, and MC14008 chips only add; they don't do AND, NAND, etc. ...

Many commercial machines, such as the Data General Nova 4, used four AM2901 ALUs "in parallel" to build each 16-bit ALU. Alas, these are apparently even harder to find than the 74181.

One could build the entire CPU -- including the ALU -- out of sufficient quantities of the 74153 multiplexer.^[16]
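The reason a multiplexer is enough is that a 4-to-1 multiplexer is a lookup table: with one operand bit on each select input, whatever is placed on its four data inputs becomes the truth table of the function. A Python sketch, with hypothetical names and the multiplexer's enable input ignored:

```python
def mux4(data: tuple[int, int, int, int], s1: int, s0: int) -> int:
    """One section of a 4-to-1 multiplexer (enable input not modeled)."""
    return data[(s1 << 1) | s0]


# Data inputs listed for select values (A, B) = (0,0), (0,1), (1,0), (1,1).
AND = (0, 0, 0, 1)
OR = (0, 1, 1, 1)
XOR = (0, 1, 1, 0)
NAND = (1, 1, 1, 0)


def logic_unit(table: tuple[int, int, int, int], a: int, b: int, width: int = 8) -> int:
    """Apply a 2-input function bitwise, one multiplexer section per bit."""
    result = 0
    for i in range(width):
        bit = mux4(table, (a >> i) & 1, (b >> i) & 1)
        result |= bit << i
    return result


print(bin(logic_unit(XOR, 0b1100, 0b1010)))
```

Since the four data lines can come straight from microcode, one multiplexer section per bit can compute any of the 16 possible two-input functions.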

One designer "built-from-scratch" a 4-bit ALU that does add, subtract, increment, decrement, "and", "or", "xor", etc. -- roughly equivalent to the 4-bit 74181 -- out of about 14 simple TTL chips: 2-input XOR, AND, OR gates.^[17]
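As an illustration of the gate-level approach (not the referenced design itself), the Python sketch below builds a 4-bit adder/subtractor using only 2-input XOR, AND, and OR operations. The extra XOR per bit acts as a controlled inverter on B, so subtraction is computed as A + (not B) + 1; the function names are hypothetical.

```python
def xor_gate(x: int, y: int) -> int:
    return x ^ y


def and_gate(x: int, y: int) -> int:
    return x & y


def or_gate(x: int, y: int) -> int:
    return x | y


def full_adder(x: int, y: int, cin: int) -> tuple[int, int]:
    """Full adder from 2 XOR, 2 AND and 1 OR gate."""
    half = xor_gate(x, y)
    total = xor_gate(half, cin)
    cout = or_gate(and_gate(x, y), and_gate(half, cin))
    return total, cout


def add_sub4(a: int, b: int, subtract: int) -> tuple[int, int]:
    """4-bit A + B (subtract=0) or A - B (subtract=1); returns (result, carry)."""
    carry = subtract                                 # +1 for two's-complement subtract
    result = 0
    for i in range(4):
        x = (a >> i) & 1
        y = xor_gate((b >> i) & 1, subtract)         # controlled inverter on B
        s, carry = full_adder(x, y, carry)
        result |= s << i
    return result, carry


print(add_sub4(5, 3, 1))
```

Per bit this uses 3 XOR, 2 AND, and 1 OR gate, so 12 XOR, 8 AND, and 4 OR gates in all; in quad 2-input packages (such as the 7486, 7408, and 7432) that is 3 + 2 + 1 = 6 chips for add/subtract alone.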

Another designer has posted an 8-bit ALU design that has more functionality than two 74181 chips (the 74181 can't shift right), built from 14 complex TTL chips: two 74283 4-bit adders, some 4:1 multiplexers, and some 2:1 multiplexers.^[18]
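Shifting right needs no logic at all, only wiring plus a multiplexer to choose it. The Python sketch below is an assumed illustration of that general idea, not the posted design: two chained 4-bit adders produce A + B, and a row of 2:1 multiplexers selects either the sum or A shifted right by one. Sending the shifted-out bit to the carry output is also an assumption of this sketch.

```python
def adder4(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """A 4-bit adder (such as a 74283), modeled behaviorally."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def mux2(select: int, in0: int, in1: int) -> int:
    """One 2:1 multiplexer bit."""
    return in1 if select else in0


def alu8(a: int, b: int, shift_right: int) -> tuple[int, int]:
    """Return (8-bit result, carry) for A + B, or for A shifted right by one."""
    low, carry = adder4(a, b, 0)
    high, carry_out = adder4(a >> 4, b >> 4, carry)   # carry ripples low -> high
    total = (high << 4) | low
    shifted = (a & 0xFF) >> 1        # pure wiring: bit i <- bit i+1, 0 into bit 7
    result = 0
    for i in range(8):               # one 2:1 mux per result bit
        result |= mux2(shift_right, (total >> i) & 1, (shifted >> i) & 1) << i
    carry_flag = a & 1 if shift_right else carry_out  # shifted-out bit -> carry (assumed)
    return result, carry_flag


print(alu8(0x81, 0, 1))
```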

The designers of the LM3000 CPU have shown that much of the 74181 is actually unnecessary. The 8-bit "ALU" in the LM3000 can't actually do any logical operations, only "add" and "subtract"; it is built from two 74LS283 4-bit adders and a few other chips. Apparently those "logical" operations aren't really necessary.^[19]

The MC14500B Industrial Control Unit has even less functionality than the LM3000 CPU. It is arguable that the MC14500B has close to the minimum functionality to even be considered a "CPU".^[20]^[21] The MC14500B is perhaps the most famous "1-bit" CPU.^[22]^[23]^[24]^[25] ^[26]

All of the earliest computers and most of the early massively parallel processing machines used a serial ALU, making them "1-bit CPUs".^[27]

other parts

solderless breadboard approach

Solderless breadboards are perhaps the fastest way to build experimental prototypes that involve lots of changes.

For about a decade, every student taking the 6.004 class at MIT was part of a team; each team had one semester to design and build a simple 8-bit CPU out of 7400 series integrated circuits.^[28] These CPUs were built out of TTL chips plugged into several solderless breadboards connected with lots of 22 AWG (0.33 mm²) solid copper wires.^[29]

wire-wrap

Traditionally, minicomputers built from TTL chips were constructed with lots of wire-wrap sockets (with long square pins) plugged into perfboard and lots of wire-wrap wire, assembled with a "wire-wrap pencil" or "wire-wrap gun".

stripboard

More recently, some "retrocomputer" builders have been using standard sockets plugged into stripboard and lots of wire-wrap wire, assembled with solder and a soldering iron.^[30]

Tools

Logisim is a free logic simulator which permits digital circuits to be designed and simulated using a graphical user interface.

Design Tips

There are many ways to categorize CPUs. Each way of categorizing represents a design question, and its categories represent the possible answers to that question, one of which must be chosen before the CPU implementation can be completed.

One way to categorize CPUs that has a large impact on implementation is: "For how many memory cycles will I hold one instruction before fetching the next instruction?"

0: load-instruction on every memory cycle (Harvard architecture)
1: At most 1 memory cycle between each load-instruction memory cycle (load-store architecture)
more: some instructions have 2 or more memory cycles between load-instruction memory cycles (memory-memory architecture)

Another way to categorize CPUs is: "Will my control lines be controlled by flexible microprogramming, by a fixed control store, or by a hard-wired control decoder that directly decodes the instruction?"

The load-store and memory-memory architectures require an "instruction register" (IR). At the end of every instruction (and after coming out of reset), the next instruction is fetched from memory[PC] and stored into the instruction register, and from then on the information in the instruction register (directly or indirectly) controls everything that goes on in the CPU until the next instruction is stored in the instruction register.
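The fetch step and the role of the IR can be sketched as a Python loop for a hypothetical accumulator machine whose program and data share one memory. The word format (3-bit opcode, 5-bit address) and the four opcodes modeled are assumptions made for illustration.

```python
def run(memory: list[int], max_steps: int = 100) -> list[int]:
    """Fetch-execute loop; assumes a 32-word memory shared by code and data."""
    pc, acc = 0, 0
    for _ in range(max_steps):
        ir = memory[pc]                  # fetch: memory[PC] -> IR
        pc = (pc + 1) % len(memory)      # point at the next instruction
        opcode, addr = ir >> 5, ir & 0x1F
        # From here until the next fetch, only IR decides what happens.
        if opcode == 0b000:              # LDA: ACC <- memory[addr]
            acc = memory[addr]
        elif opcode == 0b001:            # STA: memory[addr] <- ACC
            memory[addr] = acc
        elif opcode == 0b010:            # ADD: ACC <- ACC + memory[addr]
            acc = (acc + memory[addr]) & 0xFF
        elif opcode == 0b111:            # HLT
            break
        else:
            raise ValueError(f"opcode {opcode:03b} is not modeled")
    return memory


memory = [0] * 32
memory[0:4] = [0b000_01010, 0b010_01011, 0b001_01100, 0b111_00000]  # LDA 10; ADD 11; STA 12; HLT
memory[10], memory[11] = 7, 35
run(memory)
print(memory[12])
```

Because the same memory also has to supply data during LDA, ADD, and STA, the instruction must be copied into IR first; otherwise the control signals would change as soon as the memory bus was used for data.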

For homebrew CPUs, the 2 most popular architectures are[citation needed]:

direct-decode Harvard architecture
flexible microprogramming that supports the possibility of memory-memory architecture.

Another way to categorize CPUs is "How many sub-states are in a complete clock cycle?"

Many textbooks imply that a CPU has only one clock signal: a bunch of D flip-flops each hold 1 bit of the current state of the CPU, and those flip-flops drive that state out their "Q" output. Those flip-flops always hold their internal state constant, except at the instant of the rising edge of the one and only clock, where each flip-flop briefly "glances" at its "D" input and latches the new bit, and shortly afterwards (when the new bit is different from the old bit) changes the "Q" output to the new bit.
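That idealized single-clock behavior can be modeled in Python. In the sketch below (class and function names are hypothetical), three edge-triggered flip-flops form a shift register; collecting every D input before any flip-flop updates is what models a clock that reaches all flip-flops at exactly the same instant.

```python
class DFlipFlop:
    """Positive-edge-triggered D flip-flop: Q takes the value of D only
    on a 0 -> 1 clock transition and holds it at all other times."""

    def __init__(self) -> None:
        self.q = 0
        self._last_clk = 0

    def tick(self, clk: int, d: int) -> int:
        if clk == 1 and self._last_clk == 0:   # rising edge: latch D
            self.q = d
        self._last_clk = clk
        return self.q


def shift_register(serial_in: list[int], length: int = 3) -> list[list[int]]:
    stages = [DFlipFlop() for _ in range(length)]
    history = []
    for bit in serial_in:
        # Every D input is taken from the *old* Q outputs before the edge,
        # modeling a clock that reaches all flip-flops simultaneously.
        d_inputs = [bit] + [ff.q for ff in stages[:-1]]
        for clk in (0, 1):                     # one full clock period
            for ff, d in zip(stages, d_inputs):
                ff.tick(clk, d)
        history.append([ff.q for ff in stages])
    return history


print(shift_register([1, 0, 1, 1]))  # [[1, 0, 0], [0, 1, 0], [1, 0, 1], [1, 1, 0]]
```

If one flip-flop updated before its neighbor sampled, which is what clock skew can cause, a bit could move two stages in a single clock.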

Single clock signals are nice in theory. Alas, in practice we can never get the clock signal to every flip-flop precisely simultaneously: there is always some clock skew (differences in propagation delay). One way to avoid these timing issues is with a series of different clock signals.^[31] Another way is to use enough power^[32] and carefully design a clock distribution network (perhaps in the form of an H tree) with timing analysis to reduce the clock skew to negligible amounts.

Relay computers are forced to use at least 2 different clock signals, because of the "contact bounce" problem.

Many chips have a single "clock input" pin, giving the illusion that they use a single clock signal -- but internally a "clock generator" circuit converts that single external clock to the multiple clock signals used by the chip.

Many historically and commercially important CPUs have many sub-states in a complete clock cycle, with two or more "non-overlapping clock signals". Most MOS ICs in the 1970s used dual clock signals (a two-phase clock).^[33]
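A two-phase non-overlapping clock can be described as a repeating pattern: phase 1 high, a short gap with both low, phase 2 high, another gap. The Python sketch below generates such a pattern; the tick counts are arbitrary illustration values, not timings from any real chip.

```python
def two_phase_clock(cycles: int, high_ticks: int = 4,
                    gap_ticks: int = 1) -> tuple[list[int], list[int]]:
    """Return (phi1, phi2) as lists of 0/1 samples, one per time tick.

    Each cycle is: phi1 high, a gap with both low, phi2 high, another gap.
    The gaps (dead time) keep the two phases from ever overlapping.
    """
    phi1: list[int] = []
    phi2: list[int] = []
    pattern = ((1, 0, high_ticks), (0, 0, gap_ticks),
               (0, 1, high_ticks), (0, 0, gap_ticks))
    for _ in range(cycles):
        for level1, level2, ticks in pattern:
            phi1 += [level1] * ticks
            phi2 += [level2] * ticks
    return phi1, phi2


phi1, phi2 = two_phase_clock(cycles=2)
# The defining property: the two phases are never high at the same time.
assert not any(p1 and p2 for p1, p2 in zip(phi1, phi2))
print("phi1:", "".join("#" if x else "." for x in phi1))  # ####......####......
print("phi2:", "".join("#" if x else "." for x in phi2))  # .....####......####.
```

The dead time is what lets latches clocked by phi1 and latches clocked by phi2 pass data to each other without a race, even with some skew.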

shortcuts

Building a CPU from individual chips and wires takes a person a long time, so many people take various shortcuts to reduce the number of parts that need to be connected and the amount of wiring they need to do.

A 3-state bus often requires fewer and shorter connections than a 2-state bus.
Rather than general-purpose registers that can be used (at different times) to drive the data bus (during STORE) or the address bus (during indexed LOAD), sometimes it requires less hardware to have separate address registers and data registers and other special-purpose registers.
If the software guy insists on general-purpose registers that can be used (at different times) to drive the data bus (during STORE) or the address bus (during indexed LOAD), it may require less hardware to emulate them: have all programmer-visible registers drive only one internal microarchitectural bus, and (at different times) load the microarchitectural registers MAR and MDR from that internal bus, and later drive the external address bus from MAR and the external data bus from MDR. This sacrifices a little speed and requires more microcode to make it easier to build.
Rather than 32-bit or 64-bit address and data registers, it usually requires less hardware to have 8-bit data registers (occasionally combining 2 of them to get a 16-bit address register).
If the software guy insists on 16-bit or 32-bit or 64-bit data registers and ALU operations, it may require less hardware to emulate them: use multiple narrow micro-architectural registers to store each programmer-visible register, and feed 1 or 4 or 8 or 16 bits at a time through a narrow bus to the ALU to get the partial result each cycle, or to sub-sections of the wide MAR or MDR. This sacrifices a little speed (and adds complexity elsewhere) to make the bus easier to build. (See: 68000, as mentioned above)
Rather than many registers, it usually requires less hardware to have fewer registers.
If the software guy insists on many registers, it may require less hardware to emulate some of them (like some proposed MMIX implementations) or perhaps all of them (like some PDP computers): use reserved locations in RAM to store most or all programmer-visible registers, and load them as needed. This sacrifices speed to make the CPU easier to build. Alas, it seems impossible to eliminate all registers -- even if you put all programmer-visible registers in RAM, it seems that you still need a few micro-architectural registers: IR (instruction register), MAR (memory address register), MDR (memory data register), and ... what else?
Harvard architecture usually requires less hardware than Princeton architecture. This is one of the few ways to make the CPU simpler to build *and* go faster.

Harvard architecture

The simplest kinds of CPU control logic use the Harvard architecture, rather than Princeton architecture. However, Harvard architecture requires 2 separate storage units -- the program memory and the data memory. Some Harvard architecture machines, such as "Mark's TTL microprocessor", don't even have an instruction register -- in those machines, the address in the program counter is always applied to the program memory, and the data coming out of the program memory directly controls everything that goes on in the CPU until the program counter changes. Alas, Harvard architecture makes storing new programs into the program memory a bit tricky.
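The no-instruction-register style can be sketched in Python: each word of program memory is itself a bundle of control signals, applied directly to the datapath while the PC points at it, and data lives in a separate memory. The control fields below are hypothetical, and the sketch is not a model of Mark's TTL microprocessor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlWord:
    """One word of program memory. Each field is wired straight to a
    control line; these field names are hypothetical."""
    load_a: bool = False     # A <- data[addr]
    add: bool = False        # A <- A + data[addr]
    store_a: bool = False    # data[addr] <- A
    jump: bool = False       # PC <- addr instead of PC + 1
    halt: bool = False       # stop the clock
    addr: int = 0            # address field, for data memory or for a jump


def run_harvard(program: list[ControlWord], data: list[int],
                max_cycles: int = 100) -> list[int]:
    pc, a = 0, 0
    for _ in range(max_cycles):
        ctl = program[pc]    # program-memory output drives the datapath directly: no IR
        if ctl.halt:
            break
        if ctl.load_a:
            a = data[ctl.addr]
        if ctl.add:
            a = (a + data[ctl.addr]) & 0xFF
        if ctl.store_a:
            data[ctl.addr] = a
        pc = ctl.addr if ctl.jump else pc + 1
    return data


program = [
    ControlWord(load_a=True, addr=0),
    ControlWord(add=True, addr=1),
    ControlWord(store_a=True, addr=2),
    ControlWord(halt=True),
]
print(run_harvard(program, [7, 35, 0]))  # [7, 35, 42]
```

Because `program` and `data` are separate memories, data accesses never disturb the control word being executed; the cost is that loading a new program means writing into a memory the running CPU cannot normally store to.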

microcode architecture

See Microprocessor Design/Microcodes.

Assembly Tips

...

"I don't recommend that anybody but total crazies wirewrap their own machines out of loose chips anymore, although it was a common enough thing to do in the mid- to late Seventies". -- Jeff Duntemann

Programming Tips

...