Active In SP
Joined: Feb 2011
17-02-2011, 04:08 PM
transputer.docx (Size: 119.09 KB / Downloads: 54)
The transputer was a pioneering microprocessor architecture of the 1980s, featuring integrated memory and serial communication links, intended for parallel computing. It was designed and produced by Inmos, a British semiconductor company based in Bristol.
For some time in the late 1980s many considered the transputer to be the next great design for the future of computing. While Inmos and the transputer did not ultimately live up to this expectation, the transputer architecture was highly influential in provoking new ideas in computer architecture, several of which have re-emerged in different forms in modern systems
In the early 1980s, conventional CPUs appeared to reach a performance limit. Up to that time, manufacturing difficulties limited the amount of circuitry designers could place on a chip. Continued improvements in the fabrication process, however, removed this restriction. Soon the problem became that the chips could hold more circuitry than the designers knew how to use. Traditional CISC designs were reaching a performance plateau, and it wasn't clear it could be overcome.
It seemed that the only way forward was to increase the use of parallelism, the use of several CPUs that would work together to solve several tasks at the same time. This depended on the machines in question being able to run several tasks at once, a process known as multitasking. This had generally been too difficult for previous CPU designs to handle, but more recent designs were able to accomplish it effectively. It was clear that in the future this would be a feature of all operating systems.
A side effect of most multitasking design is that it often also allows the processes to be run on physically different CPUs, in which case it is known as multiprocessing. A low-cost CPU built with multiprocessing in mind could allow the speed of a machine to be increased by adding more CPUs, potentially far more cheaply than by using a single faster CPU design.
The first transputer designs were due to David May and Robert Milne. In 1990, May received an Honorary DSc from University of Southampton, followed in 1991 by his election as a Fellow of The Royal Society and the award of the Patterson Medal of the Institute of Physics in 1992. Tony Fuge, a leading engineer at Inmos at the time, was awarded the Prince Philip Designers Prize in 1987 for his work on the T414 transputer.
The transputer (the name deriving from transistor and computer) was the first general purpose microprocessor designed specifically to be used in parallel computing systems. The goal was to produce a family of chips ranging in power and cost that could be wired together to form a complete parallel computer. The name was selected to indicate the role the individual transputers would play: numbers of them would be used as basic building blocks, just as transistors had earlier.
Originally the plan was to make the transputer cost only a few dollars per unit. Inmos saw them being used for practically everything, from operating as the main CPU for a computer to acting as a channel controller for disk drives in the same machine. Spare cycles on any of these transputers could be used for other tasks, greatly increasing the overall performance of the machines.
Even a single transputer would have all the circuitry needed to work by itself, a feature more commonly associated with microcontrollers. The intention was to allow transputers to be connected together as easily as possible, without the requirement for a complex bus (or motherboard). Power and a simple clock signal had to be supplied, but little else: RAM, a RAM controller, bus support and even an RTOS were all built in.
The original transputer used a very simple and rather unique architecture to achieve a high performance in a small area. It used microcode as the principal method of controlling the data path but unlike other designs of the time, many instructions took only a single cycle to execute. Instruction opcodes were used as the entry points to the microcode ROM and the outputs from the ROM were fed directly to the data path. For multi-cycle instructions, while the data path was performing the first cycle, the microcode decoded four possible options for the second cycle. The decision as to which of these options would actually be used could be made near the end of the first cycle. This allowed for very fast operation while keeping the architecture generic.
The clock speed of 20 MHz was quite high for the era and the designers were very concerned about the practicalities of distributing a clock signal of this speed on a board. A lower external clock of 5 MHz was used and this was multiplied up to the required internal frequency using a phase-locked loop (PLL). The internal clock actually had four non-overlapping phases and designers were free to use whichever combination of these they wanted so it could be argued that the transputer actually ran at 80 MHz. Dynamic logic was used in many parts of the design to reduce area and increase speed. Unfortunately, these techniques are difficult to combine with Automatic Test Pattern Generation scan testing so they fell out of favour for later designs.
The basic design of the transputer included serial links that allowed it to communicate with up to four other transputers, each at 5, 10 or 20 Mbit/s – which was very fast for the 1980s. Any number of transputers could be connected together over even longish links (tens of metres) to form a single computing "farm". A hypothetical desktop machine might have two of the "low end" transputers handling I/O tasks on some of their serial lines (hooked up to appropriate hardware) while they talked to one of their larger cousins acting as a CPU on another.
There were limits to the size of a system that could be built in this fashion. Since each transputer was linked to another in a fixed point-to-point layout, sending messages to a more distant transputer required the messages to be relayed by each chip on the line. This introduced a delay with every "hop" over a link, leading to long delays on large nets. To solve this problem Inmos also provided a zero-delay switch that connected up to 32 transputers (or switches) into even larger networks.
Transputers could be booted over the network links (as opposed to the memory as in most machines) so a single transputer could start up the entire network. There was a pin called BootFromROM that when asserted caused the transputer to start two bytes from the top of memory (sufficient for up to a 256 byte backward jump, usually out of ROM). When this pin was not asserted, the first byte that arrived down any link was the length of a bootstrap to be downloaded, which was placed in low memory and run. The 'special' lengths of 0 and 1 were reserved for PEEK and POKE – allowing inspection and changing of RAM in an unbooted transputer. After a peek (which required an address) or a poke (which took a word address, and a word of data – 16 or 32 bit depending on the basic word width of the transputer variant) the transputer would return to waiting for a bootstrap.
Supporting the links was additional circuitry that handled scheduling of the traffic over them. Processes waiting on communications would automatically pause while the networking circuitry finished its reads or writes. Other processes running on the transputer would then be given that processing time. It included two priority levels to improve real-time and multiprocessor operation. The same logical system was used to communicate between programs running on a single transputer, implemented as "virtual network links" in memory. So programs asking for any input or output automatically paused while the operation completed, a task that normally required the operating system to handle as the arbiter of hardware. Operating systems on the transputer did not have to handle scheduling: in fact, one could consider the chip itself to have an OS inside it.
To include all this functionality on a single chip, the transputer's core logic was simpler than most CPUs. While some have called it a RISC due to its rather spare nature (and because that was a desirable marketing buzzword at the time), it was heavily microcoded, had a limited register set, and complex memory-to-memory instructions, all of which place it firmly in the CISC camp. Unlike register-heavy load-store RISC CPUs, the transputer had only three data registers, which behaved as a stack. In addition a Workspace Pointer pointed to a conventional memory stack, easily accessible via the Load Local and Store Local instructions. This allowed for very fast context switching by simply changing the workspace pointer to the memory used by another process (a technique used in a number of contemporary designs, such as the TMS9900). The three register stack contents were not preserved past certain instructions, like Jump, when the transputer could do a context switch.
The transputer instruction set comprised 8-bit instructions divided into opcode and operand nibbles. The "upper" nibble contained the 16 possible primary instruction codes, making it one of the very few commercialized minimal instruction set computers. The "lower" nibble contained the single immediate constant operand, commonly used as an offset relative to the Workspace (memory stack) pointer. Two prefix instructions allowed construction of larger constants by prepending their lower nibbles to the operands of following instructions. Additional instructions were supported via the Operate (Opr) instruction code, which decoded the constant operand as an extended zero-operand opcode, providing for almost endless and easy instruction set expansion as newer implementations of the transputer were introduced.