Components of a computer (the CPU, memory, adapter cards) are coordinated by a “clock” signal measured in Megahertz (millions of ticks per second) or Gigahertz (billions of ticks per second). Generally we say that speeding up the clock makes the computer run faster, but that is slightly misleading. The clock tells all the components when they should be done with their previous operation and when they should begin the next step. Components all run at whatever speed their design permits. If all the components can complete their longest operation with lots of time to spare, then there is room to speed up the clock, shorten the periods, and get more work done in the same amount of time. Set the clock too fast (“overclock”) and it ticks before one of the components has quite finished its last operation. Then the system crashes.
Clock Speed: Tell Me When it Hertz
Jargon explained: clock, megahertz/gigahertz, cycle
Computer performance is a traffic problem, moving data and instructions from memory and around inside the chip. Most people think of “traffic” in terms of cars and highways. However, there is a more relevant traffic analogy that everyone experienced before they learned to drive.
Students have been sitting in class for a long time. Finally the bell rings throughout the school signaling the end of the current period. Everyone gets up and moves through the hall to their next classroom. After a few minutes the bell rings again to signal the start of the next period. The bell has to ring everywhere in the school at the same time to coordinate movement. Without the bell, some classes would be released early and others would be released late.
The various parts of a computer hold instructions and data. Periodically they send this data along wires to the next processing station. To coordinate this activity, the computer provides a clock pulse. The clock is a regular pattern of alternating high and low voltages on a wire. To compare this with a clock in the hall, let’s say the high voltage signal is a “tick” and the low voltage signal is a “tock”. Clock speed is measured in millions of cycles per second (Megahertz) or billions of cycles per second (Gigahertz). A 100 MHz PC mainboard has a clock that “ticks” and “tocks” 100 million times each second. Each tick-tock sequence is called a cycle. The clock pulse tells some circuits when to start sending data on the wires, while it tells other circuits when the data from the previous pulse should have already arrived.
A small point of notation: The standard clock speeds are some multiple of 33.3333… MHz. Three times this speed is 100 MHz. By convention, the speeds are rounded down to 33 and 66 MHz, but the fraction explains why three times a 33 MHz clock is 100 and not 99.
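The rounding convention is easy to check with exact fractions. A minimal Python sketch, assuming (as the text says) that the base unit is exactly 33.333… MHz, that is 100/3 MHz; the helper name `nominal_mhz` is made up for illustration:

```python
from fractions import Fraction

# From the text: standard clocks are multiples of 33.333... MHz, i.e. exactly 100/3 MHz.
BASE_MHZ = Fraction(100, 3)


def nominal_mhz(multiple):
    """Return (exact speed in MHz, the rounded-down name used by convention)."""
    exact = BASE_MHZ * multiple
    return exact, int(exact)  # int() of a positive Fraction truncates, i.e. rounds down


for m in (1, 2, 3):
    exact, name = nominal_mhz(m)
    print(f"{m} x 33.33 MHz = {float(exact):.4f} MHz, called {name} MHz")

# Multiplying the rounded name gives the wrong answer; the exact value does not.
print(3 * 33, "vs", BASE_MHZ * 3)
```

Using `Fraction` rather than floating point keeps 100/3 exact, so three times the base clock comes out as exactly 100.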
There are five ways to increase the processing power of a CPU or the teaching power of a High School.
- Raise the clock speed - In the analogy, this corresponds to reducing the time available for each class period. If the teacher can talk faster, and if the students behave and listen more closely, this can work up to a point. Each student gets done with the school day earlier.
- Build a Pipeline - A more complicated solution shortens the class period, but then breaks each subject into a sequence of steps. If it takes 45 minutes to cover Algebra, and that time cannot be reduced, then the subject could be covered in three consecutive 15 minute periods. A simpler subject might be covered in just one period. After all, there is no reason other than the convenience of scheduling why every class for every subject lasts the same period of time. Students get done quicker, but only if some of the subjects are lightweight.
- Parallelism - Add more classrooms and more students. No one student learns anything faster, but at the end of the day the school has taught more people in the same amount of time. Of course, this only works if you have more students in the school district to teach.
- Class Size - double the number of students in each classroom. High Schools don’t like to do this. Computers, however, can easily switch from 32-bit to 64-bit operations. This will not affect most programs, but the particular applications that need processing power (games, multimedia) can be distributed in a 64-bit form to get more work done per operation.
- Build a Second School - Intel and AMD offer “multi-core” processor chips. This creates a system with two or four separate CPUs. An individual program won’t run any faster, and it may even run more slowly if these chips have a slower clock. However, two programs will be able to run at once, and programs that require the most performance (games, multimedia) can be written to use several CPUs at once.
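To make the “Class Size” option concrete, here is a minimal Python model (not real machine code) of how a 32-bit processor adds two 64-bit numbers: it splits each number into two 32-bit halves and needs two additions, the second one including the carry from the first. On 32-bit x86 this is an ADD followed by an ADC (add with carry); a 64-bit processor does it in one add. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF            # keeps a value within 32 bits
MASK64 = 0xFFFFFFFFFFFFFFFF    # keeps a value within 64 bits


def add64_on_32bit(a, b):
    """Add two unsigned 64-bit values using only 32-bit additions.

    Returns (sum modulo 2**64, number of 32-bit add operations used).
    """
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32   # split each operand into halves
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    low = a_lo + b_lo                        # add 1: low halves
    carry = low >> 32                        # 1 if the low halves overflowed 32 bits
    low &= MASK32
    high = (a_hi + b_hi + carry) & MASK32    # add 2: high halves plus the carry
    return (high << 32) | low, 2


def add64_on_64bit(a, b):
    """A 64-bit processor does the same job in a single add."""
    return (a + b) & MASK64, 1


x, y = 0x1_FFFF_FFFF, 0x1    # the low half overflows, so the carry matters
print(add64_on_32bit(x, y), add64_on_64bit(x, y))
```

Both functions return the same sum; the difference is the operation count, which is why programs that do a lot of wide arithmetic benefit from a 64-bit build.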
The easiest solution, and the one that benefits everyone without requiring any changes to software, is to speed up the clock. Beyond a certain point, a faster clock also required a longer pipeline. Then sometime in 2004 both CPU vendors ran into a ceiling. Intel had difficulty pushing its clock much beyond 3 GHz, and AMD had trouble pushing past 2 GHz. Because the AMD chip did more work in parallel on each clock tick, it was just as powerful as the Intel chip despite the lower clock speed.
So both vendors reconsidered their strategy and turned to the other options. AMD was first to offer a 64-bit processor, but because Microsoft was not ready to ship a corresponding 64-bit version of its operating system, the AMD advantage has been limited. Intel developed a range of tricks to get more work done at lower clock speeds and lower power in the processors for its Centrino laptops, and it is now migrating some of this technology to desktop systems.
A mainboard for an Intel CPU generates a clock rate of 200, 266, 333, or 400 MHz. The board sets a default rate based on the type of CPU chip, but you can override this choice using BIOS configuration screens.
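The Intel CPU transfers data four times per clock cycle. This rate (four times the actual clock speed) is often referred to as the Front Side Bus speed or FSB. 4x200=800, 4x266.67≈1066, 4x333.33≈1333, 4x400=1600. As with the 33 MHz convention, 266 and 333 are rounded names for 266.67 and 333.33 MHz, which is why the products are 1066 and 1333 rather than 1064 and 1332.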
Internally, the CPU chip applies a multiplier to the clock rate from the board. If the board delivers a 333 MHz clock rate and the multiplier is 9, then the internal speed of the CPU is 9x333 MHz = 3 GHz (exactly 3000 MHz, because the board clock is really 333.33 MHz).
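The board clock, the FSB rating, and the internal CPU speed are linked by two multiplications. A minimal Python sketch, assuming (as described above) a bus that transfers data four times per cycle and a fixed CPU multiplier; the helper `speeds` and the table of exact clock values are illustrative, with 266 and 333 stored as the exact fractions behind their rounded names. The FSB figure is expressed in millions of transfers per second.

```python
from fractions import Fraction

TRANSFERS_PER_CYCLE = 4  # the bus described in the text moves data 4 times per cycle

# Nominal board clocks mapped to their exact values (266 and 333 are rounded names).
BOARD_CLOCKS_MHZ = {
    200: Fraction(200),
    266: Fraction(800, 3),   # 266.67 MHz
    333: Fraction(1000, 3),  # 333.33 MHz
    400: Fraction(400),
}


def speeds(board_mhz, multiplier):
    """Return (FSB rating in millions of transfers/s, internal CPU clock in MHz)."""
    fsb = board_mhz * TRANSFERS_PER_CYCLE
    core = board_mhz * multiplier
    return fsb, core


for name, exact in BOARD_CLOCKS_MHZ.items():
    fsb, _ = speeds(exact, 1)
    print(f"{name} MHz board clock -> FSB {int(fsb)}")

fsb, core = speeds(BOARD_CLOCKS_MHZ[333], 9)
print(f"333 MHz x 9 = {float(core):.0f} MHz")  # 3000 MHz, i.e. 3 GHz
```

The same arithmetic shows why raising the board clock in the BIOS overclocks the CPU: with the multiplier fixed at 9, a 350 MHz board clock would give 9 x 350 = 3150 MHz, 5% above 3 GHz.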
Each time the computer powers up, the mainboard senses the type of CPU that is installed. It can sense if the type of CPU has changed. Initially, the CPU clock speed will be set to whatever value is standard for this particular model of processor. Similarly, the mainboard determines the type of memory and sets speeds and timings to match the slowest type of memory installed in the system.
After a mainboard has been shipped to customers, the CPU vendor may add new processor models. Existing mainboards can be updated to handle these new CPU chips correctly by updating a set of programs called the BIOS. Unlike ordinary software, the BIOS is stored in a flash memory chip on the mainboard, and its job is to configure the chipset and other hardware rather than to run applications. However, if the new CPU chip fits in the same socket and uses the same voltage levels as the older processors, then an alternative to updating the BIOS is to enter all the right speeds and timings manually in the BIOS configuration screens, which appear if you press Del or F2 just as the computer begins to power up.
More aggressive computer users may enter values that are faster than the numbers published for their CPU chip. This practice is called “overclocking.” Because processors are tested beyond their rated speed, almost any CPU can be overclocked by 5% or 10%. More than that may require special cooling.
All the ads and specifications quote clock speed in Megahertz or Gigahertz. However, the more important number is the length of time between clock ticks (the cycle time). Such periods are usually measured in nanoseconds (billionths of a second), abbreviated “nsec” (or “ns”).
Electricity travels through a copper wire more slowly than light travels in a vacuum; on a circuit board, a signal typically moves at roughly half to two thirds the speed of light. Normally, we can just regard the speed of light as “very fast.” It becomes important when the distances are very long (astronomy) or when the times are very short (computers). A nanosecond is the amount of time that it takes light to travel about one foot; an electric signal on a circuit board covers somewhat less, roughly six to eight inches.
A 200 MHz front side bus clock ticks every 5 nanoseconds. A 3 GHz internal CPU clock ticks every 1/3 of a nanosecond, or about the amount of time it takes light to travel 4 inches.
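The conversion from clock frequency to cycle time, and from cycle time to distance, is a pair of divisions. A minimal Python sketch using the defined speed of light in a vacuum; as noted above, an electrical signal on a real board covers noticeably less distance than these figures. The function names are illustrative.

```python
# Speed of light in a vacuum, in meters per second (exact by definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458
INCHES_PER_METER = 1 / 0.0254  # 1 inch is exactly 0.0254 m


def cycle_time_ns(freq_hz):
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / freq_hz


def light_travel_inches(time_ns):
    """How far light travels in a vacuum during time_ns nanoseconds."""
    meters = SPEED_OF_LIGHT_M_PER_S * time_ns * 1e-9
    return meters * INCHES_PER_METER


for label, freq in [("200 MHz bus clock", 200e6), ("3 GHz CPU clock", 3e9)]:
    t = cycle_time_ns(freq)
    print(f"{label}: {t:.3f} ns per cycle, light travels {light_travel_inches(t):.1f} in")
```

For 1 ns the result is about 11.8 inches (the “one foot” rule of thumb), and for a 3 GHz cycle it is just under 4 inches.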
To add up a column of numbers with a pocket calculator, you simply type each number in and press the “+” key (or the “=” key at the end). Most users probably think that a PC spreadsheet program does the same thing. However, with a calculator the human brain has been doing the hard part of the operation: moving down one row in the column, focusing on the number, and recognizing it. Each PC instruction carries with it a number of additional operations that would not be obvious to the casual user.
First, the computer must locate the next instruction in memory and move it to the CPU. This instruction is coded as a number. The computer must decode the number to determine the operation (say, ADD) and the size of the data (say, 16 bits). Additional information is then moved and decoded to determine the location of the data in memory (the row and column of the spreadsheet). Finally, the number is added to the running total. Although a human might take some time to add two eight-digit numbers together, the addition is the simplest part of the operation for a computer chip. Decoding the instruction and locating the data take the most time.
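The steps above (fetch, decode, locate the data, add) can be sketched as a toy interpreter. The instruction encoding here is entirely hypothetical and unrelated to real x86 machine code: each instruction is one number whose thousands digit is the operation and whose remaining digits are a memory address. Data size is not modeled.

```python
# Hypothetical toy encoding (not real x86): instruction = opcode * 1000 + address.
OP_HALT = 0   # stop and report the total
OP_ADD = 1    # add the value stored at `address` to the running total


def run(program, memory):
    """Execute a toy program; return (total, how many times each step ran)."""
    total = 0
    pc = 0                                    # program counter: index of the next instruction
    counts = {"fetch": 0, "decode": 0, "load": 0, "add": 0}
    while True:
        instruction = program[pc]             # 1. fetch the next instruction
        pc += 1
        counts["fetch"] += 1
        opcode, address = divmod(instruction, 1000)   # 2. decode operation and data address
        counts["decode"] += 1
        if opcode == OP_HALT:
            return total, counts
        if opcode != OP_ADD:
            raise ValueError(f"unknown opcode {opcode}")
        value = memory[address]               # 3. locate and load the data
        counts["load"] += 1
        total += value                        # 4. the addition itself: the easy part
        counts["add"] += 1


# A "spreadsheet column" of four numbers stored at addresses 100-103.
memory = {100: 12345678, 101: 87654321, 102: 11111111, 103: 22222222}
program = [OP_ADD * 1000 + addr for addr in range(100, 104)] + [OP_HALT * 1000]
total, counts = run(program, memory)
print(total, counts)
```

Only step 4 is arithmetic; every other step is bookkeeping to find the instruction and the data, which is where real chips spend most of their effort.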
Each generation of Intel CPU chip has performed this operation in fewer clock cycles than the previous generation.
- A 386 CPU required a minimum of 6 clock ticks to add two numbers.
- A 486 CPU could generally add two numbers in two clock ticks.
- A Pentium CPU could add two numbers in a single clock tick.
- A modern processor can add two to six pairs of numbers in a single clock tick. If it discovers that the next instruction needs data that hasn’t arrived from slow memory, it can rearrange things to execute subsequent instructions until the data arrives.
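One way to compare these generations is additions per second at the same clock speed. The sketch below assumes a hypothetical common 100 MHz clock so that only the per-tick figures from the list differ; real chips of each generation ran at very different clock speeds, and real throughput also depends on memory and on the mix of instructions.

```python
CLOCK_HZ = 100e6  # hypothetical clock shared by every chip, for comparison only

# Additions completed per clock tick, from the figures in the list above.
ADDS_PER_TICK = {
    "386": 1 / 6,          # at least 6 ticks per add, so this is a best case
    "486": 1 / 2,          # about 2 ticks per add
    "Pentium": 1,          # 1 add per tick
    "modern (low)": 2,     # 2 to 6 adds per tick
    "modern (high)": 6,
}


def adds_per_second(adds_per_tick, clock_hz=CLOCK_HZ):
    """Additions per second for a chip running at clock_hz."""
    return adds_per_tick * clock_hz


for chip, rate in ADDS_PER_TICK.items():
    print(f"{chip:>14}: {adds_per_second(rate) / 1e6:6.1f} million adds per second")
```

At the same clock, the spread from the 386 to a modern core is a factor of roughly 12 to 36, before any difference in clock speed is counted.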
After the Pentium, however, each new generation of CPU used ever more complex circuitry to make ever smaller improvements in performance. When programs were single-threaded and ran on only one CPU, this effort was the best chip designers could do. However, for the last five years programmers have been aware of bulk instructions (doing one thing to two blocks of numbers instead of just a pair of numbers) and threading (running two parts of your program on two different processors at the same time). Some highly parallel or low-power devices have taken several steps back and returned to a design like the original first-generation Pentium, from the era when CPU design was simplest and efficiency was greatest.
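Threading, as described above, means splitting one job into independent pieces and combining the results. A minimal Python sketch that sums a column by giving each piece to a separate worker thread; `parallel_sum` and the two-worker default are illustrative choices. In CPython, threads do not speed up pure-Python arithmetic because of the global interpreter lock, so this shows only how the work is divided, not a measured speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(numbers, workers=2):
    """Split `numbers` into up to `workers` chunks, sum each in its own thread, combine."""
    if not numbers:
        return 0
    chunk = -(-len(numbers) // workers)          # ceiling division: items per chunk
    pieces = [numbers[i:i + chunk] for i in range(0, len(numbers), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_totals = list(pool.map(sum, pieces))   # each worker sums one piece
    return sum(partial_totals)                   # combine the partial results


column = list(range(1, 101))
print(parallel_sum(column))
```

The pattern only works because addition lets the partial totals be computed independently and combined at the end; a program whose steps each depend on the previous result cannot be split this way.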
Possible Future: Error Detection
Intel recently published a research paper that suggests a future change in CPU chip design that could lead to faster, better chips. Remember, the CPU clock has to run at a rate that allows each circuit to switch from 0 to 1 or from 1 to 0 before the next clock tick. The clock can be no faster than the slowest circuit. If the clock is too fast, a circuit could be somewhere between 0 and 1 when the period ends, and that indeterminate value causes the operation to produce an error.
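The rule that the clock can be no faster than the slowest circuit can be sketched as a simple timing check. The circuit names and delay values below are hypothetical, chosen only to illustrate the arithmetic, and the model ignores real-world margins such as clock skew: the fastest safe clock is the reciprocal of the longest delay, and any faster clock leaves at least one circuit unfinished when the tick arrives.

```python
# Hypothetical worst-case switching delays, in nanoseconds.
CIRCUIT_DELAYS_NS = {
    "adder": 0.28,
    "decoder": 0.31,
    "cache lookup": 0.33,
}


def max_clock_ghz(delays_ns):
    """Fastest clock (GHz) at which every circuit finishes within one cycle."""
    return 1 / max(delays_ns.values())   # 1 / ns = GHz


def unfinished_circuits(delays_ns, clock_ghz):
    """Circuits that would still be switching when the next tick arrives."""
    cycle_ns = 1 / clock_ghz
    return [name for name, delay in delays_ns.items() if delay > cycle_ns]


print(f"max clock: {max_clock_ghz(CIRCUIT_DELAYS_NS):.2f} GHz")
print("at 3.0 GHz:", unfinished_circuits(CIRCUIT_DELAYS_NS, 3.0))
print("at 3.2 GHz:", unfinished_circuits(CIRCUIT_DELAYS_NS, 3.2))
```

With these made-up numbers, the cache lookup alone sets the limit: the adder and decoder could run faster, but the whole chip must wait for its slowest circuit.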
For forty years, computer memory on servers has had an error checking function called ECC. Intel knows how this works because in 1972, when its main business was still memory chips rather than microprocessors, it was building ECC memory for IBM mainframe computers. ECC adds extra bits to a block of memory and sets those bits based on a mathematical/logical operation on the values of the other bits. Typically ECC can correct an error in any single bit and detect an error involving two bits.
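ECC memory typically uses a SECDED code (single-error correction, double-error detection) from the Hamming family; server memory commonly protects 64 data bits with 8 check bits. The sketch below shows the same idea on a much smaller scale, an extended Hamming(8,4) code with 4 data bits, 3 Hamming check bits, and 1 overall parity bit. It illustrates the technique in general and is not a claim about the specific circuits Intel built.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # codeword positions that carry the 4 data bits
CHECK_POSITIONS = (1, 2, 4)     # Hamming check bits sit at the powers of two


def encode(data):
    """Encode 4 data bits (a list of 0/1 values) into an 8-bit codeword."""
    code = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data):
        code[pos] = bit
    # Check bit p is the parity of every other position whose index has bit p set.
    for p in CHECK_POSITIONS:
        for i in range(1, 8):
            if i != p and i & p:
                code[p] ^= code[i]
    # Position 0 holds an overall parity bit that makes the whole word even.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (data_bits or None, status) for a possibly corrupted codeword."""
    code = list(code)                # work on a copy
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i            # XOR of the positions of all 1 bits
    overall = sum(code) % 2          # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Single-bit error: the syndrome names the bad position
        # (syndrome 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        return None, "double error detected"
    return [code[p] for p in DATA_POSITIONS], status


word = encode([1, 0, 1, 1])
damaged = word.copy()
damaged[6] ^= 1                      # simulate one bit flipping in memory
print(decode(damaged))
damaged[2] ^= 1                      # a second flip is detected, not corrected
print(decode(damaged))
```

The extra parity bit at position 0 is what separates “one bit flipped, fix it” from “two bits flipped, report it”; without it, a double error would be silently miscorrected.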
Exactly how to do this in a CPU chip is the subject of the Intel research. The point, however, is that by adding extra circuits to each internal processing unit in the CPU chip, it should be possible to detect when one of the circuits failed to reach a proper state at the end of the last clock cycle. At that point the CPU chip can spend the next clock cycle re-executing the failed operation. It doesn’t really matter that error recovery is disruptive, because if it happens rarely enough, it will have no measurable effect on overall processing power.
Because current CPU chips have no error detection capability, they have to be run at very conservative clock speeds. Intel calculates that if error detection and recovery were built into the CPU, the chips could run one third faster, or they could consume one third less power and generate that much less heat. That is a very, very big gain in exchange for very infrequently missing a clock cycle or two in order to retry a failed operation.
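The trade-off in the last paragraph is simple arithmetic: a faster clock multiplies throughput, while each detected error costs a few extra cycles. The sketch below uses the one-third clock gain from the text; the error rate and the retry penalty are hypothetical values chosen only to illustrate the calculation, not figures from Intel.

```python
def relative_throughput(clock_gain, errors_per_op, retry_cycles, base_cpi=1.0):
    """Throughput relative to a baseline chip with no error recovery.

    clock_gain:    how much faster the clock runs (4/3 = one third faster)
    errors_per_op: fraction of operations that fail and must be retried
    retry_cycles:  extra cycles spent re-executing each failed operation
    base_cpi:      average cycles per operation when nothing fails
    """
    cycles_per_op = base_cpi + errors_per_op * retry_cycles
    return clock_gain * base_cpi / cycles_per_op


# Hypothetical: one failure per 10,000 operations, 2 cycles lost per retry.
print(f"{relative_throughput(4 / 3, 1e-4, 2):.4f}")
```

With these assumed numbers the retries cost almost nothing; under the same 2-cycle penalty, the one-third gain would disappear entirely only if one operation in six failed.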
Adding error recovery would be a big change to the design of CPU chips. Intel makes major design changes every other year. It is not clear when this change might become part of its architecture, but the idea is so reasonable and the result would be so powerful that Intel seems certain to try hard to make it work.