Harvard's first large-scale digital computer, which came to be known as the Mark I, was conceived by Howard H. Aiken (A.M. '37, Ph.D. '39) and built by IBM. Fifty-one feet long, it was installed in the basement of what is now Lyman Laboratory in 1944 and later moved to a new building, the Aiken Computation Laboratory, where a generation of computing pioneers was educated and where the Maxwell Dworkin building now stands. Part of the mechanism remains on exhibit in the Science Center.
The Mark I performed additions and subtractions at a rate of about three per second; multiplication and division took considerably longer. This benchmark was soon surpassed by computers that could do thousands of arithmetic operations per second, then millions and billions. By the late 1990s a few machines were reaching a trillion (10¹²) operations per second; these were called terascale computers, as tera is the Système International prefix for 10¹². The next landmark, and the current state of the art, is the petascale computer, capable of 10¹⁵ operations per second. In 2010, Kaxiras' blood flow simulation ran on a petascale computer called Blue Gene/P in Jülich, Germany, which at the time held fifth place on the Top 500 list of supercomputers.
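A back-of-the-envelope sketch of what these rates mean in practice: the three-per-second figure and the SI prefixes come from the paragraph above, while the example workload of one trillion operations is an arbitrary illustration chosen only to make the comparison concrete.

```python
# Rough comparison of the arithmetic rates named above. The Mark I rate
# (~3 additions per second) and the SI prefix values are from the text;
# the one-trillion-operation workload is an arbitrary illustration.

RATES = {
    "Mark I": 3.0,          # operations per second
    "terascale": 1e12,
    "petascale": 1e15,
    "exascale": 1e18,
}

WORKLOAD = 1e12              # one trillion operations, chosen for illustration
SECONDS_PER_YEAR = 3.156e7

for name, rate in RATES.items():
    seconds = WORKLOAD / rate
    if seconds > SECONDS_PER_YEAR:
        print(f"{name:10s}: about {seconds / SECONDS_PER_YEAR:,.0f} years")
    else:
        print(f"{name:10s}: about {seconds:g} seconds")
```

On these assumptions the Mark I would need roughly ten thousand years for a job a terascale machine finishes in a second and a petascale machine in a millisecond.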
The new goal is an exascale machine, performing at least 10¹⁸ operations per second. This is a number so immense it challenges the imagination. Stacks of pennies reaching to the moon are not much help in expressing its magnitude; there would be millions of them. If an exascale computer counted off the age of the universe in units of a billionth of a second, the task would take about 14 years.
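A quick sanity check of those two illustrations, using rough reference values for the Earth-Moon distance, the thickness of a penny, and the age of the universe; all of the constants below are approximations supplied here, not figures from the article.

```python
# Rough sanity check of the exascale illustrations above. The physical
# constants are approximate reference values, not figures from the article.

EXA = 1e18                      # 10**18 pennies, or counts

MOON_DISTANCE_M = 3.84e8        # average Earth-Moon distance, meters
PENNY_THICKNESS_M = 1.52e-3     # thickness of a U.S. penny, meters
pennies_per_stack = MOON_DISTANCE_M / PENNY_THICKNESS_M
stacks = EXA / pennies_per_stack
print(f"Stacks of pennies to the moon: about {stacks / 1e6:.0f} million")

AGE_OF_UNIVERSE_S = 4.35e17     # ~13.8 billion years, in seconds
nanoseconds = AGE_OF_UNIVERSE_S * 1e9
seconds_to_count = nanoseconds / EXA
print(f"Counting those nanoseconds at 1e18 per second: "
      f"about {seconds_to_count / 3.156e7:.0f} years")
```

The arithmetic gives roughly four million penny stacks and about 14 years of counting.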
And what comes after exascale? We can look forward to zettascale (10²¹) and yottascale (10²⁴); then we run out of prefixes. The engine driving these amazing gains in computer performance is the ability of manufacturers to continually shrink the dimensions of transistors and other microelectronic devices, thereby cramming more of them onto a single chip. (The number of transistors per chip is in the billions now.) Until about 10 years ago, making transistors smaller also made them faster, allowing a speedup in the master clock, the metronome-like signal that sets the tempo for all operations in a digital computer. Between 1980 and 2005, clock rates increased by a factor of 1,000, from a few megahertz to a few gigahertz. But the era of ever-increasing clock rates has ended.
The speed limit for modern computers is now set by power consumption. If all other factors are held constant, the electricity needed to run a processor chip goes up as the cube of the clock rate: doubling the speed brings an eightfold increase in power demand. SEAS Dean Cherry A. Murray, the John A. and Elizabeth S. Armstrong Professor of Engineering and Applied Sciences and Professor of Physics, points out that high-performance chips are already at or above the 100-watt level. "Go much beyond that," she says, "and they would melt."
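A minimal numerical illustration of the cubic scaling rule stated above: the 2 GHz, 100-watt baseline is a hypothetical example chosen only to make the ratios concrete, and the simple cube-law model ignores every other factor, as the text stipulates.

```python
# Illustration of the cubic scaling rule described above: if power grows
# as the cube of the clock rate (all other factors held constant),
# doubling the clock multiplies power demand by 2**3 = 8. The 2 GHz /
# 100 W baseline is a hypothetical example, not a figure from the article.

BASE_CLOCK_GHZ = 2.0
BASE_POWER_W = 100.0

def power_at(clock_ghz: float) -> float:
    """Power demand under the cube-law model, relative to the baseline."""
    return BASE_POWER_W * (clock_ghz / BASE_CLOCK_GHZ) ** 3

for clock in (2.0, 3.0, 4.0):
    print(f"{clock:.1f} GHz -> about {power_at(clock):.0f} W")
# 2.0 GHz -> 100 W, 3.0 GHz -> ~338 W, 4.0 GHz -> 800 W
```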
If the chipmakers cannot build faster transistors, however, they can still make them smaller and thus squeeze more onto each chip. Since 2005 the main strategy for boosting performance has been to gang together multiple processor "cores" on each chip. The clock rate remains roughly constant, but the total number of operations per second increases if the separate cores can be put to work simultaneously on different parts of the same task. Large systems are assembled from vast numbers of these multicore processors.
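A minimal sketch of that multicore strategy: each worker runs no faster, but total throughput rises when independent pieces of one task run simultaneously. The workload (summing blocks of integers) and the four-core pool are arbitrary illustrations, not details from the article.

```python
# Sketch of the multicore strategy described above: each core runs at the
# same clock rate, but throughput grows when separate cores work
# simultaneously on different parts of the same task. The workload here
# (summing blocks of integers) is an arbitrary illustration.

from multiprocessing import Pool

def partial_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))

if __name__ == "__main__":
    n = 10_000_000
    cores = 4                              # hypothetical core count
    step = n // cores
    chunks = [(i * step, (i + 1) * step) for i in range(cores)]

    with Pool(processes=cores) as pool:    # one worker per core
        total = sum(pool.map(partial_sum, chunks))

    print(total == sum(range(n)))          # same answer, work split four ways
```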
When the Kaxiras group's blood flow study ran on the Blue Gene/P at Jülich, the machine had almost 300,000 cores. The world's largest and fastest computer, as of June 2014, is the Tianhe-2 in Guangzhou, China, with more than 3 million cores. An exascale machine may have hundreds of millions of cores, or possibly as many as a billion.
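The core counts cited above follow from rough arithmetic like the sketch below; the per-core rate of a few billion operations per second is an assumed ballpark figure, not a number from the article.

```python
# Rough arithmetic behind the projected core counts above. The per-core
# rate of a few billion operations per second is an assumed ballpark,
# not a figure from the article.

TARGET_OPS_PER_S = 1e18        # exascale goal
PER_CORE_OPS_PER_S = 2e9       # assumed: a few GHz, roughly one operation per cycle

cores_needed = TARGET_OPS_PER_S / PER_CORE_OPS_PER_S
print(f"Cores needed: about {cores_needed / 1e6:.0f} million")
# -> about 500 million, i.e. hundreds of millions of cores
```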