Hardware
On the occasion of the 50th anniversary of the hard disk drive I've been pondering the evolution of the hardware Graphing Calculator grew up on.
The original hard disk drive (IBM 305 RAMAC) in 1956 stored only 5MB of data on fifty 24-inch disks and weighed over 250 kg (over 550 pounds). About the size of two large refrigerators and as tall as a man, you could lease the whole unit for about $250,000/year in today's dollars - John Cole
I began coding Graphing Calculator's equation editing in 1985 for the original Macintosh with an 8 MHz 68000 processor and 400K floppy disks. The challenge then was to make it responsive enough to keep up with typing equations. Since editing an equation changes its two-dimensional structure, the entire equation layout was redrawn with each keystroke. One trick was to draw the part of the equation containing the selection first, so that the spot where your eye was focused stayed responsive. Peripheral vision didn't notice the lag elsewhere. Milo did not give the user any choice of equation font or size, in order to simplify and speed up the update calculations.
We wrote Graphing Calculator 1.0 in 1993 for the original Power Mac 6100 with a 60 MHz PowerPC 601. At that time, we wrote: In our tests, calculations using the PowerPC processor's single-precision floating-point multiply-add instruction were 20,000 times faster. This means that if we had started a lengthy floating-point calculation in 1984 at the release of the Macintosh, and that calculation were still being worked on by the computer, it would take a Power Macintosh starting now just four hours to catch up.
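The four-hour figure can be checked with back-of-the-envelope arithmetic. The sketch below assumes a head start of roughly ten years (1984 to the first Power Macintosh) and takes the quoted 20,000x speedup at face value. The function name and the ten-year figure are my own choices for illustration, not part of the original tests.

```python
HOURS_PER_YEAR = 365.25 * 24  # average year length, in hours

def catch_up_hours(head_start_years, speedup):
    """Hours until a machine `speedup` times faster draws level with a
    slower machine that started `head_start_years` earlier and keeps running."""
    # Work the old machine has already done, in old-machine hours.
    head_start = head_start_years * HOURS_PER_YEAR
    # After t hours the new machine has done speedup * t units of work and
    # the old one head_start + t, so they meet at t = head_start / (speedup - 1).
    return head_start / (speedup - 1)

hours = catch_up_hours(head_start_years=10, speedup=20_000)
print(f"{hours:.1f} hours")  # a little over four hours
```

Ten years is about 87,660 hours, so dividing by roughly 20,000 lands a little above four hours, in line with the claim.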
The equation editing code was the same code written for Milo in the 1980s. On the Power Mac, updates during typing were no longer a problem, so we used that code unchanged and focused our attention on visualizing graphs and using the speed of the machine to improve the user interface. The lessons we learned then remain relevant today.
Taking full advantage of any machine requires understanding the hardware. On an 8 MHz Motorola 68000, performance was limited by the CPU, which took many 125-nanosecond cycles to complete each instruction. With no floating-point hardware, individual arithmetic operations took hundreds of instructions. On a 60 MHz PowerPC 601, many instructions took only one 16.7-nanosecond cycle. Furthermore, the PowerPC was one of the first superscalar machines, meaning that it had multiple execution units (integer, floating point, and branch) and could execute multiple instructions per cycle. However, accessing memory was much slower, so the CPU kept an on-chip cache of recently accessed memory. In many cases, performance was entirely determined by memory usage patterns. Calculations which could fit their data entirely in the on-chip cache ran 100x faster than the same calculation on a larger data set which exhausted the cache and needed to access main memory. Programming for performance now meant choosing algorithms and data structures with locality of reference so they worked on data in pieces small enough to fit in the cache at one time.
Today, the world is again different. Now machines have multiple cores, each with its own on-chip cache (perhaps shared with other cores on that chip, perhaps not). Programming for performance now means choosing algorithms that can run in parallel and minimize communication between parallel threads.
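Both lessons, working in cache-sized pieces and keeping communication between threads to a minimum, lead to the same shape of code. What follows is a minimal sketch of that shape, not code from Graphing Calculator. The chunk size, function names, and the sum-of-squares workload are all hypothetical. In standard CPython builds the global interpreter lock keeps these threads from doing the arithmetic truly in parallel, so read it for structure rather than speed.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4096  # hypothetical piece size, meant to stand in for "fits in the cache"

def sum_of_squares_chunk(data, start, stop):
    """Walk one contiguous piece of the data and return a single number."""
    total = 0.0
    for i in range(start, stop):   # sequential access within one small piece
        total += data[i] * data[i]
    return total                   # the only value handed back to the caller

def sum_of_squares(data, workers=4):
    """Split the data into pieces, process each independently, then combine."""
    bounds = [(s, min(s + CHUNK, len(data))) for s in range(0, len(data), CHUNK)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task touches only its own piece; no shared state is updated.
        partials = pool.map(lambda b: sum_of_squares_chunk(data, *b), bounds)
        return sum(partials)

values = [float(i) for i in range(10_000)]
print(sum_of_squares(values))
```

Each worker reads only its own slice and returns one partial result, so the threads never contend over a shared accumulator. The only communication is the final combination of a handful of numbers.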