Adventures in Optimization: Benchmarking
My first job at Apple in 1992 was benchmarking the quality of a dozen different handwriting recognition software libraries from different companies. We like to pretend that everything from health to cars to wars can be measured by a single number but real systems are multi-dimensional. Benchmarks are like standards: everyone likes to make their own. I approach software measurements with my old Stanford Physics lab training. I treat the software as tangible physical system, experiment to determine what I can measure reproducibly, look for mathematical relationships between variables, and attempt to construct a useful basis set of axes upon which to characterize their phase space. This did not make me any friend while benchmarking for Apple. Each company or internal group with a solution to sell had strong ideas about comparing systems to put their own solution in the best light. Furthermore, handwriting recognition was and is still an unsolved research problem. Comparing different research projects is quite different from comparing products of a mature technology. Ultimately, none of the choices proved adequate.
For Graphing Calculator, I use performance measurements to make programming choices when comparing high-level algorithms, low-level implementation details, different compilers, and compilation options. Holding the Command key while pressing the slider Play button will initiate a stopwatch to time the slider. When I'm focusing on a particular routine, this is useful to compare small code changes from one build to the next. In the Graphing Calculator menu, the Graphing Calculator > Benchmark command runs through thirty-two tests timing calculation and drawing. These help me compare how GC behaves on different hardware and how its behavior changes across versions years apart. The report looks like this: