Adventures in Optimization: Prelude
Two weeks ago, I ran the Intel-native GC4 on an 8-core machine. It revved all eight processing cores yet ran no faster than on a four-year-old single-core G4 laptop at one-third the clock speed. A few minutes with Shark identified the culprit: a single innocuous call to GetWRefCon. In the single-user, single-core, single-threaded Macintosh toolbox of 1984, that call would have taken a few instructions. Mac OS X, by contrast, must share system resources among multiple users, processor cores, applications, and threads of execution, all making demands at once. GetWRefCon is labelled Not Thread Safe: it is not guaranteed to work at all from anywhere but the main thread. Rather than crash or return a bogus result, it caused all eight otherwise independent threads to serialize on that one shared resource, passing the baton from one thread to the next instead of executing in parallel. Fixing this was easy. Although the window structure is a shared resource protected by a lock, the RefCon information I actually needed was read-only, so all threads could safely read it in parallel without locking. There was no need to go through GetWRefCon at all: the parallel threads should have been passed a pointer to the read-only document record directly.
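The difference between the two approaches can be sketched in C with POSIX threads rather than the Carbon toolbox. Every name here (`DocumentRecord`, `window_lock`, `get_refcon_locked`, `run_workers`) is hypothetical. The locked accessor only imitates the way a lock-protected call like GetWRefCon makes threads take turns; it is not how the toolbox is actually implemented. Both workers compute the same sum. The first re-fetches the record through the shared lock on every iteration. The second reads the read-only record through a pointer handed to it when the thread starts.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical read-only document record: written once before any worker
   thread starts, then only read. */
typedef struct {
    const double *samples;
    int count;
} DocumentRecord;

/* Hypothetical stand-in for a lock-protected accessor such as GetWRefCon:
   every call takes the same global mutex, so threads calling it in a tight
   loop take turns instead of running in parallel. */
static pthread_mutex_t window_lock = PTHREAD_MUTEX_INITIALIZER;
static const DocumentRecord *window_refcon;

static const DocumentRecord *get_refcon_locked(void) {
    pthread_mutex_lock(&window_lock);
    const DocumentRecord *doc = window_refcon;
    pthread_mutex_unlock(&window_lock);
    return doc;
}

typedef struct {
    const DocumentRecord *doc; /* read-only record handed to the thread directly */
    int begin, end;            /* half-open range of samples this thread sums */
    double sum;
} WorkerArgs;

/* Slow pattern: goes through the locked accessor on every iteration,
   so all threads contend for window_lock. */
static void *worker_serialized(void *p) {
    WorkerArgs *a = p;
    a->sum = 0.0;
    for (int i = a->begin; i < a->end; i++)
        a->sum += get_refcon_locked()->samples[i];
    return NULL;
}

/* Fixed pattern: reads the immutable record through the pointer it was
   given; no lock is needed because no thread writes to it. */
static void *worker_direct(void *p) {
    WorkerArgs *a = p;
    a->sum = 0.0;
    for (int i = a->begin; i < a->end; i++)
        a->sum += a->doc->samples[i];
    return NULL;
}

#define NTHREADS 8

/* Splits the samples across NTHREADS threads running `worker` and returns
   the combined sum, or -1.0 if a thread could not be created. */
static double run_workers(void *(*worker)(void *), const DocumentRecord *doc) {
    assert(doc != NULL && doc->count >= 0);
    pthread_t threads[NTHREADS];
    WorkerArgs args[NTHREADS];
    int chunk = (doc->count + NTHREADS - 1) / NTHREADS;
    int started = 0;
    for (int t = 0; t < NTHREADS; t++) {
        int begin = t * chunk, end = begin + chunk;
        args[t].doc = doc;
        args[t].begin = begin < doc->count ? begin : doc->count;
        args[t].end = end < doc->count ? end : doc->count;
        if (pthread_create(&threads[t], NULL, worker, &args[t]) != 0)
            break;
        started++;
    }
    double total = 0.0;
    for (int t = 0; t < started; t++) {
        pthread_join(threads[t], NULL);
        total += args[t].sum;
    }
    return started == NTHREADS ? total : -1.0;
}
```

After setting `window_refcon = &doc` once on the main thread, `run_workers(worker_direct, &doc)` and `run_workers(worker_serialized, &doc)` should return the same total. The only difference is that the direct version never touches `window_lock`, so its threads are free to run in parallel. How much that gains on a given machine is something to measure, not assume.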
After fixing that, some tests ran twenty times faster. Other tests, however, still ran at the same speed as on the older, slower, single-core laptop. I devoted the last two weeks to figuring out why.
Optimizing code is an investigative, interactive engineering art. Computer hardware, operating systems, and application software have become so unimaginably complex, with myriad components interacting across multiple layers of abstraction, that expectations, intuition, and guesswork are usually misleading and distracting. Understanding what a program actually does while it runs requires instrumentation and tools.
Mac OS X provides a suite of visualization and analysis tools to help developers understand their software and get the best performance out of it. I use these so frequently that they have a permanent place in my Dock: Shark, OpenGL Profiler, OpenGL Driver Monitor, Quartz Debug, Thread Viewer, Activity Monitor, and Spin Control. Each lets programmers peek under the hood of a running application from a different perspective. Fitting together this sensory overload of data, determining which parts are significant, and interpreting them, however, is another problem altogether.