Two weeks ago, I ran the Intel-native GC4 on an 8-core machine. It revved all eight cores yet ran no faster than on a four-year-old single-core G4 laptop at one-third the clock speed. A few minutes with Shark identified the culprit: a single innocuous call to GetWRefCon. In the single-user, single-core, single-threaded Macintosh Toolbox of 1984, that call would have taken a few instructions. Mac OS X, by contrast, must share system resources across multiple users, processor cores, applications, and threads of execution, all making demands at once. GetWRefCon is labelled Not Thread Safe: it is not guaranteed to work at all from anywhere but the main thread. Rather than crash or return a bogus result, it was causing all eight otherwise independent threads of execution to serialize on that one shared resource, passing the baton from one thread to the next, unable to execute in parallel.
Fixing this was easy. Although the window structure is a shared resource protected by a lock, the RefCon I actually needed contained read-only information that all threads could safely read in parallel without locking. There was no need to go through GetWRefCon at all. The parallel threads should have been passed a pointer to the read-only document record directly.
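Here is a minimal sketch of that fix in plain C with POSIX threads. It is not GC4's code and does not call the Window Manager: `Window`, `DocumentRecord`, `locked_get_refcon`, and `sum_cells_parallel` are hypothetical stand-ins I made up for illustration, and the mutex only imitates whatever lock actually guards the real window structure. The point it shows is the one above: look the record up once, before the threads start, and hand each worker a pointer to the read-only data so that no worker touches the lock.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the read-only document record. */
typedef struct {
    int count;
    const int *cells;
} DocumentRecord;

/* Hypothetical stand-in for the window: shared state guarded by a lock. */
typedef struct {
    pthread_mutex_t lock;
    const DocumentRecord *refcon;
} Window;

/* Imitates a lock-protected accessor. If every worker called this in its
   inner loop, the workers would take turns on the lock instead of running
   in parallel. */
static const DocumentRecord *locked_get_refcon(Window *w) {
    pthread_mutex_lock(&w->lock);
    const DocumentRecord *doc = w->refcon;
    pthread_mutex_unlock(&w->lock);
    return doc;
}

typedef struct {
    const DocumentRecord *doc; /* passed in directly; read-only, no lock */
    int first, last;           /* half-open range [first, last) */
    long sum;                  /* written only by the owning worker */
} WorkerArgs;

static void *worker(void *p) {
    WorkerArgs *a = (WorkerArgs *)p;
    long sum = 0;
    for (int i = a->first; i < a->last; i++)
        sum += a->doc->cells[i];
    a->sum = sum;
    return NULL;
}

enum { MAX_WORKERS = 8 };

long sum_cells_parallel(Window *w, int nworkers) {
    if (nworkers < 1) nworkers = 1;
    if (nworkers > MAX_WORKERS) nworkers = MAX_WORKERS;

    /* One locked lookup, on the calling thread, before any worker starts. */
    const DocumentRecord *doc = locked_get_refcon(w);

    pthread_t threads[MAX_WORKERS];
    WorkerArgs args[MAX_WORKERS];
    int started[MAX_WORKERS];
    int chunk = (doc->count + nworkers - 1) / nworkers;

    for (int t = 0; t < nworkers; t++) {
        int first = t * chunk;
        int last = first + chunk;
        if (first > doc->count) first = doc->count;
        if (last > doc->count) last = doc->count;
        args[t].doc = doc;
        args[t].first = first;
        args[t].last = last;
        args[t].sum = 0;
        started[t] = (pthread_create(&threads[t], NULL, worker, &args[t]) == 0);
        if (!started[t])
            worker(&args[t]); /* fall back to running this chunk inline */
    }

    long total = 0;
    for (int t = 0; t < nworkers; t++) {
        if (started[t])
            pthread_join(threads[t], NULL);
        total += args[t].sum;
    }
    return total;
}
```

This is only safe because nothing modifies the record while the workers run. If the document could change underneath them, the workers would need a private snapshot or some other form of synchronization.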
After fixing that, some tests ran twenty times faster. Other tests, however, still ran at the same speed as on the older, slower, single-core laptop. I devoted the last two weeks to figuring out why.