Back in the day, studying physics, the topics of our lab classes and problem sets roughly followed the history of physics from Galileo to Newton to Einstein, from Faraday to Maxwell, from classical to quantum mechanics. I remember being quite excited when we started redoing experiments from the twentieth century.
While programming doesn't have the same depth of history, nearly everything I do day to day uses techniques fully understood decades ago. How to implement parallelism in C++, however, is still actively studied. C++ inherits from C the notion of an abstract machine. C++ compilers are allowed by the standard to generate any code which produces the same visible effects as that of the abstract C++ machine. Compilers go to enormous effort to optimize their code generation, reordering instructions and eliminating code in ways that make the program run faster without changing its results. The compilers, however (following the standard), consider visible effects only from the perspective of a single thread of execution. (This is necessary: were compilers forced to consider multiple threads of execution, too many powerful optimizations would be disallowed.) The problem in this context is that because the C++ abstract machine considers only single-threaded execution, the language itself has no way to express the synchronization constraints needed for parallel programming.
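To make the single-thread rule concrete, here is a small sketch (the names mirror the lazy-evaluation example later in this post; ComputeValue is a stand-in for the real work). The two stores in Publish have no dependency that a single thread can observe, so a compiler or CPU is free to raise the flag before storing the value; only a second thread, reading the flag first, could ever notice the difference.

```cpp
// Shared lazy-initialization state.
static bool fAlreadyEvaluated = false;
static double fValue = 0.0;

static double ComputeValue() { return 42.0; }  // stand-in for the real computation

// As written, the value is stored before the flag is raised. A compiler or
// CPU may legally reorder these two stores, since no *single* thread can
// tell the difference -- which is precisely what breaks naive lazy
// initialization once a second thread reads the flag.
void Publish() {
    fValue = ComputeValue();
    fAlreadyEvaluated = true;
}
```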
While making Graphing Calculator thread-safe, I examined all the classes to determine which could be shared by multiple threads. Equations, for example, after being constructed in the main thread, are conceptually static, and can be read by multiple readers in threads started after the equation is fully constructed. However, they contain one field which is lazily evaluated, introducing a race condition if multiple threads try to write to that structure simultaneously. The original code looks like this:
if (not fAlreadyEvaluated) {
    fValue = ComputeValue();
    fAlreadyEvaluated = true;
}
return fValue;
This could easily be made thread-safe by protecting all access with a mutex lock. However, nearly all accesses are read-only; only the first access (by any thread) needs to write the value. Locking a mutex for every read would make the reads much more expensive. One could optimize for the common case by using a lock only to make the lazy evaluation a critical section, to ensure that only one thread ever writes the value. That leads to code like this:
if (not fAlreadyEvaluated) {
    Lock(fMutex);
    if (not fAlreadyEvaluated) {
        fValue = ComputeValue();
        fAlreadyEvaluated = true;
    }
    Unlock(fMutex);
}
return fValue;
Note the second test of fAlreadyEvaluated after acquiring the mutex. This is necessary, as another thread may have done the work while this thread of execution was waiting to acquire the lock. This is known as the Double-Checked Locking Pattern. Schmidt and Harrison dissect this pattern extensively in their excellent 1997 paper.
Unfortunately, the pattern is flawed, and cannot be implemented correctly and portably in C++ at all. While Schmidt and Harrison rightly consider the problems of compiler optimizations, instruction reordering, and cache consistency, they do not consider the possibility of CPU read and write reordering allowed by the relaxed consistency models of current CPUs. Meyers and Alexandrescu address those problems in C++ and the Perils of Double-Checked Locking and in Double-Checked Locking, Threads, Compiler Optimizations, and More (2004).
Andrei's Summary: Multithreading is just one damn thing after, before, or simultaneous with another.
While there is no portable way to implement DCLP in C++, Meyers and Alexandrescu do show how it can be used correctly if your implementation provides a mechanism for memory barriers respected by both the compiler and the CPU. The code in Graphing Calculator now looks something like:
if (!fAlreadyEvaluated) {
    OurLock lock(fMutex);
    if (!fAlreadyEvaluated) {
        fValue = ComputeValue();
        OurWriteBarrier();
        fAlreadyEvaluated = true;
    }
}
return fValue;
I'm unsure whether I need an additional read barrier where Meyers and Alexandrescu show an acquire barrier in their example. Anyone?
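For comparison, C++11 later standardized a memory model that makes the needed barriers expressible portably; this is an anachronism relative to the code above, but here is a sketch of the same pattern using std::atomic, where the acquire load on the fast path plays the role of the read barrier in question (ComputeValue is again a stand-in for the real computation):

```cpp
#include <atomic>
#include <mutex>

static std::mutex fMutex;
static std::atomic<bool> fAlreadyEvaluated{false};
static double fValue = 0.0;

static double ComputeValue() { return 42.0; }  // stand-in for the real computation

double GetValue() {
    // Load-acquire: if we observe true, the store to fValue by the
    // initializing thread is guaranteed to be visible to us.
    if (!fAlreadyEvaluated.load(std::memory_order_acquire)) {
        std::lock_guard<std::mutex> lock(fMutex);
        // Relaxed is sufficient here: the mutex already orders this check
        // against any initialization done under the same lock.
        if (!fAlreadyEvaluated.load(std::memory_order_relaxed)) {
            fValue = ComputeValue();
            // Store-release: publishes fValue before the flag becomes
            // visible to acquire loads in other threads.
            fAlreadyEvaluated.store(true, std::memory_order_release);
        }
    }
    return fValue;
}
```

The release store pairs with the acquire load to establish the happens-before edge that the plain bool in the original cannot express.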