" /> A Programmer's Apology: October 2006 Archives


October 28, 2006

Software and the Concurrency Revolution

Herb Sutter gave a talk (wmv video, pdf slides) here in Berkeley at Intel Research last month. Related publications:

October 27, 2006

Audience Participation Friday

What other blogs do y'all read?

October 24, 2006

The Trouble with Physics

Just finished reading Lee Smolin's The Trouble with Physics: The Rise of String Theory, the Fall of a Science, and What Comes Next. Nearly every blogger I read has already reviewed this book, and I have nothing to add to their discussion of the physics: Bee, Chad, Christine, David, Sean, Peter, xkcd.

As a longtime reader of Not Even Wrong, I was already familiar with the history of string theory and its failure to live up to its initial promise. Many reviewers compare The Trouble with Physics to Peter Woit's recent book Not Even Wrong, which grew out of his blog criticizing the over-hyping of string theory. Smolin's book seems to me a perfect companion to Ian Stewart's Letters to a Young Mathematician. Both books provide an insider's peek behind the curtains of academia, demystifying the day-to-day work and politics of professional physicists and mathematicians going about their careers.

The elements that fascinated me in both books were the personal anecdotes, the living history where the authors step back from painting the big historical picture and tell stories of themselves and their colleagues. Here is my favorite story, where Smolin tells of meeting his mentor, Paul Feyerabend.

To thank him for saving my career, I sent a copy of my PhD thesis to Feyerabend. In reply, he sent me his new book, Science in a Free Society, with a note inviting me to look him up if I was ever in Berkeley. A few months later, I happened to be in California for a particle-physics conference and tried to track him down, but it was quite a project. He kept no office hours at the university and, indeed, no office. The Philosophy Department secretary laughed when I asked for him and advised me to try him at home. There he was in the phone book, on Miller Avenue in the Berkeley hills. I summoned up my courage, dialed, and politely asked for Professor Paul Feyerabend. Whoever was on the other end shouted "Professor Paul Feyerabend! That's the other Paul Feyerabend. You can find him at the university" and hung up. So I dropped in on one of his classes, and found him happy to talk afterward, if only briefly. But in the few minutes he gave me, he offered an invaluable piece of advice. "Yes, the academic world is screwed up, and there's nothing you can do about it. But don't worry about that. Just do what you want. If you know what you want to do and advocate for it, no one will put any energy into stopping you." ....

"Just do what you want to do and don't pay any attention to anything else. Never in my career have I spent five minutes doing something I didn't want to be doing."

October 23, 2006

Volume of Integration

[graph]

Chris Young posted this document to the Graphing Calculator users group illustrating how to integrate the volume of a surface of revolution.

October 17, 2006

C++0x

Colleagues are now meeting in Portland to discuss the future of the C++ standard. Later this week also in Portland, Library-Centric Software Design '06 at OOPSLA '06. A number of C++0x proposals address the problem of threading, allowing the programmer to specify atomicity and ordering constraints within the language and providing high-level functionality for managing synchronization. In particular:

But wait, there's more:

October 14, 2006

Dependencies

[graph]

Consider a dipole field as a gravity wave passes, distorting space. On the left are positive and negative charges drawn as spheres, a vector field drawn as arrows, and field lines drawn as tubes. The right shows the same objects under the coordinate transformation of a compression wave travelling along the x-axis.

Here is an animated movie.

Now, consider each equation.

First, each equation is constructed as a separate object which can be evaluated in a parallel thread. However, the transformed equations refer to the solutions of the untransformed equations, so they can't begin calculating until the first are done. The spheres are given as implicit equations which are solved by evaluating the function on a cubic lattice, then root-finding along the edges of each cube. This is computationally expensive, but also embarrassingly parallelizable, and is subdivided into as many tasks as the computer has CPU cores. Each field line is computed by an ODE solver. The ODE initial conditions are specified here as points distributed about a surface described by an implicit equation. Once that implicit equation is solved (which itself could be done in parallel), each field line is an independent calculation submitted for threaded parallel execution. The transformed tubes and surfaces on the right are also done in parallel. The details, as with all programming, are ensuring that each calculation does not begin until the results it depends upon are complete, and ensuring that calculations running in parallel are truly independent of one another.
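To make the dependency structure concrete, here is a minimal sketch, not the actual Graphing Calculator code, using the std::async/std::future facilities that later became standard C++; all of the type and function names below are invented for illustration:

#include <future>
#include <vector>

struct FieldLine { };                                               // points along one field line
struct Surface   { };                                               // a solved implicit surface

FieldLine TraceFieldLine(int /*seed*/)     { return FieldLine(); }  // independent ODE solve
Surface   SolveSphere()                    { return Surface(); }    // implicit-surface solve
Surface   TransformSurface(const Surface&) { return Surface(); }    // depends on SolveSphere

void ComputeScene()
{
    // Independent work: each field line is submitted as its own task.
    std::vector< std::future<FieldLine> > lines;
    for (int seed = 0; seed < 32; ++seed)
        lines.push_back(std::async(std::launch::async, TraceFieldLine, seed));

    // Dependent work: the transformed surface cannot start until the
    // untransformed surface is complete; waiting on the future enforces that.
    std::future<Surface> sphere = std::async(std::launch::async, SolveSphere);
    Surface transformed = TransformSurface(sphere.get());

    for (size_t i = 0; i < lines.size(); ++i)
        lines[i].get();                                             // gather independent results
    (void)transformed;
}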

I spent last week parallelizing the code which implements this, reviewing it for race conditions arising from dependencies between components. While conceptually straightforward, the actual programming required delicate care.

How many race conditions do I see? Let me count them, one, tthwreoe.

October 11, 2006

Cooperative Multitasking

I've been deep in the bit mines all week, wired on Mountain Dew, reviewing the 35,000 lines of code which implement the actual graphing classes of Graphing Calculator, making the Compute method of each different graphing algorithm thread-safe and able to run preemptively. It's been fun to revisit a decade worth of code evolution and simplify it with the benefit of hindsight. Much of the complication was due to the necessity of preserving state across multiple calls to the Compute method of each class so that it could be time-sliced to keep the UI responsive in a single-threaded environment.

As I sat down to describe this, I remembered that I had already written what I wanted to say four years ago to a colleague who asked: "I've always wanted to know how to do threading. Under OS X that will become more and more valuable as multi-processor machines become increasingly common. Where is there a good discussion of using threads?"

At 3:05 PM -0800 10/25/02, Ron Avitzur wrote:

At the time GC 1.0 was written there were four or five different incompatible threads packages available for the Mac, all with fundamental design problems of different natures. None of them had yet been ported to or QA'd against PowerPC so the risks introduced by relying on them were too great and we did it the hard way.

All the compute loops in GC that you might expect to look like this:

for (i = 0; i < dataWidth; i++)
	for (j = 0; j < dataHeight; j++)
		...
actually look like this:
while (OurTickCount() < endTime) {
	lastV++;
	if (lastV >= dataHeight) {
		lastV = 0;
		lastU++;
		}
	if (lastU >= dataWidth)
		break;

	int i = lastU;
	int j = lastV;

	...
	}

where lastV and lastU are class variables that persist across calls to graph->Compute(endTime), and endTime tells Compute how long it has before we want it to return, whether it is done or not. endTime typically is less than 10 ticks, 1/6 of a second, before the event loop checks for keyboard, mouse, and update events. The actual coding is a bit more complex to allow everything to happen in the same application thread.

Modern OS architectures would allow each computation to spawn a separate thread to which the OS gives time slices, but I can't advise you on that as I've never studied it. Threads introduce a different sort of complexity, in that from a debugging standpoint they are logically running concurrently, and on a machine with multiple CPUs they may actually be running simultaneously. Then you have to worry about locking or using semaphores around any access to shared data, and about deadlock or race conditions when there is contention over a shared resource. This is all an active area of research in the C++ community and one of the more common sources of difficult bugs.

I could give more useful advice about threading APIs these days. In another month or two I might even have useful experience in re-architecting old code for thread-safety.
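For what it's worth, the bare bones of that approach, spawning worker threads and guarding shared data with a mutex, look something like this sketch (illustration only, not any actual GC code, using the std::thread and std::mutex that later became standard C++):

#include <mutex>
#include <thread>
#include <vector>

int main()
{
    std::vector<int> results;        // data shared by both workers
    std::mutex       resultsLock;    // every access to results must hold this

    auto worker = [&](int first, int last) {
        for (int i = first; i < last; ++i) {
            int value = i * i;                              // stand-in for the real computation
            std::lock_guard<std::mutex> guard(resultsLock); // serialize access to shared data
            results.push_back(value);
        }
    };

    std::thread a(worker, 0, 500);      // two threads split the work
    std::thread b(worker, 500, 1000);
    a.join();                           // wait for both before using results
    b.join();
    return results.size() == 1000 ? 0 : 1;
}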

October 07, 2006

Osculating Circles

[graph]

Melkana Brakalova-Trevithick sent me these equations showing the surface traced out by the osculating circle moving along a helix.

October 06, 2006

Embarrassingly parallelizable

After a month of reading about threading and a week of coding infrastructure for parallel execution, yesterday I parallelized the Graphing Calculator code for 2D inequalities. This morning I tried it on a dual-core machine while visiting a colleague and was gratified to see visual confirmation of the parallelism as updates from a slow computation marched across the screen in two parallel bands.

Graphing Calculator spends most of its time doing calculations affectionately referred to as embarrassingly parallel. Evaluating 2D inequalities, for example, might be thought of in terms of evaluating a function separately at every point on the screen. Divide the screen geometry into pieces for each thread and you're done. It's not quite that simple, though.

You could divide the screen into even and odd pixels, for example. That might make sense in terms of load-balancing, since the computation may be much faster in some regions of the screen than in others. But, though each pixel is conceptually independent, adjacent pixels are in nearby memory and share cache lines, resulting in memory contention, cache-line ping-pong, and much slower parallel execution from the communication overhead of maintaining cache coherency. Similarly, dividing the screen into vertical bands may result in unintended cache-line sharing depending on the widths of the bands and the width of the cache lines. Dividing the screen along horizontal scan lines limits the cache-line sharing to where the end of one block meets the beginning of the next. This is a common issue in designing for performance. Seemingly symmetric choices (like the order of nested loops) can affect performance by orders of magnitude due to underlying low-level details of a machine's memory cache hierarchy.
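A minimal sketch of the horizontal-band split (not the shipping code; the names are made up, and it uses the std::thread facility that later became standard C++): each thread owns a contiguous range of scan lines, so two threads only ever touch neighboring cache lines at the seam between bands.

#include <thread>
#include <vector>

// Stand-in for evaluating the inequality at every pixel of one scan line.
void EvaluateRow(std::vector<unsigned char>& pixels, int width, int y)
{
    for (int x = 0; x < width; ++x)
        pixels[y * width + x] = (unsigned char)((x * x + y * y) % 255);
}

void EvaluateScreen(std::vector<unsigned char>& pixels, int width, int height)
{
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 2;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n; ++t) {
        int first = (int)( t      * (unsigned)height / n);
        int last  = (int)((t + 1) * (unsigned)height / n);
        // Thread t owns rows [first, last): contiguous memory, no interleaving.
        workers.push_back(std::thread([&pixels, width, first, last] {
            for (int y = first; y < last; ++y)
                EvaluateRow(pixels, width, y);
        }));
    }
    for (size_t i = 0; i < workers.size(); ++i)
        workers[i].join();
}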

Graphing Calculator uses interval arithmetic to speed up drawing inequalities so that it doesn't always have to evaluate the function at every point on screen. GC starts by evaluating the interval extension of the function once over a rectangular interval spanning the visible domain. The interval evaluation can act as a proof that the inequality is true or false everywhere on that interval. Otherwise, GC subdivides the rectangle into two pieces and recurses. For simple functions, large areas of the screen are evaluated and drawn quickly. This introduces a challenge for load balancing given the simple decomposition into horizontal strips. The top half of the screen evaluated in one thread may take only a single interval function evaluation and complete quickly, while the bottom half of the screen may take much, much longer to compute. One could attempt to share the work between threads at the level of the recursive subdivision step in this divide-and-conquer algorithm. I suspect that memory sharing or lock contention for the list of rectangles to be computed could become an issue, but I would have to experiment and measure to be sure.
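In pseudocode, the recursive refinement looks roughly like the sketch below; EvaluateOnBox, FillBox, and PlotPointwise are hypothetical stand-ins for the real interval-arithmetic machinery, and the stubs exist only so the sketch compiles.

struct Box { double x0, x1, y0, y1; };
enum Proof { kAllTrue, kAllFalse, kUnknown };

Proof EvaluateOnBox(const Box&)  { return kUnknown; } // stub: interval extension of the inequality
void  FillBox(const Box&, bool)  { }                  // stub: paint the whole rectangle at once
void  PlotPointwise(const Box&)  { }                  // stub: per-pixel fallback

void Refine(const Box& b, double pixelSize)
{
    Proof p = EvaluateOnBox(b);
    if (p == kAllTrue)  { FillBox(b, true);  return; } // proved over the entire rectangle
    if (p == kAllFalse) { FillBox(b, false); return; }

    if (b.x1 - b.x0 <= pixelSize && b.y1 - b.y0 <= pixelSize) {
        PlotPointwise(b);                              // can't usefully subdivide further
        return;
    }

    // Split the longer side and recurse. The two halves are independent,
    // which is where work could be handed off to other threads.
    Box left = b, right = b;
    if (b.x1 - b.x0 > b.y1 - b.y0) {
        double mid = 0.5 * (b.x0 + b.x1);
        left.x1 = mid;  right.x0 = mid;
    } else {
        double mid = 0.5 * (b.y0 + b.y1);
        left.y1 = mid;  right.y0 = mid;
    }
    Refine(left, pixelSize);
    Refine(right, pixelSize);
}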

October 04, 2006

Double Checked Locking Pattern

Back in the day, studying physics, the topics of our lab classes and problem sets roughly followed the history of physics from Galileo to Newton to Einstein, from Faraday to Maxwell, from classical to quantum mechanics. I remember being quite excited when we started redoing experiments from the twentieth century.

While programming doesn't have the same depth of history, nearly everything I do day to day is straightforward, using techniques fully understood decades ago. How to implement parallelism in C++, however, is still actively studied. C++ inherits from C the notion of an abstract machine. C++ compilers are allowed by the standard to generate any code which produces the same visible effects as that of the abstract C++ machine. Compilers go to enormous effort to optimize their code generation and will reorder instructions and eliminate code in ways that make the program run faster without changing the results. The compilers, however (following the standard), only consider visible effects from the perspective of a single thread of execution. (This is necessary. Were compilers forced to consider multiple threads of execution, too many powerful optimizations would be disallowed.) The problem with C++ in this context is that the C++ abstract machine considers only single-threaded execution, and the language itself has no way to express the synchronization constraints needed for parallel programming.
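A tiny example of why this matters. In the code below, the compiler is entitled to assume that nothing inside the loop changes ready, hoist the load out of the loop, and spin forever; nothing in the language as it stands lets you portably tell it that another thread will set the flag:

bool ready  = false;   // intended to be set by another thread
int  result = 0;       // intended to be written by that thread before ready

int WaitForResult()
{
    while (!ready) {
        // The single-threaded abstract machine sees no way for ready to change
        // here, so the compiler may read it once and turn this into while(true).
    }
    return result;     // may also be loaded early, before or during the loop
}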

While making Graphing Calculator thread-safe, I examined all the classes to determine which could be shared by multiple threads. Equations, for example, after being constructed in the main thread, are conceptually static, and can be read by multiple readers in threads started after the equation is fully constructed. However, while they are conceptually static, they contain one field which is lazily evaluated, introducing a race condition if multiple threads try to write to that structure simultaneously. The original code looks like this:

if (not fAlreadyEvaluated) {
   fValue = ComputeValue();
   fAlreadyEvaluated = true;
   }
return fValue;

This could easily be made thread-safe by protecting all access with a mutex lock. However, nearly all accesses are read-only. Only the first access (by any thread) needs to write the value. Locking a mutex for every read would make the reads much more expensive. One could optimize for the common case by using a lock only to make the lazy evaluation a critical section, to ensure that only one thread ever writes the value. That leads to code like this:

if (not fAlreadyEvaluated) {
   Lock(fMutex); 
   if (not fAlreadyEvaluated) { 
      fValue = ComputeValue();
      fAlreadyEvaluated = true;
      }
   Unlock(fMutex);
   }
return fValue;

Note the second test of fAlreadyEvaluated after acquiring the mutex. This is necessary as another thread may have done the work while this thread of execution was waiting to acquire the lock. This is known as the Double Checked Locking Pattern. Schmidt and Harrison dissect this pattern extensively in this excellent 1997 paper.

Unfortunately, the pattern is flawed, and cannot be implemented correctly and portably in C++ at all. While Schmidt and Harrison rightly consider the problems of compiler optimizations, instruction reordering, and cache consistency, they do not consider the possibility of CPU read and write reordering allowed by the relaxed consistency of current CPUs. Meyers and Alexandrescu address those problems in C++ and the Perils of Double-Checked Locking and in Double-Checked Locking, Threads, Compiler Optimizations, and More (2004).

Andrei's Summary: Multithreading is just one damn thing after, before, or simultaneous with another.

While there is no portable way to implement DCLP in C++, Meyers and Alexandrescu do show how it can be used correctly if your implementation provides a mechanism for memory barriers respected by both the compiler and the CPU. The code in Graphing Calculator now looks something like:

if (!fAlreadyEvaluated) {              // fast path: unsynchronized read
   OurLock lock(fMutex);               // slow path: serialize initialization
   if (!fAlreadyEvaluated) {           // re-test: another thread may have won the race
      fValue = ComputeValue();
      OurWriteBarrier();               // publish fValue before the flag becomes visible
      fAlreadyEvaluated = true;
      }
   }
return fValue;

I'm unsure if I need an additional read barrier where Meyers and Alexandrescu show an acquire barrier in their paper. Anyone?
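For comparison, once a memory model with atomics is available (as later standardized in C++), the same pattern can be written with an acquire load and a release store, which also answers the read-barrier question. A sketch, not the Graphing Calculator code:

#include <atomic>
#include <mutex>

class LazyValue {
public:
    LazyValue() : fValue(0.0), fAlreadyEvaluated(false) { }

    double Get()
    {
        // Acquire load: if we see true, we also see the fValue written
        // before the matching release store below.
        if (!fAlreadyEvaluated.load(std::memory_order_acquire)) {
            std::lock_guard<std::mutex> lock(fMutex);
            if (!fAlreadyEvaluated.load(std::memory_order_relaxed)) {
                fValue = ComputeValue();
                fAlreadyEvaluated.store(true, std::memory_order_release); // publish fValue
            }
        }
        return fValue;
    }

private:
    double ComputeValue() { return 42.0; }   // stand-in for the real computation

    std::mutex        fMutex;
    double            fValue;
    std::atomic<bool> fAlreadyEvaluated;
};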

October 03, 2006

Fragile Things

Neil Gaiman read us ghost stories around a campfire last night from Fragile Things. Well, it was at the Berkeley Rep, but with the lights down it felt like we were outside gathered round a fire, with a thousand of our closest friends.

Instructions, should you ever find yourself in a fairy tale....

if any creature tells you that it hungers

feed it.

If it tells you that it is dirty,

clean it.

If it cries to you that it hurts,

if you can,

ease its pain.

Neil Gaiman reads

The Day the Saucers Came

That day, the saucers landed. Hundreds of them, golden,

Silent, coming down from the sky like great snowflakes,

And the people of Earth stood and stared as they descended,

Waiting, dry-mouthed to find what waited inside for us

Walking out into the night air afterwards, the world seemed magical, perceptions forever changed.