Wednesday, July 30, 2008

The Future of Computing - Super or Otherwise

This week I am in the very beautiful and uncomfortably hot and humid capital of the Lone Star state - Austin, Texas. Austin is home to the Texas Advanced Computing Center (TACC), the newest big player in the academic supercomputing world. TACC got the first "track 2" machine in the NSF's push for a petascale supercomputer. For more on the track 2 machine, known as Ranger, see one of my other posts here.

The essential problem is that modern supercomputers are moving into the realm of multicore processors. Ranger uses four quad-core processors per node, giving 16 cores and one network connection per node. With a total of 3,840 nodes, Ranger effectively has 61,440 processors. For codes like ASH (used for simulating the interior of the sun) that require global communication (every processor needs to talk with every other processor at every time step), the cost of standard global communication scales as the square of the number of processors when you use MPI. MPI, the current standard for communication between processors, works by having each processor send and receive messages from the other processors. MPI works just fine with 256 or 512 processors, and on some systems (such as BigBen at the Pittsburgh Supercomputing Center) our code can effectively use up to 2048 processors. However, that N^2 scaling really starts to wipe us out at the high processor counts on systems like Ranger.

Here at TACC, I'm learning about one possible solution to this problem - OpenMP. MPI treats every core on a system like Ranger as an independent processor with its own independent memory. However, if you think about how your dual-core desktop system works, it's not two processors each with their own memory. Instead, both of your cores share the same memory. If your cores are careful and play nicely with each other, they can share the memory without chopping it up into two pieces. For example, core 1 can load an array into memory, core 2 can add 3 to each entry, and then core 1 can divide each element of the array by the next element. This is known as the shared memory paradigm, and the standard for shared memory programming is known as OpenMP. And yes Joe, the "Open" in the name does stand for open source.

On a system like Ranger, each node is a shared memory system, so instead of passing messages between the processors on a node, each processor simply has access to the same memory as the others. There is still a cost for this communication, but it is far, far less than the cost of bouncing messages around the network. The real benefit of using OpenMP, however, is that instead of our communication cost scaling as N^2, using shared memory within each node (one MPI task per 16-core node) makes the cost scale as (N/16)^2. That means that where we were limited to 512 processors before, we could now theoretically use 8,192 processors for the same communication cost.

The benefits of OpenMP, however, go well beyond the scientific supercomputing community. Since most desktops and laptops are now multicore shared memory systems, OpenMP is becoming the standard in consumer multicore programming as well. So as core counts expand (Intel now has a working 256-core prototype), OpenMP will probably be the key to effectively using all of those cores.

1 comment:

  1. "And yes Joe, the "Open" in the name does stand for open source."

     I'm telling you guys, open source gives you the opportunity to be very productive.

     The biggest hurdle facing open source is how to keep the code open but keep money coming in.

  2. I am excited to watch how supercomputers evolve in the future. Surely the future of research will more and more involve supercomputers.

