Performance and correctness of software for large-scale heterogeneous systems

Richard Vuduc
Georgia Institute of Technology

-------------------------------------------------------

Abstract: We give an overview of the major performance and correctness issues we faced while designing and implementing the first highly scalable simulation of deformable red blood cells in plasma. The resulting code has scaled to 200k cores and achieved 0.7 Petaflop/s of sustained performance, and it has also been shown to work well on GPU-accelerated clusters; for these demonstrations, the software received the 2010 Gordon Bell Prize. However, the road there was paved with scalability missteps, numerical accuracy issues, and other software engineering challenges typically encountered in developing HPC codes on bleeding-edge machines. This talk summarizes these challenges and, with a little luck and suitable interpretation, suggests ways in which the EC^2 community might help to improve HPC software development productivity.