National Institute for Computational Sciences is a UT/ORNL Partnership

Optimization Tools

  Performance Analysis Tools

The most important goal of performance tuning is to reduce a program's wall clock execution time. However, reducing resource usage in other areas, such as memory or disk requirements, may also be a tuning goal. The following performance analysis tools are available on Kraken.

PAPI

The Performance API (PAPI) project gives users access to hardware performance counter events, which can be used to relate code behavior to the underlying architecture. This correlation has a variety of uses in performance analysis, including hand tuning, compiler optimization, debugging, benchmarking, monitoring, and performance modeling.

Cray PAT

Cray PAT is the Cray performance analysis tool for instrumenting and tracing code. It may be used to selectively trace specific functions.

Apprentice2

Cray Apprentice2 is a post-processing performance data visualization tool. Use Cray Apprentice2 with Cray PAT to explore the experiment data and generate a variety of interactive graphical reports. It includes an online help system which is accessible whenever Cray Apprentice2 is running.

FPMPI

FPMPI and FPMPI_papi are lightweight profiling libraries that use the PMPI profiling hooks specified in the MPI standard. Applications linked against one of these libraries gather statistics about their MPI use. With FPMPI_papi, PAPI counter data is also collected.
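The PMPI mechanism works by name shifting: the MPI standard makes every routine callable under a second, PMPI_-prefixed name, so a profiling library can supply its own routine under the public name, record statistics, and forward the call to the real implementation. The following is only a language-neutral sketch of that interposition pattern in plain Python, not FPMPI code and not an MPI binding; `pmpi_send`, `mpi_send`, and the `stats` table are hypothetical names chosen for illustration.

```python
import time
from collections import defaultdict

# Per-routine statistics gathered by the wrapper layer.
stats = defaultdict(lambda: {"calls": 0, "bytes": 0, "seconds": 0.0})


def pmpi_send(buf, dest):
    """Hypothetical stand-in for the real routine behind the PMPI_ name."""
    return len(buf)


def mpi_send(buf, dest):
    """Wrapper under the public name: time the call, record it, forward it."""
    start = time.perf_counter()
    result = pmpi_send(buf, dest)  # forward to the real implementation
    entry = stats["send"]
    entry["calls"] += 1
    entry["bytes"] += len(buf)
    entry["seconds"] += time.perf_counter() - start
    return result


if __name__ == "__main__":
    mpi_send(b"x" * 1024, dest=1)
    mpi_send(b"y" * 512, dest=2)
    print(dict(stats))
```

Because the application keeps calling the public name, no source changes are needed; in the real libraries the substitution happens at link time.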

TAU

Tuning and Analysis Utilities (TAU) is a performance analysis tool available from the University of Oregon. It offers several procedures for instrumenting, tracing, and profiling code.

SCALASCA

Scalasca supports an incremental performance-analysis procedure that integrates runtime summaries with in-depth studies of concurrent behavior via event tracing, adopting a strategy of successively refined measurement configurations. It can identify wait states that result from unevenly distributed workloads, a common obstacle when scaling communication-intensive applications to large processor counts.
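To make the notion of a wait state concrete, the sketch below works through the arithmetic for the simplest case: every rank computes for some time and then enters a barrier, so each rank waits for the slowest one. This is an illustrative model only, not how Scalasca measures or classifies wait states; the function names and the sample timings are assumptions.

```python
def barrier_wait_times(compute_seconds):
    """Per-rank wait at a barrier entered after each rank's compute phase.

    Expects a non-empty list of positive per-rank compute times.
    """
    slowest = max(compute_seconds)
    return [slowest - t for t in compute_seconds]


def wait_percent(compute_seconds):
    """Share of aggregate rank time spent waiting, as a percentage."""
    waits = barrier_wait_times(compute_seconds)
    aggregate = max(compute_seconds) * len(compute_seconds)
    return 100.0 * sum(waits) / aggregate


if __name__ == "__main__":
    # Hypothetical timings: three ranks finish in 4 s, one takes 8 s.
    times = [4.0, 4.0, 4.0, 8.0]
    print(barrier_wait_times(times))  # [4.0, 4.0, 4.0, 0.0]
    print(wait_percent(times))        # 37.5
```

In this example one overloaded rank leaves the other three idle for half of the step, so more than a third of the aggregate time is lost to waiting.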

MMPI

The Memory Monitor Programming Interface (MMPI) is an API for monitoring the memory usage of a process. Currently MMPI reports VmSize (the virtual memory size allocated to a process) and VmRSS (the resident set size, that is, the physical memory a process is using). The library is implemented in C and can be used from C, C++, and Fortran. The MMPI page provides examples that demonstrate how to use the library in each of these language environments, as well as with multi-threading and MPI.
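On Linux, VmSize and VmRSS are fields of `/proc/<pid>/status`, reported in kB. The sketch below reads them for the current process to show what the two numbers are; it does not use MMPI or reproduce its API, and the helper names `parse_vm_fields` and `current_memory_kb` are hypothetical.

```python
def parse_vm_fields(status_text, fields=("VmSize", "VmRSS")):
    """Extract the requested fields (values in kB) from /proc/<pid>/status text."""
    values = {}
    for line in status_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in fields:
            values[name] = int(rest.split()[0])  # rest looks like "  204800 kB"
    return values


def current_memory_kb():
    """Read VmSize and VmRSS for the calling process (Linux only)."""
    with open("/proc/self/status") as handle:
        return parse_vm_fields(handle.read())


if __name__ == "__main__":
    print(current_memory_kb())
```

Sampling these values at a few points in a run (for example before and after a large allocation) gives a rough picture of how memory use evolves.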

Technical Papers and Presentations

The following documents provide some additional information on optimization techniques and performance analysis.

Software Optimization Guide for AMD64 Processors
amd.com

Analysis of Parallel Program Performance Using CrayPat
Jeff Larkin

XT Tuning Top 10
Jeff Larkin

Porting, Scaling, and Optimization on Cray XT Systems
Jeff Larkin

Analysis of Parallel Program Performance Using CrayPat, fMPI, and IPM
Haihang You

Using CrayPAT to measure application performance
Luiz DeRose

Specific Optimizations for the Cray XT5 6-Core Systems
Jeff Larkin