Performance Analysis Tools
The most important goal of performance tuning is to reduce a program's wall clock execution time. However, reducing resource usage in other areas, such as memory or disk requirements, may also be a tuning goal. The following performance analysis tools are available on Kraken.
The Performance Application Programming Interface (PAPI) project allows users to monitor hardware performance counter events and relate them to the code running on the underlying architecture. This correlation has a variety of uses in performance analysis, including hand tuning, compiler optimization, debugging, benchmarking, monitoring, and performance modeling.
Cray PAT is the Cray performance analysis tool for instrumenting and tracing code. It may be used to selectively trace specific functions.
Cray Apprentice2 is a post-processing performance data visualization tool. Use Cray Apprentice2 with Cray PAT to explore the experiment data and generate a variety of interactive graphical reports. It includes an online help system which is accessible whenever Cray Apprentice2 is running.
FPMPI and FPMPI_papi are lightweight profiling libraries that use the PMPI hooks (the profiling interface specified in the MPI standard). Applications linked against one of these libraries will gather statistics about MPI use. If FPMPI_papi is used, PAPI counter data will also be collected.
Tuning and Analysis Utilities (TAU) is a performance analysis tool available from the University of Oregon. It offers a number of procedures for instrumenting, tracing, and profiling code.
Scalasca supports an incremental performance-analysis procedure that integrates runtime summaries with in-depth studies of concurrent behavior via event tracing, adopting a strategy of successively refined measurement configurations. It is able to identify wait states, such as those that occur as a result of unevenly distributed workloads. These wait states become a particular concern when scaling communication-intensive applications to large processor counts.
Memory Monitor (the Memory Monitor Programming Interface, MMPI) is an API for monitoring the memory usage of a process. Currently MMPI reports VmSize (the virtual memory size allocated to a process) and VmRSS (the resident set size, that is, the physical memory the process is actually using). The library is implemented in C and can be used from C, C++, and Fortran. This page provides examples to demonstrate how to use the library in different programming language environments, as well as with multi-threading and MPI.
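To illustrate what the two reported quantities mean, here is a minimal sketch that reads the VmSize and VmRSS fields for the current process. It does not use the MMPI library or its API. It assumes a Linux system where the kernel exposes these fields in /proc/self/status, and the function name read_memory_usage is hypothetical, chosen only for this example.

```python
def read_memory_usage(pid="self"):
    """Return VmSize and VmRSS in kB, parsed from /proc/<pid>/status.

    Assumption: Linux /proc filesystem. This is NOT the MMPI API; it only
    shows where the two values that MMPI reports can be observed.
    """
    usage = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            key, _, value = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                # The value looks like "   123456 kB"; keep only the number.
                usage[key] = int(value.split()[0])
    return usage


if __name__ == "__main__":
    before = read_memory_usage()
    buffer = bytearray(64 * 1024 * 1024)  # allocate a zero-filled 64 MiB buffer
    after = read_memory_usage()
    print("VmSize change (kB):", after["VmSize"] - before["VmSize"])
    print("VmRSS change (kB): ", after["VmRSS"] - before["VmRSS"])
```

Sampling before and after a suspect allocation, as in the usage portion, is one simple way to see whether a code region grows the address space (VmSize), the resident memory (VmRSS), or both.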
The following documents provide some additional information on optimization techniques and performance analysis.
XT Tuning Top 10
Using CrayPAT to measure application performance