Performance Analysis Tools
The most important goal of performance tuning is to reduce a program's wall-clock execution time. Reducing resource usage in other areas, such as memory or disk requirements, may also be a tuning goal. The following performance analysis tools are available on Kraken.
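As a starting point before reaching for any of the tools below, wall-clock time can be measured directly. The following is a minimal sketch; the helper `time_wall_clock` and the workload `sum_of_squares` are hypothetical names chosen for illustration and are not part of any tool on this page.

```python
import time


def time_wall_clock(func, *args, repeats=3, **kwargs):
    """Run func several times and return (best_seconds, last_result).

    Taking the best of a few runs reduces noise from other activity
    on the node. `repeats` is an illustrative default, not a recommendation.
    """
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution wall clock
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best, result


def sum_of_squares(n):
    # Hypothetical workload standing in for the code being tuned.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    seconds, value = time_wall_clock(sum_of_squares, 100_000)
    print(f"best of 3 runs: {seconds:.6f} s (result {value})")
```

A whole-program timing like this only says how long a region took; the tools listed below help explain where the time goes.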
PAPI
The Performance API (PAPI) project allows users to monitor hardware performance counter events and relate them to the code being run, showing how the code maps onto the underlying architecture. This correlation has a variety of uses in performance analysis, including hand tuning, compiler optimization, debugging, benchmarking, monitoring, and performance modeling.
Cray PAT
Cray PAT is the Cray performance analysis tool for instrumenting and tracing code. It may be used to selectively trace specific functions.
Apprentice2
Cray Apprentice2 is a post-processing performance data visualization tool. Use Cray Apprentice2 with Cray PAT to explore the experiment data and generate a variety of interactive graphical reports. It includes an online help system which is accessible whenever Cray Apprentice2 is running.
FPMPI
FPMPI and FPMPI_papi are lightweight profiling libraries that use the PMPI hooks specified in the MPI standard. Applications linked against one of these libraries gather statistics about MPI use. If FPMPI_papi is used, PAPI counter data is also collected.
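The PMPI hooks work by interposition: the MPI standard makes every `MPI_X` routine also available under the name `PMPI_X`, so a profiling library can define its own `MPI_X`, record statistics, and forward the call to `PMPI_X`. The sketch below illustrates that pattern in plain Python without MPI. It is an analogy only, not FPMPI's implementation; `CallStats`, `profiled`, and `fake_send` are hypothetical names.

```python
import time
from collections import defaultdict


class CallStats:
    """Per-routine counters, similar in spirit to an MPI usage summary."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.seconds = defaultdict(float)
        self.bytes = defaultdict(int)


def profiled(stats, name, real_func):
    """Return a wrapper that records statistics, then forwards to real_func.

    The wrapper plays the role of the library's MPI_X entry point and
    real_func plays the role of the underlying PMPI_X routine.
    """
    def wrapper(buf, *args, **kwargs):
        start = time.perf_counter()
        try:
            return real_func(buf, *args, **kwargs)
        finally:
            stats.calls[name] += 1
            stats.seconds[name] += time.perf_counter() - start
            stats.bytes[name] += len(buf)
    return wrapper


def fake_send(buf, dest):
    # Hypothetical stand-in for a communication routine; it sends nothing.
    return len(buf)


if __name__ == "__main__":
    stats = CallStats()
    send = profiled(stats, "send", fake_send)
    send(b"abcd", 1)
    send(b"xy", 0)
    print(dict(stats.calls), dict(stats.bytes))
```

Because the wrapper has the same call signature as the routine it replaces, the application code does not change, which is why relinking against the profiling library is enough.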
TAU
Tuning and Analysis Utilities (TAU) is a performance analysis tool available from the University of Oregon. It provides several procedures for instrumenting, tracing, and profiling code.
SCALASCA
Scalasca supports an incremental performance-analysis procedure that integrates runtime summaries with in-depth studies of concurrent behavior via event tracing, adopting a strategy of successively refined measurement configurations. It is able to identify wait states that occur, for example, as a result of unevenly distributed workloads. Such wait states become especially significant when scaling communication-intensive applications to large processor counts.
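To make the idea of a wait state concrete: if every process must reach a synchronization point before any can continue, each process idles for the difference between the slowest process's compute time and its own. The following sketch works through that arithmetic. It is a simplified illustration, not Scalasca's analysis, and the function names and timings are hypothetical.

```python
def barrier_wait_times(compute_seconds):
    """Idle time of each process at a barrier, given per-process compute times."""
    if not compute_seconds:
        return []
    slowest = max(compute_seconds)
    return [slowest - t for t in compute_seconds]


def imbalance_summary(compute_seconds):
    """Total idle time and the fraction of aggregate process time spent waiting."""
    waits = barrier_wait_times(compute_seconds)
    total_wait = sum(waits)
    # Aggregate time: every process is occupied until the slowest one finishes.
    aggregate = max(compute_seconds, default=0.0) * len(compute_seconds)
    fraction = total_wait / aggregate if aggregate > 0 else 0.0
    return {"waits": waits, "total_wait": total_wait, "wait_fraction": fraction}


if __name__ == "__main__":
    # Hypothetical compute times, in seconds, for four processes.
    summary = imbalance_summary([4.0, 2.0, 3.0, 4.0])
    print(summary)
```

With these example timings, 3 of the 16 aggregate process-seconds are spent waiting. The same per-process imbalance costs more in absolute terms as the process count grows, since every additional process waits on the slowest one.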
MMPI
The Memory Monitor Programming Interface (MMPI) is an API for monitoring the memory usage of a process. Currently MMPI reports VmSize (the virtual memory size allocated to a process) and VmRSS (the resident set size, that is, the physical memory in use by a process). The library is implemented in C and can be used from C/C++ and Fortran. This page provides examples to demonstrate how to use the library in different programming language environments, as well as with multi-threading and MPI.
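On Linux, the VmSize and VmRSS values are the fields of the same name in `/proc/<pid>/status`, reported in kB. The sketch below reads them directly; it does not use the MMPI library, whose interface is not shown here, and the function names `parse_vm_fields` and `current_memory_kb` are hypothetical.

```python
def parse_vm_fields(status_text, fields=("VmSize", "VmRSS")):
    """Extract selected memory fields (in kB) from /proc/<pid>/status text."""
    values = {}
    for line in status_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in fields:
            parts = rest.split()
            # Lines look like "VmRSS:     1234 kB".
            if parts and parts[0].isdigit():
                values[name] = int(parts[0])
    return values


def current_memory_kb(path="/proc/self/status"):
    """Read memory fields for the calling process (Linux only)."""
    with open(path) as f:
        return parse_vm_fields(f.read())


if __name__ == "__main__":
    try:
        print(current_memory_kb())
    except OSError:
        print("/proc/self/status is not available on this system")
```

Calling `current_memory_kb()` before and after a large allocation gives a rough view of how a code region changes the process's memory footprint.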
Technical Papers and Presentations
The following documents provide some additional information on optimization techniques and performance analysis.
Software Optimization Guide for AMD64 Processors
amd.com
Analysis of Parallel Program Performance Using CrayPat
Jeff Larkin
XT Tuning Top 10
Jeff Larkin
Porting, Scaling, and Optimization on Cray XT Systems
Jeff Larkin
Analysis of Parallel Program Performance Using CrayPat, fMPI, and IPM
Haihang You
Using CrayPAT to measure application performance
Luiz DeRose
Specific Optimizations for the Cray XT5 6-Core Systems
Jeff Larkin