The National Institute for Computational Sciences

General: What should I do in the event of a Lustre slowdown?

A Lustre slowdown can have many causes, because Lustre has many working parts and is shared by all users on the system. NICS continually monitors Lustre's performance and seeks to improve researchers' data communications. If you notice that your code's I/O performance or the Lustre filesystem is slower than usual, please answer the following questions to the best of your knowledge and email your answers to the XSEDE Help Desk.

  • When did you first notice the slowdown? How long did it last?
  • Which login node were you on?
  • Can you estimate the magnitude of the slowdown? (e.g., "It took 2 minutes instead of 3 seconds", "batch job exceeded its walltime limit of 10 hours, but normally finishes in 8 hours")
  • What were you doing? Interactive command (like "ls")? Batch job?
  • For interactive commands:
    • Which host were you using?
    • Did you see the same behavior on other hosts?
    • Can you provide the exact command that was run and the directory in which it was run?
  • For batch jobs:
    • Can you supply the job IDs for jobs that were affected?
    • Can you provide any details about the I/O pattern for your job?
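
The details requested above for an interactive command (host, time, exact command, directory, and how long it took) can be gathered in one step. The following is a minimal Python sketch and not a NICS-provided tool; the function names `time_command` and `format_report` and the report field names are hypothetical, chosen only for illustration. It assumes a Python 3 interpreter is available on the host where you saw the slowdown.

```python
import os
import shlex
import socket
import subprocess
import time
from datetime import datetime, timezone


def time_command(command, directory="."):
    """Run `command` in `directory` and collect details for a slowdown report.

    The returned field names are illustrative, not a format required by the help desk.
    """
    started = datetime.now(timezone.utc)
    t0 = time.perf_counter()
    # Discard the command's output; only its duration and exit code are recorded.
    result = subprocess.run(
        shlex.split(command),
        cwd=directory,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    elapsed = time.perf_counter() - t0
    return {
        "host": socket.gethostname(),
        "started_utc": started.isoformat(timespec="seconds"),
        "directory": os.path.abspath(directory),
        "command": command,
        "exit_code": result.returncode,
        "elapsed_seconds": round(elapsed, 3),
    }


def format_report(info):
    """Render the collected details as plain 'key: value' lines for an email."""
    return "\n".join(f"{key}: {value}" for key, value in info.items())


if __name__ == "__main__":
    # Example: time a directory listing in the current directory.
    print(format_report(time_command("ls", ".")))
```

Running the script from the affected directory prints one line per field, which can be pasted into the email. Repeating it on another host gives a direct comparison for the question about behavior on other hosts.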