The Lustre file system is available as scratch space at /lustre/scratch/<user-name>. Lustre is a highly scalable cluster file system. Each file is distributed (striped) across several hardware locations. This allows files larger than any one location could store, and it allows much faster transfer speeds when access to the file is parallelized.
Lustre is the only file system available to the compute nodes. Input and output files must reside there, as must the current directory at the time aprun is called. Executables, as well as files redirected to and from aprun, may be in a home directory because aprun itself runs on a service node. If you receive an error such as "no such file or directory", check whether your program is trying to access something in your home directory. Also, do not create files directly in /tmp. It is a small, memory-resident file system, and system problems result when /tmp fills up.
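As a small illustration of the rule above, a job script can check that it is running from the scratch area before it calls aprun. The helper name on_lustre is hypothetical, and the check is only a string comparison against the /lustre/scratch/ prefix given above; it does not resolve symbolic links.

```shell
# Hypothetical helper: succeeds if the given path (default: the current
# directory) lies under the Lustre scratch area, /lustre/scratch/.
# This is a prefix test only; symbolic links are not resolved.
on_lustre() {
    case "${1:-$PWD}" in
        /lustre/scratch/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage sketch inside a job script, before calling aprun:
# on_lustre || { echo "cd into /lustre/scratch/<user-name> first" >&2; exit 1; }
```

A check like this turns a confusing "no such file or directory" failure on the compute nodes into an immediate, readable message on the service node.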
Lustre File System Purge Policy
When Lustre is 70% full, users will be contacted by User Services and asked to clean up as much space as possible by whatever means necessary – moving to HPSS, deleting, etc. At 80% full, a list of files older than 90 days will be generated and those files will be deleted without prior notification. At 90% full, a list of the top ten Lustre file system users will be generated and these users will be put on a special list that puts all of their jobs in the batch system on hold indefinitely. The hold will be removed once the user has shown that they have cleaned up a sufficient amount of space.
Users can request an exception to this policy by making a request with a detailed justification to accounts@nics.utk.edu. Approved exceptions will place a user on the exceptions list for the 80% and 90% trigger thresholds.
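To see which of your files might fall under the 90-day threshold, something like the following sketch may help. It assumes that age is measured by modification time, which the policy above does not specify. The function name list_old_files and the FIND_CMD override are hypothetical; the override exists only so the function can be tried with GNU find on a machine without Lustre.

```shell
# List regular files under a directory whose modification time is more than
# N days old (default: 90). Assumption: the purge measures age by mtime.
# FIND_CMD is a hypothetical override; it defaults to "lfs find".
list_old_files() {
    dir=$1
    days=${2:-90}
    ${FIND_CMD:-lfs find} "$dir" -type f -mtime +"$days"
}

# Usage sketch:
# list_old_files /lustre/scratch/<user-name> > old-files.txt
```

Reviewing the resulting list lets you archive anything you need before the file system reaches the 80% threshold.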
Lustre Structure
Knowing the basic layout of Lustre can help you use it well and anticipate problems. The description below is a "bottom-up" view; keep in mind that when a file is accessed, the system follows the "top-down" path.
Files are generally striped across several Object Storage Targets (OSTs) to enable truly parallel access and to allow files larger than any one OST. An OST may be thought of as a "virtual disk", though it often consists of several physical disks, for instance in a RAID configuration.
Object Storage Servers (OSSs) are servers that each control access to a small set of OSTs and hold some metadata on the files stored on those OSTs. The OSSs are often the bottleneck on Kraken. Finally, on Kraken, Lustre has a single Metadata Server, or MDS (other installations may have more than one). The MDS is the first place the system goes when accessing a file, but it holds only basic metadata: filename and location.
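To make the striping described above more concrete, the sketch below computes which stripe of a file holds a given byte offset, assuming plain round-robin striping. The function name stripe_index is hypothetical, and the 1 MiB stripe size used in the example is an assumed value, not one stated here. The result is an index within the file's own set of OSTs, not a system-wide OST number.

```shell
# Print the 0-based index, within a file's stripe set, of the stripe that
# holds a given byte offset, assuming round-robin striping.
# Arguments: offset in bytes, stripe size in bytes, stripe count.
stripe_index() {
    offset=$1
    stripe_size=$2
    stripe_count=$3
    echo $(( (offset / stripe_size) % stripe_count ))
}

# Example with an assumed 1 MiB stripe size and a stripe count of 4:
stripe_index 0       1048576 4   # offset 0 falls in stripe 0
stripe_index 5242880 1048576 4   # offset 5 MiB wraps around to stripe 1
```

Because consecutive chunks land on different OSTs, several processes reading different regions of one file can be served by different OSTs at the same time.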
Lustre Use
Because of its superior I/O speeds, Lustre is the only space accessible from compute nodes, and it is recommended when transferring large files to or from HPSS. However, remember that this is a scratch directory for temporary files: Lustre is not backed up or guaranteed. If you care about your data, archive it or transfer it to another computer.
Our Lustre file system handles files at a different scale from any monolithic file system and has some limitations that standard file systems lack. It is therefore best used somewhat differently from a laptop or network file system:
- Most users alias ls to return more information than the default, for example, using different colors for different file types. This additional information requires ls to query the OSSs. Depending on Kraken's usage at that moment and on how the files in that directory are distributed among the OSSs, there is a good chance that one of the OSSs is busy, which causes ls to hang. Instead, you can use /bin/ls to bypass the alias. This command only has to query the MDS and generally returns very quickly.
- Similarly, it is usually more efficient to use the Lustre tool lfs find rather than GNU find when searching for files on Lustre.
- Several other GNU commands, such as tar and rm, are inefficient when operating on a large number of files on Lustre. For example, with millions of files, rm -rf * may take days and have a considerable impact on Lustre for other users. A better approach is to generate a list of the files to be removed or tar-ed, and to act on them one at a time or in small sets. For example, you can use the following script to remove files on Lustre when a normal rm would be inadequate. Warning: this script removes files indiscriminately, as rm -rf does. Use with caution.
The lustre-mass-delete command is a script that deletes files recursively, 100 files at a time, so that it does not put a heavy load on the system.
/usr/local/bin/lustre-mass-delete
For example, if I am already in /lustre/scratch/djohn/stuff and I want to delete /lustre/scratch/djohn/stuff/directory1, I can use:
lustre-mass-delete directory1
This deletes the directory called directory1, relative to the current directory.
lustre-mass-delete directory1 directory2 directory3
This deletes the directories directory1, directory2, and directory3, relative to the current directory. The script also accepts absolute paths. For example:
lustre-mass-delete /lustre/scratch/djohn/stuff
Another method, which allows you to review the files before they are deleted, is the following:
lfs find <dir> -t f > rmlist.txt
# view the list and remove any entries you want to keep
sed -e 's:^:/bin/rm :' < rmlist.txt > rmlist.sh
sh rmlist.sh
# the directory structure will remain, but unless there are very many
# directories, we can simply delete it:
rm -rf <dir>
- The default stripe count is currently 4, which means that each file is stored on 4 OSTs. In many cases you will want to change this. For example, if your I/O is "file-per-process", the best stripe count is likely 1. For more details about how to set stripe counts and optimize I/O, please see I/O Tips.
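As a brief sketch of the stripe-count advice in the last item above, the helpers below wrap the standard lfs setstripe and lfs getstripe commands. The function names and the LFS override are hypothetical; setting LFS=echo gives a dry run that prints the arguments instead of changing anything. The I/O Tips page remains the reference for choosing values.

```shell
# Set the default stripe count on a directory. Files created in the directory
# afterwards inherit the setting; files that already exist are not restriped.
# LFS is a hypothetical override (for example LFS=echo for a dry run).
set_stripe_count() {
    count=$1
    dir=$2
    ${LFS:-lfs} setstripe -c "$count" "$dir"
}

# Show the current striping of a file or directory.
show_striping() {
    ${LFS:-lfs} getstripe "$1"
}

# Usage sketch for file-per-process output:
# set_stripe_count 1 /lustre/scratch/<user-name>/output
# show_striping /lustre/scratch/<user-name>/output
```

Setting the stripe count on the output directory before the job runs is usually simpler than striping each file individually.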

