The National Institute for Computational Sciences

Lustre Striping Guide

One of the main factors behind the high performance of Lustre file systems is the ability to stripe data across multiple object storage targets (OSTs) in a round-robin fashion. A file can be split into multiple chunks, which are then stored on different OSTs across the Lustre system.

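Any file is just a linear sequence of bytes. In the logical view, the file is divided into consecutive segments, for example five segments of equal size.

In the physical view, the five segments may be striped across four OSTs. Because placement is round-robin, the first four segments each land on a different OST and the fifth segment wraps around to the same OST as the first.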
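The round-robin placement can be sketched in a few lines of Python. This is only an illustration of the arithmetic for a plain striped layout; the function name locate and its return values are hypothetical, and the "OST slot" is the position within the file's own list of OSTs, not a real OST index.

```python
def locate(offset, stripe_size, stripe_count):
    """Map a byte offset in a file to (segment, ost_slot, offset_in_object).

    Hypothetical helper: assumes a plain round-robin layout.
    """
    segment = offset // stripe_size        # which stripe-sized segment of the file
    ost_slot = segment % stripe_count      # round-robin position among the file's OSTs
    full_rounds = segment // stripe_count  # complete passes over all OSTs so far
    offset_in_object = full_rounds * stripe_size + offset % stripe_size
    return segment, ost_slot, offset_in_object


MB = 1024 * 1024

# Five 1 MB segments striped across four OSTs: the fifth wraps to slot 0.
for seg in range(5):
    print(locate(seg * MB, MB, 4))
```

The last line printed shows segment 4 stored in slot 0, one stripe size into that OST's object, which is the wrap-around described above.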

Striping offers two benefits: 1) an increase in bandwidth because multiple processes can simultaneously access the same file, and 2) the ability to store large files that would take more space than a single OST. However, striping is not without disadvantages: 1) increased overhead due to network operations and server contention, and 2) increased risk of file damage due to hardware malfunction. Users have the option of configuring the size and number of stripes used for any file.

Default Stripe Settings

The default stripe settings vary across machines, but the stripe count is generally set to 1 or 2 and the stripe size is generally 1 MB. To determine the stripe settings for a file or directory, run the lfs getstripe command on it:

> lfs getstripe <file-or-directory>
lmm_stripe_count:   4
lmm_stripe_size:    1048576
lmm_stripe_offset:  186
        obdidx           objid          objid            group
           186        52153455      0x31bcc6f                0
           258        53124880      0x32a9f10                0
            25        52477227      0x320bd2b                0
            97        52444876      0x3203ecc                0

In this example, the file is striped across 4 OSTs with a stripe size of 1 MB (1048576 bytes). The obdidx numbers listed are the indices of the OSTs used in the striping of this file. Using getstripe on a directory gives information for the directory plus the files contained in the directory. You can limit getstripe to show only the directory information by using the -d option. Alternatively, you can use the -r option to recursively follow all subdirectories.
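When the stripe settings are needed inside a script, the output can be parsed. The following is a minimal sketch that works on the sample output shown above; the function name parse_getstripe is hypothetical, and the exact output format may differ between Lustre versions, so treat the patterns as assumptions.

```python
import re

# Sample text copied from the lfs getstripe example above.
SAMPLE = """\
lmm_stripe_count:   4
lmm_stripe_size:    1048576
lmm_stripe_offset:  186
        obdidx           objid          objid            group
           186        52153455      0x31bcc6f                0
           258        53124880      0x32a9f10                0
            25        52477227      0x320bd2b                0
            97        52444876      0x3203ecc                0
"""


def parse_getstripe(text):
    """Extract stripe count, size, offset and OST indices (hypothetical helper)."""
    info = {"osts": []}
    for line in text.splitlines():
        header = re.match(r"\s*lmm_(stripe_\w+):\s*(-?\d+)", line)
        if header:
            info[header.group(1)] = int(header.group(2))
            continue
        fields = line.split()
        # Object rows have four columns and start with a decimal OST index;
        # the column-title row is skipped because "obdidx" is not a number.
        if len(fields) == 4 and fields[0].isdigit():
            info["osts"].append(int(fields[0]))
    return info


layout = parse_getstripe(SAMPLE)
print(layout["stripe_count"], layout["stripe_size"], layout["osts"])
```

In practice the text would come from running lfs getstripe on a real path, for instance through subprocess.run, rather than from a hard-coded string.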

General Considerations

Large files benefit from higher stripe counts. By striping a large file over many OSTs, you increase the bandwidth available for accessing the file and can have many processes operating on a single file concurrently. Conversely, a very large file that is striped across only one or two OSTs can degrade the performance of the entire Lustre system by filling up those OSTs unnecessarily. A good practice is to dedicate directories with high stripe counts to very large files.

Another scenario to avoid is having small files with large stripe counts. This can be detrimental to performance because of the unnecessary communication overhead to multiple OSTs. A good practice is to make sure small files are written to a directory with a stripe count of 1 (effectively, no striping).

Setting Striping Configurations

The lfs setstripe command is used to dictate a particular striping configuration for a file or directory. For a file, setstripe:

  • gives an error if the file already exists (see Note below),
  • otherwise creates an empty file with the desired stripe settings.

Using setstripe on a directory:

  • changes the stripe settings for the directory,
  • causes any file subsequently created in the directory to inherit those settings,
  • does not affect existing files in the directory.

Note: Once a file has been written to Lustre with a particular stripe configuration, you cannot simply use setstripe to change it. The file must be re-written with a new configuration. Generally, if you need to change the striping of a file, you can do one of two things:

  • using setstripe, create a new, empty file with the desired stripe settings and then copy the old file to the new file, or
  • set up a directory with the desired configuration and cp (not mv) the file into the directory.
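The first approach can be sketched as a short command sequence. The helper below only builds the commands and does not run them; its name, the temporary-file suffix, and the final mv that renames the new file over the old one are assumptions added for illustration and are not part of the procedure described above.

```python
import shlex


def restripe_commands(path, stripe_count, tmp_suffix=".restripe"):
    """Return the commands that re-write `path` with a new stripe count.

    Hypothetical helper: nothing is executed here.
    """
    tmp = path + tmp_suffix
    return [
        # 1. Create a new, empty file with the desired stripe count.
        ["lfs", "setstripe", "-c", str(stripe_count), tmp],
        # 2. Copy the data; the existing destination keeps its new layout.
        ["cp", path, tmp],
        # 3. (Assumption) rename the re-striped copy over the original.
        ["mv", tmp, path],
    ]


for cmd in restripe_commands("data.bin", 8):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the original until the copy has been verified is the safer choice; the rename in step 3 is only a convenience.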

The options for lfs setstripe are:

  • -c to set the stripe count; 0 means use the system default (usually 1) and -1 means stripe over all available OSTs (up to the system-inherent limit of 160)
  • -s to set the stripe size; 0 means use the system default (usually 1 MB); otherwise use k, m or g for KB, MB or GB respectively (newer Lustre releases use -S for this option, so check lfs help setstripe on your system)

For example, to configure the (existing) directory bigdir for holding very large files, we could set its stripe count to 50 and stripe size to 32 MB with:

> lfs setstripe -c 50 -s 32m bigdir
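To make the units concrete, the sketch below converts a size argument such as 32m into bytes and computes how much file data one full round over all 50 OSTs covers. The helper name stripe_size_bytes is hypothetical, and it assumes the suffixes are binary multiples (1 KB = 1024 bytes).

```python
# Assumed binary multipliers for the k, m and g suffixes.
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def stripe_size_bytes(spec):
    """Convert a size argument like '32m' or '1048576' to bytes (hypothetical helper)."""
    spec = spec.strip().lower()
    if spec and spec[-1] in UNITS:
        return int(spec[:-1]) * UNITS[spec[-1]]
    return int(spec)  # no suffix: the value is already in bytes


stripe_count = 50
stripe_size = stripe_size_bytes("32m")

# Bytes written before the layout wraps around to the first OST again.
round_span = stripe_count * stripe_size

print(stripe_size)               # bytes per stripe
print(round_span // 1024 ** 2)   # MB covered by one pass over all OSTs
```

With these settings a file must be larger than 1600 MB before any OST receives a second stripe, which is why such a configuration only pays off for very large files.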

I/O Considerations

  • With a file-per-process I/O pattern, it is best to use no striping (stripe count of 1). This will limit OST contention when dealing with a large number of files/processes.
  • When accessing a single shared file from many processes, the stripe count should equal the number of processes if possible. The size and location of the I/O operations from each process should be carefully managed so that they align with stripe boundaries as far as possible, with each process accessing only a single OST. Avoid I/O access patterns where a single process must access all utilized OSTs.
  • Open files read-only when possible.
  • Avoid unnecessary file operations as in the following:
    • multiple cycles of open-write-close on the same file during the course of an application
    • many processes retrieving information (stat) from the same file
    • multiple processes reading the same small file
    • excessive use of stdout or stderr from parallel processes
    For reading a small file or retrieving stat information, it is best to have a single process perform the I/O and then broadcast the results to other processes.
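The stripe-alignment advice for a shared file can be checked with a little arithmetic. The sketch below assumes a plain round-robin layout with the stripe count equal to the number of processes; both function names are hypothetical. It compares an interleaved pattern, where each process writes stripe-sized chunks, with a contiguous block that spans several stripes.

```python
def aligned_offsets(rank, nprocs, stripe_size, chunks_per_rank):
    """Offsets at which `rank` writes stripe-sized chunks, interleaved by rank."""
    return [(k * nprocs + rank) * stripe_size for k in range(chunks_per_rank)]


def ost_slots_touched(offsets, length, stripe_size, stripe_count):
    """Set of OST slots hit by writes of `length` bytes at each offset."""
    slots = set()
    for start in offsets:
        first = start // stripe_size
        last = (start + length - 1) // stripe_size
        for segment in range(first, last + 1):
            slots.add(segment % stripe_count)
    return slots


MB = 1024 * 1024
nprocs = stripe_count = 4

# Interleaved, stripe-aligned writes: every rank stays on one OST slot.
for rank in range(nprocs):
    offsets = aligned_offsets(rank, nprocs, MB, 3)
    print(rank, sorted(ost_slots_touched(offsets, MB, MB, stripe_count)))

# One contiguous 3 MB block starting at offset 0 spans three OST slots.
print(sorted(ost_slots_touched([0], 3 * MB, MB, stripe_count)))
```

Whether the interleaved pattern is practical depends on the application's data layout; the point here is only that aligned, stripe-sized accesses keep each process on a single OST.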

For more detailed information and I/O benchmarks see here.