The National Institute for Computational Sciences

HPSS (Darter and Nautilus): Splitting an HPSS archive into multiple files

You can use the split command to split an archive into multiple files. Please follow the steps and examples provided below.

Change ("cd") into the /lustre/medusa/ directory where your data is temporarily stored and run the commands below. Make sure the file striping (https://www.nics.tennessee.edu/computing-resources/file-systems/lustre-s...) in the directory is appropriate for the files you are about to write.

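NOTE: The syntax is very important. Pay close attention to the trailing "." at the end of the output prefix (e.g. myarchive.tar.). split appends its numeric suffix directly after that prefix, so the "." is what separates the archive name from the piece number.

If you want to combine multiple files into an archive and then split that archive into 1 GB pieces, do the following: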

$ tar -cvf - file1 file2 file3 | split --bytes=1G --suffix-length=4 --numeric-suffix - myarchive.tar.

When the files need to be recombined and untarred:

$ cat myarchive.tar.* | tar xvf - 
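
Before running this on a large data set, it can be worth rehearsing the whole round trip on a few small files. The following is a minimal sketch, assuming GNU tar and GNU split; the scratch directory, the sample files and the 1 KB piece size are hypothetical stand-ins chosen only to keep the test fast. It spells the option out as --numeric-suffixes, which is the full name of the option abbreviated in the commands above.

```shell
#!/bin/sh
# Rehearse the tar | split and cat | tar round trip in a throwaway directory.
# All file names and sizes here are hypothetical test values.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Create three small sample files with random content.
for f in file1 file2 file3; do
    head -c 3000 /dev/urandom > "$f"
done

# Archive to stdout and split into 1 KB pieces: myarchive.tar.0000, .0001, ...
tar -cf - file1 file2 file3 | split --bytes=1K --suffix-length=4 --numeric-suffixes - myarchive.tar.

# Recombine the pieces and extract them into a separate directory.
mkdir restored
cat myarchive.tar.* | tar -xf - -C restored

# cmp exits non-zero (and set -e stops the script) if any file differs.
for f in file1 file2 file3; do
    cmp "$f" "restored/$f"
done
echo "round trip OK"
```

Extracting into a separate directory keeps the originals untouched, so the comparison is meaningful.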

If you already have a single tar file and you want to split it into 10 GB files, do the following:

$ split --bytes=10G --suffix-length=4 --numeric-suffix lustre.scratch.Cray_Tests.tar lustre.scratch.Cray_Tests.tar.split.

If you have a directory you want to tar up and then split into 10 MB files (in this case an "applications" directory), do the following:

$ tar -cvf - applications | split --bytes=10M --suffix-length=4 --numeric-suffix - applications.tar.

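The size of each split file is set by the --bytes option, for example --bytes=10M or --bytes=1G. With these suffixes split counts in powers of 1024, so 10M means 10 x 1024 x 1024 = 10485760 bytes.

When the command finishes executing (which could take a while), you will end up with files applications.tar.0000, applications.tar.0001, and so on. See the example output below.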

$ ls -l applications.tar* 
-rw-r--r-- 1 you 10485760 Jul 24 13:49 applications.tar.0000 
-rw-r--r-- 1 you 10219520 Jul 24 13:49 applications.tar.0001 
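
The number of pieces can be estimated in advance from the archive size and the --bytes value. The arithmetic below is a small sketch using the two sizes from the listing above; the variable names are illustrative only. It also checks the result against the 10000 names (0000 to 9999) that a four-digit numeric suffix can provide.

```shell
#!/bin/sh
# Estimate how many pieces split will write for a given archive size.
# Variable names are illustrative; the sizes come from the example listing.
archive_bytes=$((10485760 + 10219520))   # total size of the tar stream
chunk_bytes=$((10 * 1024 * 1024))        # --bytes=10M is 10 * 1024 * 1024 bytes

# Ceiling division: the last piece may be smaller than chunk_bytes.
pieces=$(( (archive_bytes + chunk_bytes - 1) / chunk_bytes ))
echo "expected pieces: $pieces"

# --suffix-length=4 with numeric suffixes allows at most 10^4 names.
max_pieces=10000
if [ "$pieces" -gt "$max_pieces" ]; then
    echo "too many pieces: use a larger --bytes or a longer suffix" >&2
fi
```

For the listing above this gives two pieces, matching applications.tar.0000 and applications.tar.0001.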

After splitting your archives, type hsi put *.tar.*. This will start uploading the files to HPSS. The upload can also take a while, so consider starting it with the nohup command so that it keeps running if your session disconnects.

When you are ready to retrieve the files for use, type hsi get *.tar.*. After all the files have been transferred to your /lustre/medusa/$USER area, run the following command to combine the split files and extract their contents:

$ cat applications.tar.* | tar xvf -

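Wait a bit while the pieces are joined into a single tar stream and unpacked. The contents of the archive (in this case the applications directory) will be extracted into the current directory. The pipe feeds tar directly, so no intermediate applications.tar file is written.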
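
Because the pieces make a round trip through HPSS, one optional precaution is to record a checksum for each piece before uploading and to verify the pieces after retrieval. This is a sketch rather than part of the documented procedure: it assumes GNU coreutils sha256sum is available, and the manifest name applications.sha256 is a hypothetical choice. The name deliberately does not match applications.tar.*, so the manifest is not swept into the cat command; it would need to be stored in HPSS separately.

```shell
#!/bin/sh
# Record per-piece checksums before upload and verify them after retrieval.
# The pieces are small stand-ins and the manifest name is a hypothetical choice.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for real split output: 5000 random bytes in 1 KB pieces.
head -c 5000 /dev/urandom | split --bytes=1K --suffix-length=4 --numeric-suffixes - applications.tar.

# Before upload: write one checksum line per piece.
sha256sum applications.tar.[0-9]* > applications.sha256

# After retrieval: check every piece against the manifest.
# --quiet suppresses the per-file OK lines; failures are still reported.
sha256sum --check --quiet applications.sha256
echo "all pieces verified"
```

If any piece is missing or altered, sha256sum exits with a non-zero status, which is a signal to fetch that piece again before recombining.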