Where to store data

MUSICA provides several storage facilities. On the hardware side, there is a high-performance all-flash WekaIO parallel filesystem, a high-volume IBM Storage Scale parallel filesystem (also known as GPFS), and node-local flash storage. They are accessible under:

  • WekaIO:
    $SCRATCH expands to /scratch/fsXXXXXX/username

  • IBM Storage Scale (formerly IBM Spectrum Scale):
    $HOME expands to /home/username
    $DATA expands to /data/fsXXXXXX/username

  • Local SSD:
    /local

where XXXXXX is the project number.
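
To see what these locations resolve to in your own session, a small check like the following can help. It is a minimal sketch: describe_path is a hypothetical helper name, not a MUSICA tool, and it only reports whether each path is set and present on the node you are logged in to.

```shell
# describe_path LABEL PATH: report whether PATH is set and exists as a directory.
# (describe_path is an illustrative helper, not part of the MUSICA environment.)
describe_path() {
    if [ -z "$2" ]; then
        echo "$1 is not set"
    elif [ -d "$2" ]; then
        echo "$1 -> $2 (exists)"
    else
        echo "$1 -> $2 (not found on this node)"
    fi
}

# ${VAR:-} expands to an empty string when VAR is unset.
describe_path '$HOME'    "${HOME:-}"
describe_path '$DATA'    "${DATA:-}"
describe_path '$SCRATCH' "${SCRATCH:-}"
describe_path '/local'   /local
```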

Home Storage

$HOME is the location of the user's UNIX home directory. It is located entirely on NVMe disks. Each user has their own home directory, independent of project - it is available within every project the user is part of. The quota on $HOME is 50GB and 10^6 files per user. The storage size cannot be extended, but the file limit can be increased upon request.
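
If you are getting close to the file limit, it helps to know which directories hold the most entries. The following is a rough sketch, assuming that directories count towards the limit as well as regular files; count_entries and entries_per_subdir are hypothetical helper names. File names containing newlines would be miscounted.

```shell
# count_entries DIR: number of files and directories below DIR (DIR itself included).
# -xdev keeps find on the same filesystem as DIR.
count_entries() {
    find "$1" -xdev | wc -l | tr -d ' '
}

# entries_per_subdir DIR: entry count for each immediate subdirectory, largest first.
# The second glob picks up hidden directories such as .cache or .conda.
entries_per_subdir() {
    for d in "$1"/*/ "$1"/.[!.]*/; do
        [ -d "$d" ] || continue   # skip globs that matched nothing
        printf '%s\t%s\n' "$(count_entries "$d")" "$d"
    done | sort -rn
}
```

For example, `entries_per_subdir "$HOME" | head` lists the ten subdirectories with the most entries. Scanning a large tree this way generates metadata load, so run it sparingly.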

Project Storage

$DATA is a tiered file system containing 1PB of flash and around 10PB of HDD storage. It stores up to 10% of the data and all metadata on NVMe disks and the rest on spinning disks. Frequently used files are automatically moved to the NVMe tier, while unused files are moved back to the HDD tier.
The quota on $DATA is 10TB and 10^6 files per project and can be extended up to 100TB. If even more storage is needed, specific arrangements can be made.

Access permissions

The files on $DATA are usually group read-/writable so project members can exchange data.
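
If files you created turn out not to be accessible to other project members, the permissions can be inspected and adjusted with standard tools. This is a minimal sketch with hypothetical helper names; whether a given directory should be group-writable is a decision for your project.

```shell
# list_not_group_writable DIR: print entries below DIR that lack the group write bit.
# -perm -020 matches entries with group write set; "!" negates the match.
list_not_group_writable() {
    find "$1" ! -perm -020 -print
}

# make_group_writable DIR: give the owning group read and write access below DIR.
# Capital X adds execute only to directories and to files that are already executable.
make_group_writable() {
    chmod -R g+rwX "$1"
}
```

A typical sequence would be to run `list_not_group_writable` on a directory under $DATA first, review the output, and only then run `make_group_writable` on it.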

Check quota

You can check your current quota usage on each of the two storage systems with:

mmlsquota --block-size auto -j data_fs7XXXX data [or home_fs7XXXX home]
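
The bracketed part of the command above stands for a second invocation with the home fileset and device. Spelled out, the two calls look like this. The sketch only prints the commands, since mmlsquota exists only on the cluster; quota_cmd is a hypothetical helper, and fs7XXXX remains a placeholder to replace with your own project's ID.

```shell
# quota_cmd FILESET DEVICE: print the mmlsquota call for one fileset.
# (quota_cmd is an illustrative helper; it does not run mmlsquota itself.)
quota_cmd() {
    printf 'mmlsquota --block-size auto -j %s %s\n' "$1" "$2"
}

project="fs7XXXX"   # placeholder: replace with your project's ID

quota_cmd "data_${project}" data   # quota on $DATA
quota_cmd "home_${project}" home   # quota on $HOME
```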

Scratch Storage

$SCRATCH is a very fast all-flash system intended for scratch (short-lived) data. It has a size of 4PB with a quota of 5TB per project and no limit on the number of files. Extension is possible upon request.

Data retention

$SCRATCH is intended for temporary data and is therefore not backed up and may be cleared when space is needed.

For running but not for storing

/local

/local is a locally mounted NVMe disk. The size is about 7TB on a GPU node and 2TB on a CPU-only node. Data retention is not guaranteed and all data is lost on a reboot, so results need to be transferred elsewhere after use.

/tmp or /dev/shm

/tmp and /dev/shm can be used for intensive I/O. The data resides in the shared memory (RAM) of the node and can take up to half of a compute node's memory. The data is deleted after the job, so move results to $DATA.

Warning

/local and /tmp might be useful to run jobs but they are NOT for permanent storage.
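
One way to follow this rule inside a job is to stage input onto the node-local disk, run there, and copy the output back before the job ends. The following is a sketch under stated assumptions, not a MUSICA-provided tool: run_staged is a hypothetical helper, and it assumes the program reads from input/ and writes to output/ relative to its working directory.

```shell
# run_staged INPUT_DIR RESULT_DIR WORK_ROOT COMMAND [ARGS...]
# Copies INPUT_DIR to a fresh directory under WORK_ROOT (e.g. /local), runs COMMAND
# there, copies output/ to RESULT_DIR, and removes the working directory.
run_staged() {
    local input_dir result_dir work_root work_dir status
    input_dir=$1
    result_dir=$2
    work_root=$3
    shift 3

    work_dir="$(mktemp -d "$work_root/job.XXXXXX")" || return 1
    cp -R "$input_dir" "$work_dir/input" || return 1
    mkdir -p "$work_dir/output" "$result_dir" || return 1

    # Run the command in a subshell so the caller's working directory is unchanged.
    status=0
    ( cd "$work_dir" && "$@" ) || status=$?

    # Copy results back even if the command failed, then clean up the local disk.
    cp -R "$work_dir/output/." "$result_dir/"
    rm -rf "$work_dir"
    return "$status"
}

# Example call (all paths and the program name are hypothetical):
# run_staged "$DATA/my_input" "$DATA/my_results" /local "$HOME/bin/my_program"
```

Because the command runs from inside the working directory, pass the program by absolute path or make sure it is on your PATH.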

Requesting extra storage

Submit your request via the project website

What to store where

$HOME

User settings and various caches and config directories are stored here automatically. Additionally, $HOME can be used for custom configuration, code and scripts, custom software and environments (such as conda). Do NOT store any scientific or research data here - this includes original input data as well as final results. Such data grows too big very fast, and heavy I/O to $HOME should also be avoided.

$DATA

This is the main project volume, so all scientific/research data must go there, including raw input data, final results and all types of intermediate data. Please remove any data (especially temporary or intermediate data) that is no longer in use - the system is not set up for long-term archiving.

$SCRATCH

Temporary data that is needed during jobs or that is created by jobs, especially data that benefits from fast reads or writes. Data can be removed there at any time, so results or other long-term data need to be moved out.
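
To keep track of what is still sitting on scratch, it can be useful to list files that have not been modified for a while and then decide whether to move them to $DATA or delete them. This is a minimal sketch; list_stale is a hypothetical helper and the 30-day threshold in the example is an arbitrary choice, not a site policy.

```shell
# list_stale DIR DAYS: print regular files below DIR last modified more than DAYS days ago.
list_stale() {
    find "$1" -type f -mtime +"$2" -print
}

# Example (review the list before moving or deleting anything):
# list_stale "$SCRATCH" 30
```

The same check works on $DATA when looking for intermediate data that is no longer in use.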