File Storage
EMPHATIC has a variety of locations where we can store files. Each has different properties and different recommended uses, as described below. Important things to note:
- There is no "perfect" place that you can use for any kind of file in any circumstance.
- There is no area of unlimited storage where you can put files and have them last forever. Be judicious about what you keep.
/pnfs/emph/scratch/users/$USER
This area has no fixed size limit, but files have a limited lifetime: typically about 30 days after their last access. This is where all direct grid output should go. Although the file system can be accessed via CephFS (i.e. like a regular disk), this does not perform well and is only recommended for small-scale work. The xrootd protocol is more reliable and can be used easily with the pnfs2xrootd command.
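Because scratch files expire roughly 30 days after their last access, it can help to check which files are getting close to that age. The following is a minimal sketch, assuming GNU find and assuming that the access time visible through the mounted file system matches what the cleanup policy uses (this is not confirmed here). The function name list_stale_files and the 25-day default are illustrative choices, not part of any EMPHATIC tooling. Since mounted access is only recommended for small-scale work, run this on small directory trees only.

```shell
# list_stale_files DIR [DAYS]
# Print regular files under DIR that were last accessed more than DAYS days ago.
# Hypothetical helper: the name and the 25-day default are assumptions.
list_stale_files() {
    local dir="$1"
    local days="${2:-25}"
    # find truncates fractional days, so "-atime +25" matches files whose
    # last access was at least 26 full days ago.
    find "$dir" -type f -atime +"$days" -print
}

# Example (not run here):
#   list_stale_files /pnfs/emph/scratch/users/$USER 25
```

Anything this prints that you still need is a candidate for copying to one of the longer-lived areas described below.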
/pnfs/emph/persistent/users/$USER
Persistent is not backed up and is limited in size, but files will never be removed automatically. As of now there is no way to apply formal quotas on this file system area, and it's the primary area for shared files like CAFs, so we ask users to use /pnfs/persistent only when it is truly needed (i.e. files that will be used regularly, particularly those taken as input into grid jobs, on a timescale of longer than 30 days). In general /exp/emph/data/users/$USER (see below) should be users' first choice for storing files that do not need to be backed up, and the tape-backed storage /pnfs/emph/users (see further below) for files that need long-term storage.
/exp/emph/app/users/$USER and /exp/emph/data/users/$USER
This is a much more "normal" disk system than dCache, and will give the best performance when using files interactively. The Ceph FileSystem is an upgraded system from the older NFS-based BlueArc. It is not accessible from the grid.
- The /exp/emph/app area is intended for code development
- The /exp/emph/data area is for storing short-term data files
Additionally, the app area is backed up with snapshots created once a day and kept for up to 14 days, but please commit often to git!
The snapshots are hidden in .snap directories within each Ceph directory. These special directories do not appear in the containing directory's listing, but they can be listed explicitly:
% ls /exp/emph/app/users/$USER/.snap
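To recover an earlier version of a file, it can be copied back out of a snapshot. The following is a sketch under assumptions: the snapshot names are whatever the ls command above prints (the one in the example is a placeholder), and restore_from_snap is a hypothetical helper, not an existing tool. It writes the recovered copy next to the current file rather than overwriting it.

```shell
# restore_from_snap DIR SNAPSHOT RELPATH
# Copy DIR/.snap/SNAPSHOT/RELPATH to DIR/RELPATH.restored.
# Hypothetical helper; the directory that will hold the restored copy must already exist.
restore_from_snap() {
    local dir="$1"
    local snap="$2"
    local rel="$3"
    local src="$dir/.snap/$snap/$rel"
    if [ ! -f "$src" ]; then
        echo "no such file in snapshot: $src" >&2
        return 1
    fi
    # -p keeps the original modification time and permissions.
    cp -p "$src" "$dir/$rel.restored"
}

# Example (not run here; SNAPSHOT_NAME is a placeholder):
#   restore_from_snap /exp/emph/app/users/$USER SNAPSHOT_NAME mycode/main.cc
```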
Starting March 20th, 2024, you can NOT write new data to the NFS ("BlueArc") data volumes. They will be read-only. From Friday May 31st 2024, the NFS volumes will be inaccessible and data on them irretrievable; this section of info will thus become obsolete.
Users are responsible for migrating their data from /emph/data/users/$USER to the new CephFS data volume: /exp/emph/data/users/$USER
Your $USER directory has been made for you (if you touched the NFS volume within the last 3 years). If your $USER directory doesn't exist, contact @gavinsdavies on Slack.
Top tip: you may want to run the actions below inside your favorite terminal multiplexer (e.g. screen or tmux) so that you can start the rsync transfer and leave it running uninterrupted, even if your connection dies.
For migrating from /emph/data to /exp/emph/data:
% cd /emph/data/users/${USER}
% rsync -ax --info=progress2 --no-i-r /emph/data/users/${USER}/ /exp/emph/data/users/${USER}/ # sync your /emph/data-located files to ceph
% if [ $? -eq 0 ]; then touch MOVED; else touch RSYNC_FAILED; fi # tells us you've finished syncing
Grab a ☕ or 🍵. If you have a lot of files, expect this to take a while. Just sit tight, and don't write any new data until it is complete.
The rsync command computes the whole transfer upfront (--no-i-r) and gives you an overall progress fraction (--info=progress2).
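Once the transfer reports success, a quick sanity check is to compare the file listings of the two areas. The following is a minimal sketch, assuming GNU find (for -printf); list_tree and compare_trees are hypothetical helper names. It compares relative paths and sizes of regular files only, not their contents, and it skips the MOVED and RSYNC_FAILED marker files because those are created in the source after the sync.

```shell
# list_tree DIR
# Print "relative/path<TAB>size-in-bytes" for every regular file under DIR, sorted.
list_tree() {
    ( cd "$1" && find . -type f ! -name MOVED ! -name RSYNC_FAILED \
          -printf '%P\t%s\n' | LC_ALL=C sort )
}

# compare_trees SRC DST
# Return 0 if both trees hold the same files with the same sizes,
# 1 if they differ, 2 if either directory could not be entered.
compare_trees() {
    local src_list dst_list
    src_list=$(list_tree "$1") || return 2
    dst_list=$(list_tree "$2") || return 2
    [ "$src_list" = "$dst_list" ]
}

# Example (not run here):
#   compare_trees /emph/data/users/$USER /exp/emph/data/users/$USER && echo "listings match"
```

A mismatch does not say which files differ; rerunning the rsync command above is the simplest way to bring the destination back in line.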
/pnfs/emph/users/$USER
If you want to put files on tape please contact the experts!