Data Migration Tips and tricks
Please use hpc-transfer-1 and hpc-transfer-2 for moving large amounts of files.
This not only leaves the compute nodes available for actual computation, but also avoids the risk of your jobs being killed by Slurm.
You should also use tmux so that a lost connection does not interrupt long-running transfers.
Moving a project folder
- Define source and target location and copy contents. Please replace the parts in curly brackets with your actual folder names. It is important to end paths with a trailing slash (/) as this is interpreted by rsync as “all files in this folder”.
$ SOURCE=/data/gpfs-1/work/projects/{my_project}/
$ TARGET=/data/cephfs-2/unmirrored/projects/{my-project}/
$ rsync -ahPX --stats --dry-run $SOURCE $TARGET
Important
Please note the importance of the -X flag to keep extended file attributes (ACLs) which we might have assigned to you if you are a delegate in charge of moving a project.
- Remove the --dry-run flag to start the actual copying process.
- Perform a second rsync to check if all files were successfully transferred. Paranoid users might want to add the --checksum flag to rsync or use hashdeep. Please note the flag --remove-source-files which will do exactly as the name suggests, but leaves empty directories behind.
$ rsync -ahX --stats --remove-source-files --dry-run $SOURCE $TARGET
- Again, remove the --dry-run flag to start the actual deletion.
- Check if all files are gone from the SOURCE folder and remove the empty directories:
$ find $SOURCE -type f | wc -l
0
$ rm -r $SOURCE
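The final check-and-delete step can be wrapped so that the removal only happens when nothing is left behind. The following is a minimal sketch, not part of the official procedure: the function name remove_if_drained is hypothetical, and it counts every non-directory entry (including symbolic links), which is deliberately stricter than the find -type f check above.

```shell
#!/bin/bash
# Hypothetical helper (not from the original guide): delete a source tree
# only if nothing but directories is left inside it.
remove_if_drained() {
    local source_dir=$1
    local remaining

    # Refuse to act on an empty argument or on something that is no directory.
    if [ -z "$source_dir" ] || [ ! -d "$source_dir" ]; then
        echo "not a directory: '$source_dir'" >&2
        return 2
    fi

    # Count everything that is not a directory (regular files, symlinks, ...).
    remaining=$(find "$source_dir" ! -type d | wc -l)

    if [ "$remaining" -ne 0 ]; then
        echo "$remaining entries left in $source_dir, not removing" >&2
        return 1
    fi

    # Only the empty directory skeleton is left, so it is safe to remove.
    rm -r -- "$source_dir"
}
```

After sourcing the function, it would be called as remove_if_drained "$SOURCE" once the rsync run with --remove-source-files has finished.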
Warning
When defining your SOURCE location, do not use the * wildcard character. It will not match hidden (dot) files and will leave them behind. It's better to use a trailing slash, which matches “all files in this folder”.
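The difference is easy to reproduce in a scratch directory. The sketch below is only an illustration with made-up file names in a temporary folder: it compares what the shell expands * to with what the folder really contains.

```shell
#!/bin/bash
# Throwaway directory with one visible and one hidden file (example names).
demo_dir=$(mktemp -d)
touch "$demo_dir/data.txt" "$demo_dir/.config"

# What the shell passes on when you write $SOURCE/* :
# with default settings the glob skips names that start with a dot.
glob_count=$(cd "$demo_dir" && set -- * && echo $#)

# What is really inside the folder (this is what a trailing slash covers).
real_count=$(find "$demo_dir" -mindepth 1 -maxdepth 1 | wc -l)

echo "matched by *: $glob_count"
echo "actually present: $real_count"

# Clean up the scratch directory.
rm -r -- "$demo_dir"
```

With default shell settings the glob sees only data.txt, while the folder holds two entries, so .config would be left behind by a transfer that used the wildcard.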
Moving user home and work
- First copy your home folder while skipping symbolic links. This is necessary because the locations of work and scratch changed and we don't want to drag along the outdated links. Replace the parts in curly brackets with your actual user name and remove the --dry-run flag to perform the actual transfer. It is important to end paths with a trailing slash (/) as this is interpreted by rsync as “all files in this folder”.
$ SOURCE=/data/gpfs-1/home/users/{username_c}/
$ TARGET=/data/cephfs-1/home/users/{username_c}/
$ rsync -ahP --stats --no-links --dry-run $SOURCE $TARGET
- Rsync will not follow the symbolic link to your work folder. We therefore need to copy the contents of your work directory separately.
$ SOURCE=/data/gpfs-1/work/users/{username_c}/
$ TARGET=/data/cephfs-1/home/users/{username_c}/work/
$ rsync -ahP --stats --dry-run $SOURCE $TARGET
- Perform a second rsync per location to check if all files were successfully transferred. Paranoid users might want to add the --checksum flag to rsync or use hashdeep. Please note the flag --remove-source-files which will do exactly as the name suggests, but leaves empty directories behind.
Warning
Check thoroughly that files were actually copied as expected before removing the --dry-run flag. Use absolute paths to not be confused by symbolic links.
$ rsync -ahX --stats --remove-source-files --dry-run $SOURCE $TARGET
$ find $SOURCE -type f | wc -l
0
$ rm -r $SOURCE
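For the “paranoid” verification mentioned above, a checksum comparison can also be done with standard tools if hashdeep is not at hand. This is a minimal sketch under stated assumptions: it relies on sha256sum from GNU coreutils, the function names checksum_manifest and trees_match are hypothetical, and only regular files are compared (symbolic links and empty directories are ignored).

```shell
#!/bin/bash
# Hypothetical helper: print "<sha256>  <relative path>" for every regular
# file below a directory, sorted by path so two manifests are comparable.
checksum_manifest() {
    (
        cd "$1" || exit 1
        find . -type f -exec sha256sum {} + | sort -k 2
    )
}

# Hypothetical helper: succeed only if both trees hold the same regular
# files with the same content.
trees_match() {
    local source_manifest target_manifest
    source_manifest=$(checksum_manifest "$1") || return 2
    target_manifest=$(checksum_manifest "$2") || return 2
    [ "$source_manifest" = "$target_manifest" ]
}
```

It would be run before the deletion step, for example as trees_match "$SOURCE" "$TARGET" && echo "identical". Note that it compares whole trees, so a target that already holds extra data (such as a home folder with the copied work directory inside) is reported as different.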
Conda environments
Conda environments tend to not react well when the folder they are stored in is moved from its original location. There are numerous ways to move the state of your environments, which are described here.
A simple way we can recommend is this:
- Export all environments prior to the move.
#!/bin/bash
for env in $(ls .miniforge/envs/)
do
    conda env export -n $env -f $env.yml
done
- Install a new version of conda/mamba in your home (or better in /data/cephfs-1/work/groups/<group>/users/<user>) and run:
$ source activate /path/to/new/conda/bin/activate
- Re-create them after the move:
$ conda env create -f environment.yml
(If you run into errors, it might be better to export with conda env export -n $env --no-builds -f $env.yml.)
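The export step can also be written as a small function that does not parse the output of ls and that collects all files in one folder. This is a sketch only: the function name export_conda_envs and its argument order are hypothetical, and it assumes that conda is on your PATH and that every subfolder of the given envs directory is a named environment.

```shell
#!/bin/bash
# Hypothetical helper: export every environment found in an envs folder to
# <output_dir>/<env name>.yml. Any further arguments (for example
# --no-builds) are passed on to "conda env export".
export_conda_envs() {
    local envs_dir=$1
    local output_dir=$2
    local env_path env_name
    shift 2

    mkdir -p "$output_dir" || return 1

    for env_path in "$envs_dir"/*/; do
        # Skip the unexpanded pattern if the folder holds no environments.
        [ -d "$env_path" ] || continue
        env_name=$(basename "$env_path")
        conda env export -n "$env_name" "$@" -f "$output_dir/$env_name.yml"
    done
}
```

A call such as export_conda_envs ~/.miniforge/envs ~/conda-exports --no-builds would write one YAML file per environment, each of which can later be passed to conda env create -f.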
Note
If you already moved your home folder, you can still activate your old environments like this:
$ conda activate /fast/home/users/your-user/path/to/conda/envs/env-name-here