Storage Management
Introduction
VOSpace is the CANFAR storage system, an implementation of the IVOA (Virtual Observatory) VOSpace specification. It is intended for storing the output of the CANFAR processing system and for sharing files between members of a collaboration. If the data to be processed are not already in a CADC archive, they can be staged on a VOSpace for faster access. Files in VOSpace are also mirrored in four physical locations, so they are protected against disk failure and suited to long-term storage.
Access to VOSpace requires a CADC account (registration).
There are two main ways to interact with VOSpace. The first is through your browser, using the web user interface, which is interactive and familiar to most users. The second, for accessing a VOSpace from scripts, is the Python-based vos module and its command line clients. Some users may also find the VOSpace filesystem, vofs, useful: it is based on FUSE and is not recommended for serious data processing, but it does provide a convenient interactive interface for exploring a repository.
The web user interface
The vos Python module and command line client
VOSpace can also be accessed from a terminal or a script, using the commands of the vos command line client.
Installation
Below are the installation steps.
- Ensure Python is up to date (at least version 3.7).
- Install the vos module using pip; see PyPI.
Using the client command line tools (recommended)
Try the following commands, substituting your CANFAR VOSpace for VOSPACE (most CANFAR users have a VOSpace with the same name as their CANFAR user name; there are also project VOSpaces):
Details on these commands can be found via the --help option, e.g. vls --help. For more verbose output, try vls -v vos:VOSPACE.
The following commands are defined: vcat, vchmod, vcp, vln, vlock, vls, vmkdir, vmv, vrm, vrmdir, vsync and vtag.
Help on these commands can also be found using pydoc.
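For scripts that drive the command line tools rather than the Python API, a thin wrapper around subprocess is often enough. In this sketch VOSPACE is a placeholder and the helper names (vls_command, vcp_command, missing_commands, run) are hypothetical; it relies only on the vls -v option shown above and on vcp taking a source and a destination.

```python
import shutil
import subprocess

# Command names as listed above.
VOS_COMMANDS = ("vcat", "vchmod", "vcp", "vln", "vlock", "vls",
                "vmkdir", "vmv", "vrm", "vrmdir", "vsync", "vtag")


def missing_commands(commands=VOS_COMMANDS):
    """Return the commands that cannot be found on PATH."""
    return [c for c in commands if shutil.which(c) is None]


def vls_command(vospace, verbose=False):
    """Build the argv for listing a VOSpace, e.g. vls -v vos:VOSPACE."""
    uri = vospace if vospace.startswith("vos:") else "vos:" + vospace
    argv = ["vls"]
    if verbose:
        argv.append("-v")
    argv.append(uri)
    return argv


def vcp_command(source, destination):
    """Build the argv for copying a file to or from VOSpace."""
    return ["vcp", source, destination]


def run(argv):
    """Run a command, raise on a non-zero exit status, and return its stdout."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, run(vls_command("VOSPACE", verbose=True)) returns the same listing as typing vls -v vos:VOSPACE, once a valid certificate is in place.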
Using the vos python module API
Documentation is built into the library and can be read with pydoc vos. Here we provide a very basic example of its usage.
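A minimal sketch of using the module from Python, assuming vos.Client() finds your proxy certificate in its default location and that Client.listdir and Client.copy behave as described by pydoc vos. The names vos_uri and list_and_fetch are hypothetical helpers and VOSPACE is a placeholder.

```python
import posixpath


def vos_uri(vospace, *parts):
    """Join a VOSpace name and path components into a vos: URI."""
    name = vospace[len("vos:"):] if vospace.startswith("vos:") else vospace
    return "vos:" + posixpath.join(name, *parts)


def list_and_fetch(vospace, filename, local_path):
    """List the root of a VOSpace and copy one file from it to local disk.

    Assumes vos.Client.listdir returns the node names in a container and
    vos.Client.copy(source, destination) downloads a file; check pydoc vos.
    """
    import vos  # imported here so vos_uri works without vos installed

    client = vos.Client()
    names = client.listdir(vos_uri(vospace))
    if filename in names:
        client.copy(vos_uri(vospace, filename), local_path)
    return names
```

For example, list_and_fetch("VOSPACE", "image.fits", "image.fits") would list vos:VOSPACE and download vos:VOSPACE/image.fits if it is there.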
The VOSpace FUSE based file system
VOSpace can also be accessed as a remote filesystem using the vofs python module. This technique uses a FUSE layer between file-system actions and the VOSpace storage system. Using vofs makes your VOSpace appear like a regular filesystem.
vofs is not recommended for batch processing or I/O-heavy applications.
Installation
- Follow the instructions for installing vos, then the instructions below.
- Install the vofs Python module.
FUSE
Linux
- On some distros (RHEL 5, CentOS 5, Scientific Linux 5) you may need to install the fuse library:
- On all distros you will also need to add your account to the fuse group, so that you are allowed to mount filesystems:
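Before mounting, you can check from Python whether your account is already in the fuse group. This sketch uses only the standard grp and os modules (Unix only), and in_group is a hypothetical helper. Group membership changes normally apply only to new login sessions, so log out and back in after adding yourself.

```python
import grp
import os


def in_group(group_name):
    """Return True if the current process belongs to the named Unix group."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:  # the group does not exist on this system
        return False
    return gid in os.getgroups() or gid == os.getgid()


if __name__ == "__main__":
    print("member of fuse group:", in_group("fuse"))
```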
OS-X
- Install OSX-FUSE first (install this package in ‘MacFUSE Compatibility’ mode; there is a selection box for this during installation).
vofs
The vofs Python module is distributed via PyPI.
Usage
- Mount all available VOSpaces:
- On some OS-X installations the mountvofs command fails with an error such as ‘libfuse.dylib’ not found. Setting the environment variable DYLD_FALLBACK_LIBRARY_PATH can help resolve this issue:
Listing /tmp/vospace should now show all the VOSpaces to which you have read access.
- List the root of VOSpace:
- Unmount the VOSpace:
- Mount a specific VOSpace:
The mountvofs command creates a cache directory where local copies of files from the VOSpace are kept as needed. If the cached version is older than the copy on VOSpace, a new version is pulled. You can specify the size of the cache (default 50 GB) and its location (default ${HOME}/vos:USER) on the command line.
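The mount and unmount steps can be scripted by building the command lines in Python. This is a sketch: the mountvofs option names used here (--mountpoint, --cache_dir, --vospace) are assumptions to check against mountvofs --help, and --readonly is the mode recommended under Options; the cache size option is left out because its exact name is not given here. Unmounting uses the standard FUSE tools, fusermount -u on Linux and umount on OS-X. Both helper names are hypothetical.

```python
import platform


def mount_command(mountpoint, cache_dir, vospace=None, readonly=False):
    """Build a mountvofs command line (option names assumed, see --help)."""
    argv = ["mountvofs", "--mountpoint", mountpoint, "--cache_dir", cache_dir]
    if vospace:  # mount one VOSpace instead of all available ones
        argv += ["--vospace", vospace]
    if readonly:  # the mode in which vofs is most useful
        argv.append("--readonly")
    return argv


def unmount_command(mountpoint, system=None):
    """Build the standard FUSE unmount command for this platform."""
    system = system or platform.system()
    if system == "Linux":
        return ["fusermount", "-u", mountpoint]
    return ["umount", mountpoint]  # OS-X and other BSD-like systems
```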
When a file is opened in a mounted directory, mountvofs fetches the remote copy from VOSpace if the local copy is out of date. When the file is written and closed, the VOSpace file system copies it back to VOSpace. Most science software performs these operations rarely, so the illusion of a local disk is maintained. Most editors, however, write temporary versions of a file frequently, so the file is repeatedly written to VOSpace; performance may suffer, and some applications may not work at all.
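The transfer pattern described above can be illustrated with a toy model. It is not the vofs implementation: it only counts how often a file would be fetched (on open, when the cached copy is missing or older than the remote one) and sent back (on every close after a write), to show why an editor that saves frequently generates many uploads. The class and method names are hypothetical.

```python
class ToyVofsCache:
    """Toy model of fetch-on-open and upload-on-close; counts transfers only."""

    def __init__(self):
        self.cached_mtime = {}  # file name -> modification time of cached copy
        self.downloads = 0
        self.uploads = 0

    def open(self, name, remote_mtime):
        """Fetch the remote copy only if it is not cached or the cache is older."""
        cached = self.cached_mtime.get(name)
        if cached is None or cached < remote_mtime:
            self.downloads += 1
            self.cached_mtime[name] = remote_mtime

    def close_after_write(self, name, mtime):
        """Every close after a write sends the file back to VOSpace."""
        self.cached_mtime[name] = mtime
        self.uploads += 1


if __name__ == "__main__":
    cache = ToyVofsCache()
    cache.open("image.fits", remote_mtime=100)  # first open: fetched
    cache.open("image.fits", remote_mtime=100)  # unchanged remotely: served from cache
    for t in range(101, 121):  # an editor saving 20 times
        cache.close_after_write("image.fits", mtime=t)
    print(cache.downloads, cache.uploads)  # 1 20
```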
- Options
There are many options that can improve your vofs experience (in particular, vofs is most useful in --readonly mode). To see all the available options, use the --help flag.
Retrieving CANFAR X509 certificates
To access a VOSpace, the command line client needs a certificate. A certificate is created along with each CADC account, and a short-lived proxy of it can be obtained. One easy way to obtain the proxy is the cadc-get-cert command line tool, distributed with the cadcutils library, which was installed automatically as part of the vos installation above.
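After running cadc-get-cert, a script can check that a proxy exists before using the client. This sketch assumes the proxy is written to $HOME/.ssl/cadcproxy.pem (the path used in the batch instructions) and uses the file's age as a rough stand-in for validity; max_age_days is a hypothetical parameter that you should match to the lifetime of your proxy, and the helper names are hypothetical too. The authoritative expiry date is stored in the certificate itself and can be shown with openssl x509 -enddate -noout -in $HOME/.ssl/cadcproxy.pem.

```python
import time
from pathlib import Path


def default_proxy_path(home=None):
    """Return $HOME/.ssl/cadcproxy.pem, or the same path under another home."""
    return Path(home or Path.home()) / ".ssl" / "cadcproxy.pem"


def proxy_age_days(path, now=None):
    """Age of the proxy file in days, or None if the file does not exist."""
    path = Path(path)
    if not path.is_file():
        return None
    now = time.time() if now is None else now
    return (now - path.stat().st_mtime) / 86400.0


def needs_new_proxy(path, max_age_days):
    """True if the proxy is missing or older than max_age_days."""
    age = proxy_age_days(path)
    return age is None or age > max_age_days
```

If needs_new_proxy(default_proxy_path(), max_age_days) is true, run cadc-get-cert again before calling any vos command.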
Using vos with batch processing VMs
In batch processing, the CADC proxy certificate is transferred automatically to the batch VMs; make sure the certificate is valid at submission time. If the transfer does not happen, there are two approaches:
Secure but slightly complicated
- On the CANFAR batch submission host, batch.canfar.net, run the command cadc-get-cert.
- Copy the file $HOME/.ssl/cadcproxy.pem to the directory from which you submit your jobs.
- Add cadcproxy.pem to the list of files to transfer when the job executes (this is done by adding these lines to the submission file).
- Add this line to the start of the batch script:
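The copy steps of the secure approach can be sketched in Python. The sketch assumes that transferred files appear in the job's working directory and that the client reads its proxy from $HOME/.ssl/cadcproxy.pem. Both helper names are hypothetical; install_proxy_in_job illustrates a job-start step that puts the proxy back where the client looks for it, and is not the batch-script line referred to above.

```python
import os
import shutil
from pathlib import Path

PROXY_NAME = "cadcproxy.pem"


def stage_proxy_for_submission(proxy, submit_dir):
    """Copy the proxy into the job submission directory."""
    dest = Path(submit_dir) / PROXY_NAME
    shutil.copy2(proxy, dest)
    os.chmod(dest, 0o600)  # the proxy holds a private key: owner-only access
    return dest


def install_proxy_in_job(work_dir, home):
    """At job start, copy the transferred proxy to home/.ssl/cadcproxy.pem."""
    target_dir = Path(home) / ".ssl"
    target_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    target = target_dir / PROXY_NAME
    shutil.copy2(Path(work_dir) / PROXY_NAME, target)
    os.chmod(target, 0o600)
    return target
```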
Insecure but slightly less complicated
Run the cadc-get-cert script at the start of every job. To prevent cadc-get-cert from asking for a password, ensure there is a valid $HOME/.netrc file on the snapshotted VM containing these lines:
WARNING: this is not a fully secure solution.
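Since cadc-get-cert only avoids the password prompt when a valid .netrc is present, a job can check the file before calling it. This sketch uses the standard netrc module; host stands for each machine name in the lines above, and the helper names are hypothetical. It also checks that the file is readable only by you, which limits, but does not remove, the risk noted in the warning.

```python
import netrc
import os
import stat


def netrc_has_login(path, host):
    """True if the .netrc file at path holds a login and password for host."""
    try:
        entry = netrc.netrc(path).authenticators(host)
    except (OSError, netrc.NetrcParseError):  # missing, unreadable or malformed
        return False
    if entry is None:
        return False
    login, _account, password = entry
    return bool(login) and bool(password)


def netrc_is_private(path):
    """True if neither group nor others have any permission on the file."""
    mode = os.stat(path).st_mode
    return (stat.S_IMODE(mode) & 0o077) == 0
```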