A simple guide to setting up RClone on Phoenix using Conda
Published on July 28, 2021 by Alastair
Often we want to transfer data from a cluster to a cloud storage location so it can be shared with someone. Here, I walk through how to set up RClone on Phoenix to enable simple data transfer from Phoenix to Box.
A tricky aspect of working on a compute cluster is sharing the data that we generate. Often we have to download the data manually, then upload it to a cloud storage location that is accessible to everyone involved. This might be okay for small datasets (e.g. a few files), but it becomes increasingly tedious when there are many files (thousands) or the files are large. This document introduces Rclone, a command-line program for managing files on cloud storage.
As stated above, Rclone is a program that provides a command-line interface (CLI) for managing files in the cloud. It supports over 40 cloud storage products and is completely open source. It has equivalents of standard command-line commands such as mv, cp and mount, and a whole lot more, so it should feel familiar to anyone with some experience of UNIX systems. It also performs some nice data checks when copying or moving files: it preserves timestamps and verifies checksums at all times (i.e. it ensures files are identical on both systems).
The take-away is that the software has many features, works across a range of cloud storage products and should be relatively easy to use.
Below I’ll outline the steps needed to configure Rclone with Adelaide University’s Phoenix HPC. I’ll specifically demonstrate the configuration with Box; however, the setup will be pretty similar regardless of the cloud product you use.
The first step is to create an authorisation password on the Box website. The steps are:
1. Open Account settings from the drop-down menu.
2. Go to the Account tab (you should be on this by default).
3. Find the Authentication section. It should say that you need to create a password; create one.
That’s all that needs to be done for this step.
Next, we need to install Rclone on Phoenix. I’m assuming you know how to log in to Phoenix; if you don’t, do the following:
First, open your terminal (on a Mac this is the Terminal app; on Windows I have no idea).
Next, log in to Phoenix using the following command:
$ ssh a1234567@phoenix-login1.adelaide.edu.au  # replace a1234567 with your university ID
...
Password: <type password here>
When you run the ssh command above, you’ll be met with a whole lot of text. At the bottom of it all, it should ask for your password. Type your university password here. NOTE: no characters will show up as you type, but they are being entered!
You should now be on Phoenix. Next, we’ll install the software using Conda. At the command line, run the following:
$ conda create -n RCLONE_env -c conda-forge rclone
This will create a conda environment called RCLONE_env, which will house Rclone. Any time we want to use Rclone, we’ll have to activate the environment. Once the environment has been created, activate it using the following command:
$ conda activate RCLONE_env
Your command-line prompt should look something like the following:
(RCLONE_env) a1234567@l01 ~ $
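Because every Rclone command in this guide relies on RCLONE_env being active, a script that calls rclone can check this before doing anything else. The function below is a hypothetical guard of my own (not part of conda or Rclone), a sketch that assumes conda activate has set CONDA_DEFAULT_ENV to the name of the active environment, which is conda's usual behaviour.

```shell
# Hypothetical guard (not part of conda or Rclone): succeed only when the
# RCLONE_env conda environment is active and an rclone binary is on the PATH.
require_rclone_env() {
  # conda activate sets CONDA_DEFAULT_ENV to the active environment's name
  if [ "${CONDA_DEFAULT_ENV:-}" != "RCLONE_env" ]; then
    echo "RCLONE_env is not active; run: conda activate RCLONE_env" >&2
    return 1
  fi
  # command -v prints the path of rclone if it can be found, else fails
  if ! command -v rclone >/dev/null 2>&1; then
    echo "rclone not found on PATH inside RCLONE_env" >&2
    return 1
  fi
  return 0
}

# Usage at the top of a script:
#   require_rclone_env || exit 1
```

In a non-interactive script, conda activate typically only works after eval "$(conda shell.bash hook)" has been run.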
You’ll also need to install Rclone on your own computer, as one of the set-up steps needs to be run on a machine with internet access and a web browser. Do this either via conda or through a local install.
Now that we have Rclone installed, we need to configure it so that it has access to our Box account. This will seem daunting, but it is relatively straightforward.
The first step is to run the command (from the active environment!) below:
$ rclone config
This will bring up some text with some options. It should look similar to below:
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
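We want to create a new remote, so type n, as I have done in the example above, and press Enter. Rclone will then ask for a name for the remote (the name> prompt); I called mine Box, and that is the name used in the commands throughout this guide.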
Next, we have to choose the type of cloud storage that we want to configure, in this case Box. You should be met with a list of options, each with a corresponding number. In my session, Box was number 6, but it may be different in yours, so choose whichever number corresponds to Box and press Enter (the prompt also accepts the quoted value, box).
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
1 / 1Fichier
\ "fichier"
2 / Alias for an existing remote
\ "alias"
3 / Amazon Drive
\ "amazon cloud drive"
4 / Amazon S3 Compliant Storage Providers including AWS, Alibaba, Ceph, Digital Ocean, Dreamhost, IBM COS, Minio, and Tencent COS
\ "s3"
5 / Backblaze B2
\ "b2"
6 / Box
\ "box"
...
38 / Yandex Disk
\ "yandex"
39 / Zoho
\ "zoho"
40 / http Connection
\ "http"
41 / premiumize.me
\ "premiumizeme"
42 / seafile
\ "seafile"
Storage> 6
The next few prompts can all be set to the default value. For the options shown below, just press enter:
** See help for box backend at: https://rclone.org/box/ **
OAuth Client Id
Leave blank normally.
Enter a string value. Press Enter for the default ("").
client_id>
OAuth Client Secret
Leave blank normally.
Enter a string value. Press Enter for the default ("").
client_secret>
Box App config.json location
Leave blank normally.
Leading `~` will be expanded in the file name as will environment variables such as `${RCLONE_CONFIG_DIR}`.
Enter a string value. Press Enter for the default ("").
box_config_file>
Box App Primary Access Token
Leave blank normally.
Enter a string value. Press Enter for the default ("").
access_token>
Enter a string value. Press Enter for the default ("user").
Choose a number from below, or type in your own value
1 / Rclone should act on behalf of a user
\ "user"
2 / Rclone should act on behalf of a service account
\ "enterprise"
box_sub_type>
Stop when you reach the prompt below:
Edit advanced config? (y/n)
y) Yes
n) No (default)
y/n>
Here, we do want to edit the advanced config. We’re not really going to change any of the defaults, but there is one key thing we do need to change. Continue by selecting y to edit the config.
Edit advanced config? (y/n)
y) Yes
n) No (default)
y/n> y
Some more prompts will come up; set them all to default (see below):
OAuth Access Token as a JSON blob.
Enter a string value. Press Enter for the default ("").
token>
Auth server URL.
Leave blank to use the provider defaults.
Enter a string value. Press Enter for the default ("").
auth_url>
Token server url.
Leave blank to use the provider defaults.
Enter a string value. Press Enter for the default ("").
token_url>
Fill in for rclone to use a non root folder as its starting point.
Enter a string value. Press Enter for the default ("0").
root_folder_id>
Cutoff for switching to multipart upload (>= 50MB).
Enter a size with suffix k,M,G,T. Press Enter for the default ("50M").
upload_cutoff>
Max number of times to try committing a multipart file.
Enter a signed integer. Press Enter for the default ("100").
commit_retries>
This sets the encoding for the backend.
See: the [encoding section in the overview](/overview/#encoding) for more info.
Enter a encoder.MultiEncoder value. Press Enter for the default ("Slash,BackSlash,Del,Ctl,RightSpace,InvalidUtf8,Dot").
encoding>
Once you reach a prompt about the Remote config, stop:
Remote config
Use auto config?
* Say Y if not sure
* Say N if you are working on a remote or headless machine
y) Yes (default)
n) No
y/n>
Here, we do NOT want to use the auto config, so select n. To authorise remote cloud storage, Rclone opens a webpage with an authorisation button, and Phoenix (a remote, headless machine without internet access in the traditional sense) cannot do this if you select y.
Remote config
Use auto config?
* Say Y if not sure
* Say N if you are working on a remote or headless machine
y) Yes (default)
n) No
y/n> n
You’ll now be greeted by the following:
For this to work, you will need rclone available on a machine that has
a web browser available.
For more help and alternate methods see: https://rclone.org/remote_setup/
Execute the following on the machine with the web browser (same rclone
version recommended):
rclone authorize "box"
Then paste the result below:
result>
Copy rclone authorize "box" and paste it into the terminal on YOUR PERSONAL COMPUTER:
~ via 🅒 base <-- NOTICE THIS IS DIFFERENT (MY LOCAL COMPUTER)
➜ rclone authorize "box"
This should open a web browser, which will ask you to "Grant access to Box". Once you authorise this, an authorisation key will be generated in your terminal. Copy this key and paste it into the terminal ON PHOENIX:
Then paste the result below:
result> --> PASTE KEY HERE FROM LOCAL MACHINE <--
This should complete the setup. You might be asked a couple of extra questions; just use the default values. Rclone will then take you back to a page that looks like the following:
Name Type
==== ====
Box box
e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q>
If Box is under the Name column and box is under the Type column, then you’re all good and can exit by selecting q:
Name Type
==== ====
Box box
e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q
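To run the same check from a script instead of reading the Name/Type table, you can use rclone listremotes, which prints each configured remote as name: on its own line. The helper below is hypothetical (my own, not an Rclone command) and is a sketch assuming that output format; it simply searches the output for an exact match.

```shell
# Hypothetical helper (not part of Rclone): read the output of
# "rclone listremotes" (one "name:" per line) on stdin and succeed only if
# the named remote is present as an exact, whole-line match.
remote_in_list() {
  grep -qxF -- "${1}:"
}

# Usage on Phoenix, with RCLONE_env active:
#   rclone listremotes | remote_in_list Box && echo "Box remote is configured"
```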
To test if the configuration has been successful, run the following command at the command-line:
$ rclone lsd Box:
This should list all directories at the top level of your Box account. For example, for me this returns:
(RCLONE_env) a1645424@l01 ~ $ rclone lsd Box:
-1 2020-10-15 12:34:30 -1 00_PhD_papers_data
-1 2019-02-12 11:13:37 -1 1901_Echidnaproject
-1 2020-01-29 10:35:12 -1 1903_Ray_lungfish
-1 2019-05-02 13:43:56 -1 1905_variants
-1 2019-06-12 08:02:57 -1 1906_James_Alastair
-1 2019-07-01 12:32:11 -1 1907_CODEML_checking
-1 2019-11-13 16:09:21 -1 1907_James_Alastair
-1 2019-07-22 14:31:47 -1 1907_codeml
-1 2019-10-10 08:06:47 -1 1908_Manuscript
-1 2021-05-12 07:43:58 -1 1910_Transcriptomes
-1 2020-02-07 14:04:48 -1 2002_James_Alastair
...
Copying files to and from Box follows the same principles as copying files to a server. The copy command is:
$ rclone copy source destination
When copying files from Phoenix to Box, you’ll want a command that looks something like:
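(RCLONE_env) a1234567@l01 $ rclone copy -P ${USER}/dir Box:/location/in/Box/cloud --include "*.txt"
The above will copy all text files in ${USER}/dir (including any in its subdirectories; use --include "/*.txt" to match only the top level) to a directory in your Box account, and -P displays transfer progress. The Box destination directory will be created if it does not exist.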
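A shell glob such as ${USER}/dir/*.txt is expanded by the shell, not by Rclone, so every matching file becomes a separate argument, whereas rclone copy takes exactly one source and one destination; with more than one matching file the command fails. Rclone's own --include "*.txt" filter on the source directory avoids this. The throwaway demo below (it works in a temporary directory and never calls Rclone) counts the paths each form would pass.

```shell
# Throwaway demo (never calls rclone): count the paths each form of the
# copy command would pass to rclone.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/c.txt" "$demo_dir/notes.csv"

# Glob form: the shell replaces "$demo_dir"/*.txt with one path per .txt file
# before rclone ever runs, giving 3 sources + 1 destination.
set -- "$demo_dir"/*.txt Box:/location/in/Box/cloud
n_args=$#
echo "glob form passes $n_args paths; rclone copy accepts exactly 2"

# Filter form: one source directory and one destination; rclone itself
# picks out the .txt files via --include "*.txt".
set -- "$demo_dir" Box:/location/in/Box/cloud
n_paths=$#
echo "filter form passes $n_paths paths: rclone copy $* --include \"*.txt\""

rm -r "$demo_dir"
```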
Note that we prefixed the Box path with Box:. This tells Rclone that we want to copy files to the Box remote that we set up. If during set-up we had named our remote Box-2-eletricboogaloo, we would change our command to the following:
(RCLONE_env) a1234567@l01 $ rclone copy -P ${USER}/dir Box-2-eletricboogaloo:... --include "*.txt"
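If you reuse scripts across remotes with different names, one option is to keep the remote name in a single variable. The sketch below assumes a hypothetical REMOTE variable and remote_path helper, neither of which is part of Rclone; it only builds the remote:path string that Rclone expects.

```shell
# Hypothetical convention (not part of Rclone): keep the remote name in one
# variable so scripts work whether the remote is Box or Box-2-eletricboogaloo.
REMOTE="${REMOTE:-Box}"

# Build an rclone "remote:path" string, e.g. Box:/location/in/Box/cloud
remote_path() {
  printf '%s:%s\n' "$REMOTE" "$1"
}

# Usage:
#   rclone copy -P "${USER}/dir" "$(remote_path /location/in/Box/cloud)" --include "*.txt"
```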
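To copy files from Box to Phoenix, simply reverse the order:
(RCLONE_env) a1234567@l01 $ rclone copy -P Box:/some/dir ${USER}/dir/output --include "*.txt"
That will copy all text files from /some/dir/ on Box to ${USER}/dir/output on Phoenix. Rclone does not expand a glob such as Box:/some/dir/*.txt on the remote side, so the --include filter is used here as well.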
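The download direction can be wrapped in a small function that previews the transfer with Rclone's --dry-run flag before copying for real. This is a sketch: fetch_txt_from_box, DRY_RUN and the RCLONE override are hypothetical names of my own, and the remote is assumed to be called Box.

```shell
# Hypothetical helper (not part of Rclone): copy the text files in a Box
# folder down to Phoenix. Run with DRY_RUN=1 first to list what would be
# copied without transferring anything.
RCLONE="${RCLONE:-rclone}"   # override to use a different rclone binary

fetch_txt_from_box() {
  local src="$1"    # folder on Box, e.g. /some/dir
  local dest="$2"   # folder on Phoenix, e.g. "${USER}/dir/output"
  # The rclone filter "*.txt" also matches .txt files in subfolders.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    "$RCLONE" copy --dry-run "Box:${src}" "$dest" --include "*.txt"
  else
    "$RCLONE" copy --progress "Box:${src}" "$dest" --include "*.txt"
  fi
}

# Usage on Phoenix, with RCLONE_env active:
#   DRY_RUN=1 fetch_txt_from_box /some/dir "${USER}/dir/output"
#   fetch_txt_from_box /some/dir "${USER}/dir/output"
```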