Getting Started With Globus
The Genomics Core Facility uses Globus as a primary way for researchers to access their data. We store the data on RCAC's Data Depot storage service and the Fortress tape archive, and control shares via our Customer page.
What is Globus?
Globus is a service created by the University of Chicago to allow researchers to share data and collaborate more easily. In the simplest terms, it serves both as a file transfer tool and as an authentication agent, ensuring that only the desired collaborators can access your research data.
Set Up An Account
Individuals at a number of international research institutions can log into Globus using their institutional credentials. Go to globus.org and click Log In to see if your institution is already connected (note for Purdue users: Purdue is connected, and you can select "Purdue University Main Campus", which will come up automatically if you go to Purdue's branded Globus page). If your institution is not listed, use your Google or ORCiD ID to authenticate.
If those options are not open or desirable to you, you can click "use Globus ID to sign in", then "Need A Globus ID? Sign Up". You will be prompted for a username, password, full name, email and organization.
Linking your account
Globus will primarily think of you as <username>@globusid.org, while it is likely your data will be shared with you via your organization email. When logged in to Globus, click on the Account tab to see what information Globus has about you.
Globus will not automatically link the email you used to set up the account to your account. Therefore, if you used GlobusID, Google or ORCiD to log in (i.e. anything but your institutional identity, e.g. your Purdue career account), it is very important to make Globus aware that <personalemail>@gmail.com is in fact the same person as the institutional identity firstname.lastname@example.org. Globus calls these linked identities. To link and unlink email addresses and institutional identities to your Globus account, click Manage Your Identities under the Account tab. Specifically, Purdue users will want to link their GlobusID to the Purdue University Main Campus identity provider.
Download The Software
If your institution is subscribed to Globus, you likely have access to Globus endpoints you can move the data to. If you do not, or would rather have the data on your personal computer instead, you can use Globus Connect Personal to create an endpoint on your computer. It is available for Mac, Windows and Linux. The instructions for installing are easy to follow, but you must have permission to install software on your computer.
Once you have the endpoint from the Genomics Core and an endpoint you can copy the data to, you can use the Transfer Files page to start a transfer. Choose the source endpoint on one side, the target endpoint on the other, select the files to be moved, and then click the arrow corresponding to the direction the files should be moved.
Globus uses GridFTP, a protocol that, in most cases, will be able to keep track of the transfer and, if interrupted, start again where it left off. If you are transferring to your laptop, you can turn off your computer and turn it on again, and Globus will normally start again where you left off.
Some resources are not fully capable of this, so I do recommend you transfer to a computer with a fast and stable network connection, preferably wired. Additionally, especially if you're moving or storing large data sets (and if you're working with my lab, you are), it may be difficult to move or untar your data, especially if your tape-archived data is greater than 50% of your internal disk space. You can set Globus Connect Personal to access your external hard drive, which can prove easier than trying to move or extract the data once you have transferred it.
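As a rough way to apply the 50% rule of thumb above before you start a transfer, here is a minimal Python sketch. The function names are my own, and I am assuming the comparison is made against the free space on the destination drive; adjust it if you prefer to measure against total capacity.

```python
import shutil


def exceeds_fraction(archive_bytes, free_bytes, max_fraction=0.5):
    """Return True if the archive is larger than max_fraction of the free space."""
    return archive_bytes > max_fraction * free_bytes


def needs_external_drive(archive_bytes, destination="."):
    """Check the drive holding `destination` against the 50% rule of thumb.

    Assumption: the rule is applied to free space, not total capacity.
    """
    free_bytes = shutil.disk_usage(destination).free
    return exceeds_fraction(archive_bytes, free_bytes)


if __name__ == "__main__":
    # Hypothetical archive size of 400 GB; replace with the size of your data.
    archive_size = 400 * 10**9
    if needs_external_drive(archive_size):
        print("Consider pointing Globus Connect Personal at an external drive.")
    else:
        print("The archive should fit with room left to extract it.")
```

If the check suggests an external drive, set it as an accessible location in Globus Connect Personal before starting the transfer rather than moving the data afterwards.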
What is an Endpoint? What is a Path?
An Endpoint is a directory on a computer. This could be the root directory on a large distributed file system, or it could be the Documents directory on your laptop. More specifically, it is a directory on a computer that is named as a thing to be shared. Your institution might share your home directory or your lab's group directory with you, and you can then share your project sub-directory as an endpoint with your collaborators.
A Path identifies a sub-directory within the endpoint, and is written in Unix terms, with / being the "root" directory, where you can't get any lower, /~/ being your specific subdirectory, and so on. If I share a directory with you, #secret_research may be the endpoint but / will be the path, so you cannot use a shared endpoint to search the entirety of my directory.
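To make the endpoint-and-path relationship concrete, here is a small Python sketch of the idea. It is an illustration only, not how Globus implements sharing; the function name and the host directory /depot/mylab/secret_research are hypothetical.

```python
import posixpath


def resolve_in_endpoint(endpoint_root, path):
    """Map a path as seen through a shared endpoint to a path on the host.

    The path is normalised against "/" first, so ".." can never climb
    above the endpoint's own root directory.
    """
    inside = posixpath.normpath(posixpath.join("/", path))
    return posixpath.normpath(posixpath.join(endpoint_root, inside.lstrip("/")))


if __name__ == "__main__":
    # Hypothetical directory on the host that backs the shared endpoint.
    root = "/depot/mylab/secret_research"
    for shared_path in ["/", "/run1/reads.fastq.gz", "/../../other_lab"]:
        print(shared_path, "->", resolve_in_endpoint(root, shared_path))
```

Whatever path a collaborator asks for, the result stays inside the shared directory, which is the point of sharing a sub-directory rather than the whole file system.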
Asking For Help
If you have problems with any part of this process, there are many points of support. Globus Support has proven very useful to me, and Purdue's Research Computing has been a very good resource for help and new ideas. But very often, the problem has been how the Genomics Core Facility interfaces with Research Computing and Globus, so please contact me as the first step toward solving your problems.
Lev Gorenstein of RCAC contributed to this document.