A repo I can point new users to, to enable them to more easily use and share with Globus.
Dave Jacoby
Latest commit f8131af Mar 8, 2018

Getting Started With Globus

The Genomics Core Facility uses Globus as a primary way for researchers to access their data. We store the data on RCAC's Data Depot storage service and the Fortress tape archive, and control shares via our Customer page.

What is Globus?

Globus is a service created by the University of Chicago to allow researchers to share and collaborate more easily. In the simplest terms, it serves both as a file transfer tool and as an authentication agent, ensuring that only the desired collaborators can access your research data.

Set Up An Account

Individuals at a number of international research institutions can log into Globus using their institutional credentials. Go to globus.org and click Log In to see if your institution is already connected (note for Purdue users: Purdue is connected, and you can select "Purdue University Main Campus", which will come up automatically if you go to Purdue's branded Globus page). If it is not, use your Google or ORCiD ID to authenticate.

If those options are not open or desirable to you, you can click use Globus ID to sign in, then Need A Globus ID? Sign Up. You will be prompted for a username, password, full name, email and organization.

Linking your account

Globus will primarily think of you as <username>@globusid.org, while it is likely your data will be shared to you via your organization email. When logged in to Globus, click on the Account tab to see what information Globus has about you.

Globus will not automatically link the email you used to set up the account to your account. Therefore, if you used GlobusID, Google or ORCiD to log in (i.e. anything but your institutional identity, e.g. your Purdue career account), it is very important to make Globus aware that <username>@globusid.org or <personalemail>@gmail.com is in fact the same person as the institutional identity you@yourwork.edu. Globus calls these linked identities. To link and unlink email addresses and institutional identities to your Globus account, click Manage Your Identities under the Account tab. Specifically, Purdue users will want to link their GlobusID to the Purdue University Main Campus identity provider.

Download The Software

If your institution is subscribed to Globus, you likely have access to Globus endpoints you can move the data to. If you do not, or would rather have the data on your personal computer instead, you can use Globus Connect Personal to create an endpoint on your computer. It is available for Mac, Windows and Linux. The instructions for installing are easy to follow, but you must have permission to install software on your computer.

Using Globus

Once you have the endpoint from the Genomics Core and an endpoint you can copy the data to, you can use the Transfer Files page to start a transfer. Choose the source endpoint on one side, the target endpoint on the other, select the files to be moved, and then click the arrow corresponding to the direction the files should be moved.
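For those who prefer a terminal to the Transfer Files page, the same source-to-target choice can be sketched as a command line. This is only a sketch: it assumes the separately installed Globus CLI and its `globus transfer SOURCE:PATH DEST:PATH` form, and the endpoint IDs, paths and label below are placeholders, not real Genomics Core values. The code only builds and prints the command; it does not run it.

```python
import shlex


def build_transfer_command(src_endpoint, src_path, dst_endpoint, dst_path, label=None):
    """Return the argument list for a recursive `globus transfer` call."""
    cmd = [
        "globus", "transfer",
        f"{src_endpoint}:{src_path}",   # source endpoint and the path within it
        f"{dst_endpoint}:{dst_path}",   # target endpoint and the path within it
        "--recursive",                  # copy the whole directory tree
    ]
    if label:
        cmd += ["--label", label]       # a human-readable name for the task
    return cmd


# Placeholder values: substitute the endpoint IDs shown in your Globus account.
cmd = build_transfer_command(
    "SOURCE-ENDPOINT-UUID", "/run_data/",
    "DEST-ENDPOINT-UUID", "/~/run_data/",
    label="Genomics Core data",
)
print(shlex.join(cmd))
```

Check `globus transfer --help` on your own installation before relying on any of these options.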

Transfer Files

Globus uses GridFTP, a protocol that, in most cases, will be able to keep track of the transfer and, if interrupted, start again where it left off. If you are transferring to your laptop, you can turn off your computer and turn it on again, and Globus will normally start again where you left off.
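To make "start again where it left off" concrete, here is a small local illustration of the idea. It is not how GridFTP is implemented; it is a plain file copy that assumes the partial destination file is a correct prefix of the source, and the file names are made up for the demonstration.

```python
import os
import tempfile


def resume_copy(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst, continuing from however many bytes dst already holds."""
    done = os.path.getsize(dst) if os.path.exists(dst) else 0
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        fin.seek(done)                  # skip what was already copied
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            done += len(chunk)
    return done                         # total bytes now present in dst


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "reads.fastq")        # hypothetical file names
    dst = os.path.join(tmp, "reads.fastq.part")
    data = b"ACGT" * 1000
    with open(src, "wb") as f:
        f.write(data)
    with open(dst, "wb") as f:
        f.write(data[:1500])                      # simulate an interrupted copy
    total = resume_copy(src, dst, chunk_size=256)
    with open(dst, "rb") as f:
        copied_ok = f.read() == data

print(total, copied_ok)
```

The second run only moves the bytes that were missing, which is why an interrupted transfer of a large data set does not have to start from zero.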

Some resources are not fully capable of this, so I recommend you transfer to a computer with a fast and stable network connection, preferably wired. Additionally, if you're moving or storing large data sets (and if you're working with my lab, you are), it may be difficult to move or untar your data, especially if your tape-archived data is greater than 50% of your internal disk space. You can set Globus Connect Personal to access your external hard drive, which can prove easier than trying to move or extract the data once you have transferred it.
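A quick way to apply the 50% rule of thumb before starting a transfer is to compare the archive size with the space on the target disk. The sketch below assumes the comparison should be against free space (the stricter reading, since an archive and its extracted copy both need room), and the 300 GB archive size is a hypothetical figure.

```python
import shutil


def exceeds_half(archive_bytes, disk_bytes, limit=0.5):
    """True when the archive is larger than `limit` of the given disk space."""
    return archive_bytes > limit * disk_bytes


usage = shutil.disk_usage(".")      # total, used and free bytes for this disk
archive_bytes = 300 * 10**9         # hypothetical 300 GB archive

if exceeds_half(archive_bytes, usage.free):
    print("Consider transferring to an external drive instead.")
else:
    print("The archive should fit with room to extract it.")
```

Pass the mount point of an external drive instead of `"."` to check that disk.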

What is an Endpoint? What is a Path?

An Endpoint is a directory on a computer. This could be the root directory on a large distributed file system, or it could be the Documents directory on your laptop. More specifically, it is a directory on a computer that is named as a thing to be shared. Your institution might share your home directory or your lab's group directory to you, and you can then share your project sub-directory as an endpoint with your collaborators.

A Path identifies a sub-directory within the endpoint, and is written in Unix terms, with / being the "root" directory, the top level above which you cannot go, /~/ being your specific subdirectory, and so on. If I share my /home/djacoby/secret_research/ directory, #secret_research may be the endpoint but / will be the path, so you cannot use a shared endpoint to search the entirety of my home directory.
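The relationship between a shared endpoint's path and the real directory behind it can be sketched as a small mapping function. This is an illustration of the concept only, not Globus's implementation; the function name is hypothetical and the /~/ shorthand is not modelled.

```python
import posixpath


def to_host_path(share_root, endpoint_path):
    """Map a path inside a shared endpoint to a directory on the host."""
    # Anchor the path at "/" so that ".." cannot climb above the share.
    cleaned = posixpath.normpath("/" + endpoint_path.lstrip("/"))
    return posixpath.normpath(share_root.rstrip("/") + cleaned)


share = "/home/djacoby/secret_research/"
print(to_host_path(share, "/"))              # the endpoint's own root
print(to_host_path(share, "/results/run1/")) # a sub-directory of the share
print(to_host_path(share, "/../../etc"))     # still inside the share
```

Whatever path a collaborator asks for, the result stays under the shared directory, which is the point of sharing a sub-directory rather than a whole home directory.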

Asking For Help

If you have problems with any part of this process, there are many points of support. Globus Support has proven very useful to me, and Purdue's Research Computing has been a very good resource for help and new ideas. But very often, the problem has been how the Genomics Core Facility interfaces with Research Computing and Globus, so please contact me as the first step toward solving your problems.

Thanks

Lev Gorenstein of RCAC contributed to this document.