A Quick Start of AptRank
AptRank is a network-based method for protein function prediction. We make our code and data available in the spirit of reproducible research. The code is written in MATLAB R2013a.
AptRank was tested using 4 datasets in
- fly (oversized, 345701688 bytes); and
The raw yeast, human2010 and fly datasets were downloaded from the GeneMANIA-SW website (http://morrislab.med.utoronto.ca/~sara/SW/) and then reorganized for use with AptRank.
In each dataset, there are 3 matrices:
- G, m-by-m protein-protein association network;
- R, m-by-n protein-function annotations; and
- H, n-by-n function-function hierarchy of Gene Ontology.
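Before running either method, it can help to confirm that the three matrices agree on m and n. The following is a minimal sketch on toy stand-in matrices; the toy values and the check itself are illustrative assumptions and are not part of the AptRank code. After loading a real dataset, the same assertions can be applied to the loaded G, R and H.

```matlab
% Toy stand-ins (hypothetical values) with m = 4 proteins and n = 3 functions.
G = sparse([1 2 3], [2 3 4], 1, 4, 4);
G = G + G';                              % symmetric protein-protein network
R = sparse([1 2 4], [1 2 3], 1, 4, 3);   % protein-function annotations
H = sparse([2 3], [1 1], 1, 3, 3);       % function-function hierarchy

% Read m and n off the matrices and check that all three are consistent.
[m, mCols] = size(G);
[mR, n] = size(R);
assert(m == mCols, 'G must be square (m-by-m).');
assert(mR == m, 'R must have one row per protein (m-by-n).');
assert(isequal(size(H), [n, n]), 'H must be n-by-n.');

fprintf('%d proteins, %d functions, %d annotations\n', m, n, nnz(R));
```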
To load and reassemble the oversized fly dataset, please type:
    load fly_G11.mat; load fly_G12.mat;
    load fly_G21.mat; load fly_G22.mat;
    G = [G11,G12;G21,G22];
    load fly_RH.mat;
BirgRank is the prototype of AptRank and a direct application of PageRank on a bi-relational graph. It is simpler and faster than AptRank, but less accurate. To run BirgRank, execute runBirgRank.m, or simply type:
    % Load one of the datasets. Here take the yeast for example:
    addpath('./data')
    load yeast.mat;

    % Split R into Rtrain and Rtest:
    rho = 0.5; % the percentage of annotations used in training.
    [Rtrain,Rtest] = splitR(R,rho);

    % Convert H into bi-directional:
    lambda = 0.5;
    dH = directH(H,lambda);

    % Execution:
    addpath('./birgrank')
    alpha = 0.5; % PageRank coefficient
    theta = 0.5; % percentage of Rtrain in seeding
    mu = 0.5;    % percentage of random walks staying in G
    Xh = birgrank(G,Rtrain,dH,alpha,theta,mu);

    % Evaluation:
    auc = calcAUC(Xh,Rtrain,Rtest);
    disp(['AUROC = ',num2str(auc)])
    map = calcMAP(Xh,Rtrain,Rtest);
    disp(['MAP = ',num2str(map)])
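To make "PageRank on a bi-relational graph" more concrete, here is a minimal sketch of generic personalized PageRank on a toy two-layer graph. It is not the birgrank implementation: the way G, R and H are stacked into a single matrix W, the toy values, and the names W, P, s and x are all assumptions made for illustration, and the theta and mu parameters are not modeled.

```matlab
% Toy data (hypothetical): 3 proteins and 2 functions.
G = [0 1 1; 1 0 0; 1 0 0];   % protein-protein associations
R = [1 0; 0 1; 0 0];         % protein-function annotations (protein 3 has none)
H = [0 1; 1 0];              % function-function links, treated as symmetric here

% Assumed layout: stack both layers into one (m+n)-by-(m+n) adjacency matrix.
W = [G, R; R', H];

% Column-normalize so that each column sums to 1 (random-walk transitions).
P = bsxfun(@rdivide, W, sum(W, 1));

alpha = 0.5;                 % probability of following an edge at each step
s = zeros(size(W, 1), 1);
s(3) = 1;                    % restart vector: seed the walk at protein 3

% Solve the PageRank linear system (I - alpha*P) * x = (1 - alpha) * s.
x = (eye(size(W, 1)) - alpha * P) \ ((1 - alpha) * s);

% The scores on the function nodes rank candidate functions for the seed.
functionScores = x(4:5);
disp(functionScores')
```

In this toy graph, protein 3 is linked only to protein 1, which is annotated with function 1, so function 1 receives the higher score.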
AptRank can provide more accurate predictions than BirgRank, but is more computationally expensive. It has two important prerequisites:
- CVX: a MATLAB-based convex modeling framework (http://cvxr.com/);
- Multi-thread computing resources.
To run AptRank, execute runAptRank.m after you download and unzip the CVX package, or follow the instructions below:
    % Set up CVX:
    % (1) Download cvx from http://cvxr.com/cvx/download/ and unzip the package.
    % (2) Add the cvx path:
    addpath('./cvx');
    % (3) Set it up:
    cvx_setup

    % Load one of the datasets:
    addpath('./data')
    load yeast.mat;

    % Split R into Rtrain and Rtest:
    rho = 0.5; % the percentage of annotations used in training.
    [Rtrain,Rtest] = splitR(R,rho);

    % Convert H into bi-directional:
    lambda = 0.5;
    dH = directH(H,lambda);

    % Launch MATLAB parallel computing pool:
    matlabpool('open',12); % Set the number of cores
    % NOTE: If you use MATLAB/R2013b or higher version,
    % please use parpool() instead of matlabpool().
    % e.g., parpool('MyCluster',12);
    % For more information about parpool(),
    % please refer to http://www.mathworks.com/help/distcomp/parpool.html

    % Execution:
    addpath('./aptrank')
    K = 8;   % Markov chain iterations
    S = 5;   % Number of shuffles
    t = 0.5; % Split percentage of Rtrain into Rfit and Reval
    diffusion_type = 'twoway'; % Input either 'oneway' or 'twoway'.
    Xa = aptrank(G,Rtrain,dH,K,S,t,diffusion_type);

    % Evaluation:
    auc = calcAUC(Xa,Rtrain,Rtest);
    disp(['AUROC = ',num2str(auc)])
    map = calcMAP(Xa,Rtrain,Rtest);
    disp(['MAP = ',num2str(map)])

    % Close MATLAB parallel computing pool:
    matlabpool('close');
    % For MATLAB/R2013b or higher, please shut down a parallel pool using:
    % p = gcp; delete(p);
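The version note in the comments above can be folded into a single script that picks the pool function at run time. This is a sketch, not part of the AptRank code: it assumes the Parallel Computing Toolbox is installed, uses the default cluster profile rather than a named one, and the names nWorkers, useParpool and pool are hypothetical.

```matlab
nWorkers = 2;   % hypothetical worker count; the example above uses 12

% R2013b corresponds to MATLAB version 8.2, the release from which
% parpool() should be used in place of matlabpool().
useParpool = ~verLessThan('matlab', '8.2');

if useParpool
    pool = parpool(nWorkers);       % R2013b or later
else
    matlabpool('open', nWorkers);   % R2013a or earlier
end

% ... call aptrank(...) here, as in the example above ...

if useParpool
    delete(pool);
else
    matlabpool('close');
end
```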
If you use AptRank in your academic research, please cite:
B. Jiang, K. Kloster, D. F. Gleich, and M. Gribskov. AptRank: An Adaptive PageRank Model for Protein Function Prediction on Bi-relational Graph. Preprint: http://arxiv.org/abs/1601.05506
For technical problems, please contact Biaobin Jiang (bjiang-AT-purdue-DOT-edu).