Adaptive pagerank model for protein function prediction

A Quick Start of AptRank

Introduction

AptRank is a network-based method for protein function prediction. We make our code and data available in the spirit of reproducible research. The code is written in MATLAB R2013a.

Datasets

AptRank was tested on four datasets, provided in the /data directory:

  • yeast;
  • human2010;
  • fly (oversized, 345701688 bytes); and
  • human2015.

The raw yeast, human2010, and fly datasets were downloaded from the GeneMANIA-SW website (http://morrislab.med.utoronto.ca/~sara/SW/) and then reorganized for use with AptRank.

In each dataset, there are 3 matrices:

  • G, m-by-m protein-protein association network;
  • R, m-by-n protein-function annotations; and
  • H, n-by-n function-function hierarchy of Gene Ontology.
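The shapes of the three matrices have to agree: G and R share the protein dimension m, and R and H share the function dimension n. The following is a minimal sketch of that consistency check on toy stand-in matrices. The sizes and entries are illustrative assumptions, not taken from any of the shipped datasets, and the orientation of H (which index is the parent term) is arbitrary here.

```matlab
% Toy stand-ins for the three matrices (sizes and entries are assumptions).
m = 5;  % number of proteins
n = 3;  % number of Gene Ontology terms

G = sparse([1 2 3 4], [2 3 4 5], 1, m, m);
G = G + G';                              % symmetric protein-protein network
R = sparse([1 2 5], [1 2 3], 1, m, n);   % protein-function annotations
H = sparse([2 3], [1 1], 1, n, n);       % function-function hierarchy

% The shapes must agree before any of the methods can be run.
assert(size(G,1) == size(G,2), 'G must be square (m-by-m).');
assert(size(R,1) == size(G,1), 'R needs one row per protein in G.');
assert(size(H,1) == size(H,2), 'H must be square (n-by-n).');
assert(size(H,1) == size(R,2), 'H needs one row per function in R.');

fprintf('m = %d proteins, n = %d functions\n', size(R,1), size(R,2));
```

After loading a real dataset, the same four assertions can be run directly on the loaded G, R, and H.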

To load and reassemble the oversized fly dataset, please type:

load fly_G11.mat;
load fly_G12.mat;
load fly_G21.mat;
load fly_G22.mat;
G = [G11,G12;G21,G22];
load fly_RH.mat;

BirgRank

BirgRank is the prototype of AptRank and a direct application of PageRank to a bi-relational graph. It is simpler and faster than AptRank, but less accurate. To run BirgRank, execute runBirgRank.m, or type:

% Load one of the datasets. Here we use yeast as an example:
addpath('./data')
load yeast.mat;

% Split R into Rtrain and Rtest:
rho = 0.5; % the percentage of annotations used in training.
[Rtrain,Rtest] = splitR(R,rho);

% Convert H into bi-directional:
lambda = 0.5;
dH = directH(H,lambda);

% Execution:
addpath('./birgrank')
alpha = 0.5; % PageRank coefficient
theta = 0.5; % percentage of Rtrain in seeding
mu = 0.5; % percentage of random walks staying in G
Xh = birgrank(G,Rtrain,dH,alpha,theta,mu);

% Evaluation:
auc = calcAUC(Xh,Rtrain,Rtest);
disp(['AUROC = ',num2str(auc)])
map = calcMAP(Xh,Rtrain,Rtest);
disp(['MAP = ',num2str(map)])
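For readers unfamiliar with the PageRank step underlying BirgRank, here is a minimal sketch of seeded (personalized) PageRank by power iteration on a small toy graph. It does not reproduce birgrank itself: how birgrank combines G, Rtrain, and dH into one graph is not shown here. The toy adjacency matrix, the seed vector, and the reading of alpha as the probability of following an edge are all assumptions made for illustration.

```matlab
% Toy undirected graph on 4 nodes (illustrative only).
A = [0 1 1 0;
     1 0 1 0;
     1 1 0 1;
     0 0 1 0];

% Column-stochastic transition matrix: each column sums to 1.
P = A ./ repmat(sum(A,1), size(A,1), 1);

alpha = 0.5;          % assumed meaning: probability of following an edge
s = [1; 0; 0; 0];     % seed (restart) distribution, concentrated on node 1

% Power iteration: x <- alpha*P*x + (1-alpha)*s until the update is tiny.
x = s;
for k = 1:100
    xnew = alpha * P * x + (1 - alpha) * s;
    delta = norm(xnew - x, 1);
    x = xnew;
    if delta < 1e-12
        break;
    end
end

% The same vector from the closed-form linear system, for comparison.
xexact = (1 - alpha) * ((eye(size(A,1)) - alpha * P) \ s);
disp(norm(x - xexact, 1));
```

The iterate stays a probability vector at every step, and a smaller alpha keeps more of the score near the seed node.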

AptRank

AptRank gives more accurate predictions than BirgRank but is more computationally expensive. It has two prerequisites:

  • CVX: a Matlab-based convex modeling framework (http://cvxr.com/);
  • Multi-core computing resources, used through a MATLAB parallel computing pool.

To run AptRank, execute runAptRank.m after you download and unzip the CVX package, or follow the instructions below:

% Set up cvx
% (1) Download cvx from http://cvxr.com/cvx/download/ and unzip the package.
% (2) Add the cvx path
addpath('./cvx');
% (3) Set it up:
cvx_setup

% Load one of the datasets:
addpath('./data')
load yeast.mat;

% Split R into Rtrain and Rtest:
rho = 0.5; % the percentage of annotations used in training.
[Rtrain,Rtest] = splitR(R,rho);

% Convert H into bi-directional:
lambda = 0.5;
dH = directH(H,lambda);

% Launch MATLAB parallel computing pool:
matlabpool('open',12); % Set the number of cores
% NOTE: If you use MATLAB R2013b or later,
% use parpool() instead of matlabpool(),
% e.g., parpool('MyCluster',12);
% For more information about parpool(),
% see http://www.mathworks.com/help/distcomp/parpool.html

% Execution:
addpath('./aptrank')
K = 8; % Markov chain iterations
S = 5; % Number of shuffles
t = 0.5; % Split percentage of Rtrain into Rfit and Reval
diffusion_type = 'twoway'; % Input either 'oneway' or 'twoway'.
Xa = aptrank(G,Rtrain,dH,K,S,t,diffusion_type);

% Evaluation:
auc = calcAUC(Xa,Rtrain,Rtest);
disp(['AUROC = ',num2str(auc)])
map = calcMAP(Xa,Rtrain,Rtest);
disp(['MAP = ',num2str(map)])

% Close MATLAB parallel computing pool:
matlabpool('close');
% For MATLAB/R2013b or higher, please shut down a parallel pool using:
% p = gcp; delete(p);
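Both walkthroughs above report AUROC and MAP. As a rough illustration of what these two numbers measure, the sketch below computes AUROC and average precision for a single toy ranking. It is not the code in calcAUC.m or calcMAP.m: those functions also take Rtrain and their internals are not shown here. The scores and labels are made up, ties are not handled, and MAP would be the mean of the average precision over many such rankings.

```matlab
% Toy prediction scores and held-out binary labels (illustrative only).
scores = [0.9; 0.8; 0.35; 0.6; 0.1];
labels = [1;   0;   1;    1;   0];

% AUROC from the rank-sum statistic (assumes no tied scores).
[~, asc] = sort(scores, 'ascend');
ranks = zeros(size(scores));
ranks(asc) = 1:numel(scores);
nPos = sum(labels == 1);
nNeg = sum(labels == 0);
auc = (sum(ranks(labels == 1)) - nPos * (nPos + 1) / 2) / (nPos * nNeg);

% Average precision: mean of precision-at-k taken at each true positive.
[~, desc] = sort(scores, 'descend');
hits = labels(desc) == 1;
precisionAtK = cumsum(hits) ./ (1:numel(hits))';
ap = mean(precisionAtK(hits));

disp(['AUROC = ', num2str(auc)])
disp(['AP = ', num2str(ap)])
```

AUROC here equals the fraction of (positive, negative) pairs in which the positive item receives the higher score, while average precision weights correct predictions near the top of the ranking more heavily.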

Citation

If you use AptRank in your academic research, please cite:

B. Jiang, K. Kloster, D. F. Gleich, and M. Gribskov. AptRank: An Adaptive PageRank Model for Protein Function Prediction on Bi-relational Graph. Preprint: http://arxiv.org/abs/1601.05506

Contact

For technical problems, please contact Biaobin Jiang (bjiang-AT-purdue-DOT-edu).