Mutual Ranks and Modules
Set of scripts to identify co-expressed gene sets (i.e., modules) in gene co-expression networks.
Transforming Pearson’s correlation coefficients (PCCs) into Mutual Ranks (MRs), first described by Obayashi & Kinoshita, is useful when comparing co-expression between different datasets and/or functional categories (Obayashi et al, 2018; Liesecke et al, 2018). However, the transformation requires significant memory and disk space when performed on genome-size matrices. Moreover, MRs range from 1 to n-1 (where n is the number of genes in the genome), which does not translate well to network edge weights. I’ve implemented a series of R and Python scripts for creating MRs and gene modules directly from a directory of Kallisto gene abundance estimates. The code has been developed to run on multiple threads when possible, which can significantly decrease the total runtime. Following MR calculation, the pipeline applies exponential decay functions to transform MRs into network edge weights and calls co-expressed gene sets (i.e., modules) using the program ClusterONE.
An in-depth tutorial is provided and includes information on file types, run time, memory requirements, etc. The TL;DR steps are provided below.
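To make the transformation concrete, here is a minimal sketch of how MRs can be derived from PCCs, using the usual definition of MR as the geometric mean of the two directional ranks (the rank of gene B among gene A's partners, and of gene A among gene B's). It is not the code in calc_mr_pcc.py: the function name `mutual_ranks` and the toy matrix are made up for illustration, ties are broken arbitrarily by sort order, and everything is held in memory as dense matrices, which is exactly what becomes expensive at genome scale.

```python
import numpy as np

def mutual_ranks(expr):
    """Illustrative only. expr is a genes x samples array.

    Returns (pcc, mr), both genes x genes. The diagonal of mr is NaN.
    """
    pcc = np.corrcoef(expr)              # rows are treated as variables (genes)
    n = pcc.shape[0]

    # Mask self-correlations so each gene's partners are ranked 1..n-1.
    masked = pcc.copy()
    np.fill_diagonal(masked, -np.inf)

    # ranks[i, j] = rank of gene j among gene i's partners (1 = highest PCC).
    order = np.argsort(-masked, axis=1, kind="stable")
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(1, n + 1)

    # Geometric mean of the two directional ranks.
    mr = np.sqrt(ranks * ranks.T).astype(float)
    np.fill_diagonal(mr, np.nan)
    return pcc, mr

# Toy example: 4 genes x 5 samples (made-up values).
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.5],
    [5.0, 3.0, 4.0, 1.0, 2.0],
    [1.0, 5.0, 2.0, 4.0, 3.0],
])
pcc, mr = mutual_ranks(expr)
print(np.round(mr, 2))
```

Genes 0 and 1 are each other's best-correlated partner in this toy matrix, so their MR is 1, the strongest possible value.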
- Create a gene expression matrix
Rscript ../scripts/transform_counts.r -a example/abundance_files.txt -l example/transcripts2genes.txt -s example/sample_conditions.txt -t tag -o example/gene_counts
- Calculate PCC and MR for all gene pairs
python ../scripts/calc_mr_pcc.py -i example/gene_counts_normalized.matrix -o example/gene_counts_normalized_mr -t 20
- Run ClusterONE and call co-expressed gene modules
python ../scripts/create_network_and_modules.py -i example/gene_counts_normalized_mr -c ../scripts/cluster_one-1.0.jar -d 5
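For intuition about the MR-to-edge-weight step, the sketch below applies one common form of exponential decay, w = exp(-(MR - 1) / d). Both this formula and the reading of `d` as the kind of value passed to `-d` are assumptions, as are the `min_weight` cutoff and the function names; consult the tutorial for what create_network_and_modules.py actually does.

```python
import math

def mr_to_weight(mr, decay=5.0):
    """Map an MR (>= 1) to a weight in (0, 1]; MR = 1 gives weight 1."""
    return math.exp(-(mr - 1.0) / decay)

def edge_list(mr_pairs, decay=5.0, min_weight=0.01):
    """Keep only pairs whose decayed weight reaches min_weight (hypothetical cutoff)."""
    edges = []
    for gene_a, gene_b, mr in mr_pairs:
        weight = mr_to_weight(mr, decay)
        if weight >= min_weight:
            edges.append((gene_a, gene_b, weight))
    return edges

# Made-up gene pairs with their MRs.
pairs = [
    ("geneA", "geneB", 1.0),
    ("geneA", "geneC", 6.0),
    ("geneB", "geneC", 50.0),
]
edges = edge_list(pairs)
for gene_a, gene_b, weight in edges:
    print(f"{gene_a}\t{gene_b}\t{weight:.4f}")
```

A smaller decay constant makes weights fall off faster with MR, so only the very top-ranked partners stay connected. A larger constant keeps weaker associations in the network and tends to give larger modules.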