Meta-analysis for ranked discovery datasets: Theoretical framework and empirical demonstration for microarrays
The combination of results from different large-scale datasets of multidimensional biological signals (such as gene expression profiling) presents a major challenge. Methodologies are needed that can efficiently combine diverse datasets, but can also test the extent of diversity (heterogeneity) across the combined studies. We developed METa-analysis of RAnked DISCovery datasets (METRADISC), a generalized meta-analysis method for combining information across discovery-oriented datasets and for testing between-study heterogeneity for each biological variable of interest. The method is based on non-parametric Monte Carlo permutation testing. The tested biological variables are ranked in each study according to the level of statistical significance. METRADISC tests for each biological variable of interest its average rank and the between-study heterogeneity of the study-specific ranks. After accounting for ties and differences in tested variables across studies, we randomly permute the ranks of each study and the simulated metrics of average rank and heterogeneity are calculated. The procedure is repeated to generate null distributions for the metrics. The use of METRADISC is demonstrated empirically using gene expression data from seven studies comparing prostate cancer cases and normal controls. We offer a new tool for combining complex datasets derived from massive testing, discovery-oriented research and for examining the diversity of results across the combined studies.
Journal: Computational Biology and Chemistry - Volume 32, Issue 1, February 2008, Pages 39–47