Perceptron ensemble of graph-based positive-unlabeled learning for disease gene identification

Paper ID   Volume ID   Publish Year   Pages   File Format   Full-Text
14905      1360        2016           8       PDF           Available
Title
Perceptron ensemble of graph-based positive-unlabeled learning for disease gene identification
Abstract

• Disease genes are identified with semi-supervised learning methods known as positive-unlabeled learning.
• We present a Perceptron ensemble of graph-based positive-unlabeled learning (PEGPUL) on three types of biological attributes: gene ontologies, protein domains and protein-protein interaction networks.
• A Perceptron ensemble is learned from three weighted classifiers: multilevel support vector machine, k-nearest neighbor and decision tree.
• The main contributions of this paper are: (i) incorporating the statistical properties of gene data by choosing proper metrics, (ii) statistical evaluation of biological features, and (iii) noise robustness of PEGPUL through a multilevel schema.
• To assess PEGPUL, we applied it to 12,950 genes: 949 positive genes from six classes of diseases and 12,001 unlabeled genes.
• Compared with some popular disease gene identification methods, the experimental results show that PEGPUL has reasonable performance.
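The idea of a Perceptron that weights the outputs of three base classifiers can be illustrated with a minimal sketch. This is not the authors' implementation: the score rows below are made-up toy values standing in for the outputs of a support vector machine, a k-nearest-neighbor classifier and a decision tree, and the names `train_perceptron`, `predict`, `base_scores` and `gene_labels` are hypothetical. The sketch only assumes that each base classifier emits a signed score per gene and that labels are +1 (disease) or -1 (non-disease).

```python
from typing import List, Sequence, Tuple


def train_perceptron(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    epochs: int = 100,
    learning_rate: float = 1.0,
) -> Tuple[List[float], float]:
    """Learn one weight per base classifier, plus a bias, with the perceptron rule."""
    weights = [0.0] * len(scores[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for row, label in zip(scores, labels):
            activation = sum(w * s for w, s in zip(weights, row)) + bias
            if label * activation <= 0:  # misclassified, or exactly on the boundary
                weights = [w + learning_rate * label * s for w, s in zip(weights, row)]
                bias += learning_rate * label
                mistakes += 1
        if mistakes == 0:  # every training gene is already classified correctly
            break
    return weights, bias


def predict(weights: Sequence[float], bias: float, row: Sequence[float]) -> int:
    """Combine the base-classifier scores of one gene into a +1 / -1 decision."""
    activation = sum(w * s for w, s in zip(weights, row)) + bias
    return 1 if activation > 0 else -1


# Toy data (assumption): each row is (svm_score, knn_score, tree_score) for one gene.
base_scores = [
    (0.9, 0.8, 1.0),
    (0.7, -0.2, 1.0),
    (0.4, 0.6, -1.0),
    (-0.8, -0.6, -1.0),
    (-0.3, 0.2, -1.0),
    (-0.6, -0.9, 1.0),
]
gene_labels = [1, 1, 1, -1, -1, -1]

weights, bias = train_perceptron(base_scores, gene_labels)
predictions = [predict(weights, bias, row) for row in base_scores]
print("learned weights:", weights, "bias:", bias)
print("predictions:", predictions)
```

The learned weights play the role of the per-classifier weights in the ensemble: a base classifier whose score agrees with the label more often ends up with a larger weight. In practice the base scores would come from trained models rather than hand-written tuples.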

Identifying disease genes with computational methods is an important problem in biomedical and bioinformatics research. Based on the observation that diseases with the same or similar phenotypes share biological characteristics, researchers have tried to identify disease genes using machine learning tools. In recent work, semi-supervised learning methods known as positive-unlabeled learning have been used for disease gene identification. In this paper, we present a Perceptron ensemble of graph-based positive-unlabeled learning (PEGPUL) on three types of biological attributes: gene ontologies, protein domains and protein-protein interaction networks. In our method, a reliable set of positive and negative genes is first extracted using a co-training schema. Then, a similarity graph of genes is built using metric learning, with a focus on the multi-rank-walk method for performing inference from the labeled genes. Finally, a Perceptron ensemble is learned from three weighted classifiers: multilevel support vector machine, k-nearest neighbor and decision tree. The main contributions of this paper are: (i) incorporating the statistical properties of gene data by choosing proper metrics, (ii) statistical evaluation of biological features, and (iii) noise robustness of PEGPUL through a multilevel schema. To assess PEGPUL, we applied it to 12,950 genes: 949 positive genes from six classes of diseases and 12,001 unlabeled genes. Compared with some popular disease gene identification methods, the experimental results show that PEGPUL has reasonable performance.
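The graph inference step can be pictured with a small sketch in the spirit of multi-rank-walk style label propagation: one random walk with restart is run per class from that class's labeled genes, and each gene takes the class whose walk scores it highest. This is an illustrative assumption about the general technique, not the paper's exact procedure; the toy graph, the damping value of 0.85, the class names and the function names are all hypothetical.

```python
from typing import Dict, List, Sequence, Set


def random_walk_with_restart(
    adjacency: Sequence[Sequence[float]],
    seeds: Set[int],
    damping: float = 0.85,
    iterations: int = 100,
) -> List[float]:
    """Score every node by a random walk that restarts uniformly at the seed nodes."""
    n = len(adjacency)
    # Column sums turn edge weights into transition probabilities out of each node.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    restart = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    rank = restart[:]
    for _ in range(iterations):
        new_rank = []
        for i in range(n):
            walk = sum(
                adjacency[i][j] / col_sums[j] * rank[j]
                for j in range(n)
                if col_sums[j] > 0
            )
            new_rank.append((1.0 - damping) * restart[i] + damping * walk)
        rank = new_rank
    return rank


def multi_rank_walk(
    adjacency: Sequence[Sequence[float]],
    seeds_by_class: Dict[str, Set[int]],
) -> List[str]:
    """Assign each node to the class whose restart walk gives it the highest score."""
    scores = {
        label: random_walk_with_restart(adjacency, seeds)
        for label, seeds in seeds_by_class.items()
    }
    return [
        max(scores, key=lambda label: scores[label][i])
        for i in range(len(adjacency))
    ]


# Toy similarity graph (assumption): two tight triangles of genes, {0, 1, 2} and
# {3, 4, 5}, joined by one weak edge between genes 2 and 3.
similarity = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
seed_genes = {"disease": {0}, "non_disease": {5}}

inferred = multi_rank_walk(similarity, seed_genes)
print(inferred)
```

Because the only link between the two triangles is weak, each walk stays mostly inside the cluster of its seed, so the unlabeled genes inherit the label of the seed they are most similar to. In a real setting the adjacency weights would come from the learned similarity metric over gene ontologies, protein domains and protein-protein interactions.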

Keywords
Disease gene identification; Biological networks; Positive-unlabeled learning; Ensemble of classifiers; Perceptron
Publisher
Database: Elsevier - ScienceDirect
Journal: Computational Biology and Chemistry - Volume 64, October 2016, Pages 263–270
Subjects
Physical Sciences and Engineering > Chemical Engineering > Bioengineering