
newDNA-Prot: Prediction of DNA-binding proteins by employing support vector machine and a comprehensive sequence representation

Paper ID: 14995 | Volume ID: 1366 | Publish year: 2014 | Pages: 9 | File format: PDF | Full text: available
Abstract

• Give a comprehensive sequence representation.
• Establish the prediction model based on a support vector machine classifier.
• Compare with relevant prediction methods on two independent test datasets.
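The first highlight refers to a comprehensive sequence representation, and the abstract lists primary-sequence-based features as one of its six groups. As a minimal sketch of what such a feature can look like, the code below computes amino acid composition (the fraction of each of the 20 standard residues). This is an assumption for illustration only: the abstract does not say which primary-sequence descriptors newDNA-Prot uses, and the names `AMINO_ACIDS` and `amino_acid_composition` are hypothetical.

```python
# Hypothetical helper: amino acid composition as one example of a
# primary-sequence-based feature (not confirmed as newDNA-Prot's encoding).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue frequencies.

    Non-standard symbols (e.g. 'X') are ignored; a sequence with no
    standard residues maps to an all-zero vector.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

For example, `amino_acid_composition("AAKK")` gives 0.5 at the positions for A and K and 0.0 elsewhere. A fixed-length vector like this is what lets variable-length protein sequences be fed to an SVM alongside the other feature groups.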

Identification of DNA-binding proteins is essential for studying cellular activities, as DNA-binding proteins play a pivotal role in gene regulation. In this study, we propose newDNA-Prot, a DNA-binding protein predictor that employs a support vector machine (SVM) classifier and a comprehensive feature representation. The features of the sequence representation are categorized into six groups: primary sequence based, evolutionary profile based, predicted secondary structure based, predicted relative solvent accessibility based, physicochemical property based and biological function based features. The mRMR (minimum redundancy maximum relevance), wrapper and two-stage feature selection methods are employed to remove irrelevant features and reduce redundant ones. Experiments demonstrate that the two-stage method performs better than the mRMR and wrapper methods. We also perform a statistical analysis of the selected features; the results show that more than 95% of the selected features are statistically significant and that they cover all six feature groups. The newDNA-Prot method is compared with several state-of-the-art algorithms, including iDNA-Prot, DNAbinder and DNA-Prot, and it outperforms all three. More specifically, newDNA-Prot improves on the runner-up method, DNA-Prot, by around 10% on several evaluation measures. The proposed newDNA-Prot method is available at http://sourceforge.net/projects/newdnaprot/
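The abstract names mRMR as one of the feature selection methods but gives none of its settings. The sketch below illustrates the general greedy mRMR idea in its difference form: at each step, pick the feature whose mutual information with the class label, minus its mean mutual information with the features already chosen, is largest. It assumes features have already been discretized; the paper's discretization, wrapper and two-stage procedures are not reproduced, and `mutual_information` and `mrmr_select` are hypothetical names.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information I(X;Y), in nats, of two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += count / n * log(count * n / (px[a] * py[b]))
    return mi


def mrmr_select(features, labels, k):
    """Greedily pick k feature indices with the mRMR difference criterion.

    features: list of feature columns, each a list of discrete values
    labels:   class labels (e.g. 1 = DNA-binding, 0 = non-binding)
    """
    relevance = [mutual_information(f, labels) for f in features]
    selected = []
    remaining = list(range(len(features)))

    def score(j):
        # relevance to the class minus mean redundancy with already-chosen features
        if not selected:
            return relevance[j]
        redundancy = sum(mutual_information(features[j], features[s]) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # ties go to the lowest remaining index
        selected.append(best)
        remaining.remove(best)
    return selected
```

As a toy usage example, with `a = [0,0,0,0,1,1,1,1]`, `b = [0,0,1,1,0,0,1,1]` and `labels = [0,0,0,1,1,1,1,1]`, `mrmr_select([a, a, b], labels, 2)` returns `[0, 2]`: the exact duplicate of feature 0 is skipped in favour of the weakly relevant but non-redundant feature 2, whereas ranking by relevance alone would pick `[0, 1]`. This redundancy penalty is the sense in which such a filter "reduces redundant features".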


Keywords
DNA-binding proteins; Features; Feature selection methods; SVM; ROC
Publisher
Database: Elsevier - ScienceDirect
Journal: Computational Biology and Chemistry - Volume 52, October 2014, Pages 51–59
Subjects
Physical Sciences and Engineering; Chemical Engineering; Bioengineering