Dimensionality of amino acid space and solvent accessibility prediction with neural networks
Several groups have pursued the prediction of solvent accessibility from amino acid sequences. Such predictions typically start by transforming the amino acid category (or type) into a numerical representation. The twenty amino acids can be completely and uniquely represented by 20-dimensional vectors. Here, we investigate whether the amino acid space defined in this way really requires twenty dimensions, and we attempt to develop corresponding representations in fewer dimensions. We developed a neural-network-based method for searching for optimal codification schemes in a space of arbitrary dimension. The method is used to obtain optimal encodings of amino acids at various levels of dimensionality, and is applied to optimize amino acid codifications for predicting the solvent accessibility of proteins with feed-forward neural networks. The traditional 20-dimensional codification appears to be redundant for the solvent accessibility prediction problem: optimal codings in far fewer dimensions, down to a single dimension, predict accessible surface area with almost the same accuracy as the fully unique 20-dimensional coding. The 1-dimensional codification, although obtained in a purely mathematical way from neural networks, is highly correlated with a physical property of the amino acids, namely their average solvent accessibility. The method for finding the optimal codification is general, although the codification it produces depends on the property being estimated.
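To make the contrast between the 20-dimensional and reduced codifications concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not describe the network architecture, so the sketch assumes that a d-dimensional code can be viewed as a (20, d) weight matrix shared across sequence positions and applied to the 20-dimensional input. The names `one_hot`, `reduced_code`, and `codebook` are hypothetical, the example sequence is arbitrary, and the codebook values are random placeholders rather than learned or published codes.

```python
import numpy as np

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Return an (L, 20) matrix with exactly one 1 per row (the unique 20-D code)."""
    encoded = np.zeros((len(sequence), len(AMINO_ACIDS)))
    for pos, aa in enumerate(sequence):
        encoded[pos, INDEX[aa]] = 1.0
    return encoded


def reduced_code(sequence, codebook):
    """Map each residue to a d-dimensional code.

    Multiplying a one-hot row by the (20, d) codebook simply selects the
    codebook row for that residue, so the codebook acts as a lookup table
    whose entries could be optimized by a network (an assumption here).
    """
    return one_hot(sequence) @ codebook


rng = np.random.default_rng(0)
d = 1  # target dimensionality; the abstract considers various levels
codebook = rng.normal(size=(len(AMINO_ACIDS), d))  # placeholder values, NOT learned codes

seq = "MKTAYIAK"  # arbitrary example sequence
x20 = one_hot(seq)
x1 = reduced_code(seq, codebook)
print(x20.shape, x1.shape)
```

Under this assumption, the downstream predictor sees d numbers per residue instead of 20, and searching for an optimal codification amounts to adjusting the 20 × d entries of `codebook` alongside the predictor's own weights.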
Journal: Computational Biology and Chemistry - Volume 30, Issue 2, April 2006, Pages 160–168