Relationship between global structural parameters and Enzyme Commission hierarchy: Implications for function prediction
In protein databases there is a substantial number of proteins structurally determined but without function annotation. Understanding the relationship between function and structure can be useful to predict function on a large scale. We have analyzed the similarities in global physicochemical parameters for a set of enzymes which were classified according to the four Enzyme Commission (EC) hierarchical levels. Using relevance theory we introduced a distance between proteins in the space of physicochemical characteristics. This was done by minimizing a cost function of the metric tensor built to reflect the EC classification system. Using an unsupervised clustering method on a set of 1025 enzymes, we obtained no relevant clustering formation compatible with EC classification. The distance distributions between enzymes from the same EC group and from different EC groups were compared by histograms. Such analysis was also performed using sequence alignment similarity as a distance. Our results suggest that global structure parameters are not sufficient to segregate enzymes according to EC hierarchy. This indicates that features essential for function are rather local than global. Consequently, methods for predicting function based on global attributes should not obtain high accuracy in main EC classes prediction without relying on similarities between enzymes from training and validation datasets. Furthermore, these results are consistent with a substantial number of studies suggesting that function evolves fundamentally by recruitment, i.e., a same protein motif or fold can be used to perform different enzymatic functions and a few specific amino acids (AAs) are actually responsible for enzyme activity. These essential amino acids should belong to active sites and an effective method for predicting function should be able to recognize them.
Graphical abstractFigure optionsDownload full-size imageDownload as PowerPoint slideHighlights► We evaluate the relationship between global structure parameters and EC function. ► A distance between enzymes in optimized phase space of characteristics is defined. ► Same EC and different EC groups distance distributions are compared. ► Our results suggest that global features cannot distinguish EC function.
Journal: Computational Biology and Chemistry - Volume 40, October 2012, Pages 15–19