Cancer DNA Microarray Analysis Considering Multi-subclass with Graph-based Clustering Method
It is well known that various genes related to the cell cycle, cell–cell adhesion, and transcriptional regulation cause the onset of cancer. Environmental factors, including age, sex, and lifestyle, can also contribute. Because so many factors are involved, it is difficult to determine which of them drive onset in a given patient, and patients diagnosed with the same disease may therefore fall into several distinct subgroups. In the present study, we applied graph-based clustering to several DNA microarray datasets before classification analysis. The resulting clusters were used in two ways: to construct a multi-class classification model with the k-nearest neighbor (k-NN) method, and to find genes specific to a particular cluster by One vs. Others classification. Using this approach, classification models were constructed for four microarray datasets (leukemia, breast cancer, prostate cancer, and colon cancer), and k-NN classification accuracy exceeded 80% for all four. In the breast cancer dataset, we also identified genes specific to a cluster consisting of 38 control-group samples. These results indicate the importance of clustering samples before constructing a classification model.
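The pipeline summarized above (cluster samples on a graph, train a multi-class k-NN on the cluster labels, then score genes for one cluster against all others) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify how the graph is built or which statistic ranks genes. Here the graph is assumed to be a mutual k-nearest-neighbor graph whose connected components become clusters, distances are Euclidean, and genes are ranked with a signal-to-noise score. All function names (`knn_graph_clusters`, `knn_predict`, `one_vs_others_snr`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two expression profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_graph_clusters(samples, k):
    """Hypothetical graph-based clustering: connected components of a
    mutual k-NN graph. Returns one integer cluster label per sample."""
    n = len(samples)
    # For each sample, the indices of its k nearest other samples.
    neighbors = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: euclidean(samples[i], samples[j]))
        neighbors.append(set(order[:k]))

    # Union-find: join i and j only if each is among the other's k nearest.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in neighbors[i]:
            if i in neighbors[j]:
                parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def knn_predict(train_x, train_y, query, k):
    """Multi-class k-NN: majority label among the k closest training samples."""
    ranked = sorted(range(len(train_x)),
                    key=lambda i: euclidean(train_x[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def one_vs_others_snr(samples, labels, target):
    """Score every gene for cluster `target` vs. all other samples using an
    assumed signal-to-noise ratio (mean_in - mean_out) / (sd_in + sd_out)."""
    inside = [s for s, c in zip(samples, labels) if c == target]
    outside = [s for s, c in zip(samples, labels) if c != target]
    scores = []
    for g in range(len(samples[0])):
        a = [s[g] for s in inside]
        b = [s[g] for s in outside]
        mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
        sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / len(a))
        sd_b = math.sqrt(sum((x - mean_b) ** 2 for x in b) / len(b))
        scores.append((mean_a - mean_b) / (sd_a + sd_b + 1e-9))
    return scores


# Toy data (invented for illustration): 6 samples x 3 genes, two clear groups.
X = [[0.0, 0.1, 5.0], [0.2, 0.0, 5.1], [0.1, 0.2, 4.9],
     [5.0, 5.1, 0.0], [5.2, 4.9, 0.1], [4.9, 5.0, 0.2]]
labels = knn_graph_clusters(X, k=2)          # [0, 0, 0, 1, 1, 1]
print(labels)
print(knn_predict(X, labels, [0.1, 0.1, 5.0], k=3))  # 0
print(one_vs_others_snr(X, labels, target=1))
```

Using a mutual (rather than one-directional) neighbor relation is one common way to keep a single outlier from bridging two groups. In this sketch, genes with large positive scores for a cluster are higher in that cluster than in all others, which mirrors the One vs. Others idea of finding cluster-specific genes.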
Journal: Journal of Bioscience and Bioengineering - Volume 106, Issue 5, November 2008, Pages 442–448