NSC-GA: Search for optimal shrinkage thresholds for nearest shrunken centroid

Document Type

Conference Proceeding




Faculty of Health, Engineering and Science


School of Computer and Security Science/Artificial Intelligence and Optimisation Research Group




This article was originally published as: Dang, V.Q., Lam, C., & Lee, C. (2013). NSC-GA: Search for optimal shrinkage thresholds for nearest shrunken centroid. Proceedings of the 2013 Computational Intelligence in Bioinformatics and Computational Biology. (pp. 44-51). Singapore. IEEE. © 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.


In this paper, a hybrid approach combining the Nearest Shrunken Centroid (NSC) classifier and a Genetic Algorithm (GA) is proposed to automatically search for an optimal range of shrinkage threshold values for the NSC, with the aim of improving feature selection and classification accuracy on high-dimensional data. The choice of threshold is crucial because it determines which relative differences between each class centroid and the overall centroid the NSC treats as significant. However, selecting this value empirically by trial and error can be time-consuming and imprecise. In the proposed NSC-GA approach, shrinkage threshold values are encoded as genes in chromosomes, and each chromosome is evaluated using a fitness measure obtained from the NSC classifier. The GA then searches automatically for the optimal threshold. The approach was evaluated on three datasets: Alzheimer's disease, colon cancer and leukemia. Experimental results indicated that it finds an optimal range of shrinkage thresholds for each dataset, leading to higher classification accuracy with fewer selected features than reported in previous studies.
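
The idea of encoding a shrinkage threshold as a gene and scoring it with the NSC classifier can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made for illustration: each chromosome holds a single real-valued threshold, fitness is plain accuracy on a held-out validation split, the NSC is a simplified version (soft-thresholded standardized centroids, no class priors), and the data are synthetic. The names `fit_nsc`, `predict`, `ga_search` and `fitness` are hypothetical.

```python
import math
import random
import statistics

def fit_nsc(X, y, delta):
    """Simplified NSC: returns (shrunken centroids, feature scales, kept feature indices)."""
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    overall = [sum(row[j] for row in X) / n for j in range(p)]
    rows = {k: [x for x, lab in zip(X, y) if lab == k] for k in classes}
    cent = {k: [sum(r[j] for r in rows[k]) / len(rows[k]) for j in range(p)]
            for k in classes}
    # Pooled within-class standard deviation of each feature.
    s = [math.sqrt(sum((x[j] - cent[lab][j]) ** 2 for x, lab in zip(X, y))
                   / (n - len(classes))) for j in range(p)]
    s0 = statistics.median(s)  # offset that guards against tiny deviations
    shrunk, kept = {}, set()
    for k in classes:
        m = math.sqrt(1 / len(rows[k]) - 1 / n)
        shrunk[k] = []
        for j in range(p):
            scale = m * (s[j] + s0)
            d = (cent[k][j] - overall[j]) / scale
            # Soft thresholding: move d toward zero by delta, clipping at zero.
            d_shrunk = math.copysign(max(abs(d) - delta, 0.0), d)
            if d_shrunk != 0.0:
                kept.add(j)  # feature still separates this class from the rest
            shrunk[k].append(overall[j] + scale * d_shrunk)
    return shrunk, [sj + s0 for sj in s], kept

def predict(model, x):
    """Assign x to the class with the nearest shrunken centroid (scaled distance)."""
    shrunk, scale, _ = model
    return min(shrunk, key=lambda k: sum(((xi - c) / sc) ** 2
                                         for xi, c, sc in zip(x, shrunk[k], scale)))

def ga_search(fitness, low, high, pop_size=20, generations=15, mut_rate=0.2, seed=0):
    """Tiny real-coded GA: tournament selection, blend crossover, Gaussian mutation."""
    rng = random.Random(seed)
    pop = [rng.uniform(low, high) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        def tournament():
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        children = [best]  # elitism: the best threshold so far always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            w = rng.random()
            child = w * p1 + (1 - w) * p2
            if rng.random() < mut_rate:
                child += rng.gauss(0, 0.1 * (high - low))
            children.append(min(max(child, low), high))  # keep within bounds
        pop = children
        best = max(pop, key=fitness)
    return best

# Synthetic two-class data: features 0-2 carry signal, the other 27 are noise.
data_rng = random.Random(1)
y_all = [i % 2 for i in range(60)]
X_all = [[data_rng.gauss(1.5 * lab if j < 3 else 0.0, 1.0) for j in range(30)]
         for lab in y_all]
X_tr, y_tr, X_va, y_va = X_all[:40], y_all[:40], X_all[40:], y_all[40:]

def fitness(delta):
    """Validation accuracy of the simplified NSC at threshold delta."""
    model = fit_nsc(X_tr, y_tr, delta)
    hits = sum(predict(model, x) == lab for x, lab in zip(X_va, y_va))
    return hits / len(y_va)

best = ga_search(fitness, 0.0, 5.0)
print("threshold:", best, "accuracy:", fitness(best),
      "features kept:", len(fit_nsc(X_tr, y_tr, best)[2]))
```

In this sketch a larger threshold zeroes out more standardized centroid differences, so the number of retained features shrinks as the threshold grows. A fitness function that also rewards smaller feature sets, or one based on cross-validation, would be a natural variation.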