Outliers Detection with Correlated Subspaces for High Dimensional Datasets

Document Type

Journal Article


Faculty of Computing, Health and Science


School of Computer and Security Science




This article was originally published as: Leng, J., & Huang, Z. (2011). Outliers detection with correlated subspaces for high dimensional datasets. International Journal of Wavelets, Multiresolution and Information Processing, 9(2), 227-236.


Detecting outliers in high dimensional datasets is a difficult data mining task. Mining outliers in subspaces is a promising solution, because outliers may be embedded in some interesting subspaces. Because high dimensional datasets contain many irrelevant dimensions, it is important to eliminate the irrelevant or unimportant dimensions and to identify outliers in interesting subspaces with strong correlation. Normally, the correlation among dimensions can be determined by traditional feature selection techniques and subspace-based clustering methods. Dimension-growth subspace clustering techniques find interesting subspaces in spaces of relatively low dimension, while dimension-reduction approaches aim to find the maximum cliques in high dimensional datasets. This paper presents a novel approach that identifies outliers in correlated subspaces. The degree of correlation among dimensions is measured in terms of the mean squared residue, and frequent pattern algorithms are employed to find the correlated subspaces. Outliers are then distinguished in the projections onto these correlated subspaces by using classical outlier detection techniques. Empirical studies show that the proposed approach can identify outliers effectively in high dimensional datasets.
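The abstract does not spell out how the mean squared residue is computed, so the following is only a minimal sketch. It assumes the definition commonly used in the biclustering literature: for each cell of the data projected onto a set of dimensions, subtract the row mean and the column mean, add the overall mean, and average the squared results. The function names mean_squared_residue and correlated_subspaces, the toy data and the threshold are all hypothetical. Brute-force enumeration of dimension subsets stands in here for the frequent pattern algorithms that the paper employs.

```python
from itertools import combinations
from statistics import mean


def mean_squared_residue(rows, dims):
    """Mean squared residue of the data projected onto the dimensions in dims.

    Assumed definition: average over all cells of
    (value - row mean - column mean + overall mean) squared.
    """
    sub = [[row[d] for d in dims] for row in rows]
    n, m = len(sub), len(dims)
    row_means = [mean(r) for r in sub]
    col_means = [mean(r[j] for r in sub) for j in range(m)]
    overall = mean(row_means)  # rows have equal length, so this is the grand mean
    total = 0.0
    for i in range(n):
        for j in range(m):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n * m)


def correlated_subspaces(rows, size, threshold):
    """Brute-force stand-in: keep every subspace of the given size whose
    mean squared residue does not exceed the (hypothetical) threshold."""
    n_dims = len(rows[0])
    return [
        dims
        for dims in combinations(range(n_dims), size)
        if mean_squared_residue(rows, dims) <= threshold
    ]


# Toy data: dimension 1 is dimension 0 shifted by 10; dimension 2 is unrelated.
data = [
    [1.0, 11.0, 5.0],
    [2.0, 12.0, 1.0],
    [3.0, 13.0, 9.0],
    [4.0, 14.0, 2.0],
]

print(correlated_subspaces(data, size=2, threshold=0.5))  # [(0, 1)]
```

A low residue means the selected dimensions rise and fall together across the records, which is the sense in which such a subspace counts as correlated. A classical outlier detector would then be applied to the data projected onto each subspace that is kept.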

