Title

Outliers Detection with Correlated Subspaces for High Dimensional Datasets

Document Type

Journal Article

Faculty

Faculty of Computing, Health and Science

School

School of Computer and Security Science

RAS ID

12460

Comments

This article was originally published as: Leng, J., & Huang, Z. (2011). Outliers detection with correlated subspaces for high dimensional datasets. International Journal of Wavelets, Multiresolution and Information Processing, 9(2), 227-236.

Abstract

Detecting outliers in high dimensional datasets is a difficult data mining task. Mining outliers in subspaces is a promising solution, because outliers may be embedded in some interesting subspaces. Because high dimensional datasets contain many irrelevant dimensions, it is important to eliminate the irrelevant or unimportant dimensions and to identify outliers in interesting subspaces with strong correlation. Normally, the correlation among dimensions can be determined by traditional feature selection techniques and subspace-based clustering methods. The dimension-growth subspace clustering techniques find interesting subspaces in relatively lower possible dimension space, while dimension-growth approaches intend to find the maximum cliques in high dimensional datasets. This paper presents a novel approach that identifies outliers in correlated subspaces. The degree of correlation among dimensions is measured in terms of the mean squared residue, and frequent pattern algorithms are employed to find the correlated subspaces. Outliers are then identified in the projections onto the correlated subspaces obtained, using classical outlier detection techniques. Empirical studies show that the proposed approach can identify outliers effectively in high dimensional datasets.
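
The abstract names the mean squared residue as the correlation measure but does not give its formula. The sketch below assumes the standard definition from the biclustering literature (Cheng and Church), in which the residue of an entry is its value minus its row mean, minus its column mean, plus the overall mean of the selected submatrix. The function name and arguments are illustrative and are not taken from the article, which may use a different variant.

```python
def mean_squared_residue(data, rows, cols):
    """Mean squared residue of the submatrix of `data` picked out by
    `rows` (object indices) and `cols` (dimension indices).

    Assumption: the standard Cheng-Church form; the article's exact
    definition may differ.
    """
    sub = [[float(data[i][j]) for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])

    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: deviation of the entry from a purely additive
            # row-effect plus column-effect model.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Hypothetical toy data: the first three dimensions follow an additive
# pattern across objects, the fourth does not.
toy = [
    [1, 2, 3, 9],
    [2, 3, 4, 1],
    [5, 6, 7, 4],
]
coherent = mean_squared_residue(toy, [0, 1, 2], [0, 1, 2])
mixed = mean_squared_residue(toy, [0, 1, 2], [0, 1, 2, 3])
```

Under this definition a lower score indicates a more coherent set of dimensions, so `coherent` is close to zero while `mixed` is clearly larger. How the article thresholds the score or combines it with frequent pattern mining is not stated in the abstract.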

DOI

10.1142/S0219691311004067

 
