Document Type

Conference Proceeding

Publisher

IEEE

Faculty

Faculty of Computing, Health and Science

School

School of Computer and Security Science

RAS ID

10245

Comments

This is an Author's Accepted Manuscript of: Leng, J. (2010). A Novel Subspace Outlier Detection Approach in High Dimensional Data Sets. Proceedings of 2010 3rd International Conference on Computer and Electrical Engineering (ICCEE 2010). (pp. 162-165). Chengdu, China. IEEE.

© 2010 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Abstract

Many real-world applications need to detect outliers in high-dimensional data sets. The major difficulty of mining outliers lies in the fact that outliers are often embedded in subspaces. No efficient methods are generally available for subspace-based outlier detection. Most existing subspace-based outlier detection methods identify outliers by searching for abnormally sparse density units in subspaces. In this paper, we present a novel approach for finding outliers in the ‘interesting’ subspaces. The interesting subspaces are strongly correlated with ‘good’ clusters. This approach aims to group the meaningful subspaces and then identify outliers in the projected subspaces. To do so, an extension to the subspace-based clustering algorithm is proposed to find the ‘good’ subspaces, and outliers are then identified in the projected subspaces using classical outlier detection techniques such as distance-based and density-based algorithms. Comprehensive case studies are conducted using various types of subspace clustering and outlier detection algorithms. The experimental results demonstrate that the proposed method can detect outliers effectively and efficiently in high-dimensional data sets.
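
The second stage described in the abstract, projecting the data onto a chosen subspace and applying a classical distance-based detector there, can be sketched roughly as follows. This is a minimal illustration and not the paper's algorithm: the subspace `dims` is assumed to have been supplied by some subspace clustering step that is not shown, the k-th nearest neighbour distance is only one possible distance-based score, and the function names, parameters, and toy data are hypothetical.

```python
import math
import random

def project(points, dims):
    """Keep only the coordinates listed in dims (the chosen subspace)."""
    return [tuple(p[d] for d in dims) for p in points]

def knn_distance_scores(points, k):
    """Distance-based outlier score: distance to the k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def subspace_outliers(points, dims, k=3, top_n=1):
    """Return indices of the top_n highest-scoring points in the projected subspace."""
    scores = knn_distance_scores(project(points, dims), k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_n]

# Toy data (assumption): dimensions 0 and 1 hold a tight cluster,
# dimensions 2-4 are uniform noise.
rng = random.Random(0)
data = [
    [rng.gauss(0, 0.1), rng.gauss(0, 0.1)] + [rng.uniform(-10, 10) for _ in range(3)]
    for _ in range(50)
]
# The point at index 50 deviates from the cluster only in the (0, 1) subspace.
data.append([3.0, 3.0] + [rng.uniform(-10, 10) for _ in range(3)])

# dims=[0, 1] stands in for a subspace found by subspace clustering.
outliers = subspace_outliers(data, dims=[0, 1], k=3, top_n=1)
print(outliers)
```

In this toy setting the injected point is ranked first because its deviation is confined to the two clustered dimensions; a density-based score could be substituted for `knn_distance_scores` without changing the rest of the sketch.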

Access Rights

free_to_read
