Title

An efficient web document clustering algorithm for building dynamic similarity profile in similarity-aware web caching

Document Type

Conference Proceeding

Publisher

IEEE

Faculty

Faculty of Computing, Health and Science

School

School of Computer and Security Science

RAS ID

14852

Comments

This article was originally published as: Xiao, J. (2012). An efficient web document clustering algorithm for building dynamic similarity profile in similarity-aware web caching. Proceedings of International Conference on Machine Learning and Cybernetics (pp. 1268-1273). Xian, Shaanxi, China: IEEE.

Abstract

Discovering and establishing similarities among web documents has been one of the key research streams in the web usage mining community in recent years. The knowledge obtained from this exercise can be used in many applications, such as optimizing web cache organization and improving the quality of web document pre-fetching. This paper presents an efficient matrix-based method to cluster web documents based on a predetermined similarity threshold. Our preliminary experiments have demonstrated that the new algorithm outperforms existing algorithms. The clustered web documents are then applied to a similarity-aware web content management system, facilitating offline building of the similarity-aware web caches and online updating of the system's similarity profiles.

DOI

10.1109/ICMLC.2012.6359547
