Multi-scale visual attention and saliency modelling with decision theory

Document Type

Conference Proceeding




Faculty of Health, Engineering and Science


School of Engineering/Centre for Communications and Electronics Research




This article was originally published as: Le Ngo, A., Ang, L., Qiu, G., & Seng, K. (2013). Multi-scale visual attention and saliency modelling with decision theory. Proceedings of IEEE International Conference on Image Processing. (pp. 216-220). Melbourne, Australia. IEEE. © 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.


Recently, a biologically plausible and computationally feasible information-based saliency technique called Discriminant Saliency (DIS) has been proposed. While DIS successfully defines discriminant saliency in the information-theoretic sense, its implementation restricts the sampled features to a single fixed-size window, which biases it towards objects whose distinctive features fit that window size. This paper proposes a multi-scale discriminant saliency (MDIS) technique for visual attention that uses the wavelet transform as its multi-resolution framework. MDIS uses the mutual information between classes and feature distributions to quantify discriminant power for classification, and takes it as the saliency value in multiple dyadic-scale structures. The paper presents simulations on Neil Bruce's image database, with quantitative and qualitative results showing the advantages of MDIS over DIS. For quantitative comparison, the AUC, NSS, and LCC metrics are measured and several plots are generated to visualize differences between simulation modes; qualitative evaluation is a visual examination of synthesized saliency maps of general natural scenes.
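As a rough illustration of the idea of scoring saliency by the mutual information between a class label and a feature distribution, the following Python sketch estimates I(C; X) in bits for a binary class C (center versus surround) and a scalar feature X. The function name `mutual_information_bits`, the center/surround labelling, and the plain histogram estimator are assumptions made for this example; the abstract does not state how the paper estimates the distributions.

```python
import numpy as np


def mutual_information_bits(center, surround, bins=16):
    """Histogram estimate of I(C; X) in bits.

    C is a binary class label (center vs. surround) and X is a scalar
    feature. Hypothetical helper: the histogram estimator is an assumption
    of this sketch, not the estimator used in the paper.
    """
    center = np.asarray(center, dtype=float).ravel()
    surround = np.asarray(surround, dtype=float).ravel()
    both = np.concatenate([center, surround])
    lo, hi = both.min(), both.max()
    if hi == lo:
        # A constant feature carries no information about the class.
        return 0.0

    # Shared bin edges so both classes are quantized identically.
    edges = np.linspace(lo, hi, bins + 1)
    counts = np.stack([
        np.histogram(surround, bins=edges)[0],
        np.histogram(center, bins=edges)[0],
    ]).astype(float)

    joint = counts / counts.sum()                # p(c, x)
    p_class = joint.sum(axis=1, keepdims=True)   # p(c)
    p_feat = joint.sum(axis=0, keepdims=True)    # p(x)
    independent = p_class @ p_feat               # p(c) p(x)

    nz = joint > 0                               # skip empty cells (0 log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / independent[nz])))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    background = rng.normal(0.0, 1.0, size=500)
    similar = rng.normal(0.0, 1.0, size=500)
    distinct = rng.normal(4.0, 1.0, size=500)
    print("similar center :", mutual_information_bits(similar, background))
    print("distinct center:", mutual_information_bits(distinct, background))
```

A center whose feature values resemble the surround should score near zero, while a clearly distinct center approaches the entropy of the class label. In a multi-scale setting, one plausible use (again an assumption) is to apply the same measure to wavelet coefficients at each dyadic scale and combine the per-scale scores.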