A statistical reasoning scheme for geochemical data mining and automatic anomaly identification and classification

Document Type

Journal Article




Computing, Health and Science


School of Computer and Information Science




This article was originally published as: Guo, W. (2005). A Statistical Reasoning Scheme for Geochemical Data Mining and Automatic Anomaly Identification and Classification. WSEAS Transactions on Computers, 4(11), 1619-1626.


Geochemical data processing aims not only to reduce the random and/or systematic errors resulting from field surveys and/or laboratory analysis, but also to identify whether the data contain useful information indicating the existence of mineral concentrations, oil fields, or pollution sources in the survey area. The first task is usually achieved by using various smoothing approaches. However, determining the ‘best’ outcome among the results of many smoothing methods is still a qualitative judgment. The second task is accomplished by comparing the data to geochemical benchmarks. In this paper, a statistical reasoning scheme is proposed to determine the likely ‘best’ outcome among many smoothed datasets; this ‘best’ fitted dataset is then used to determine anomalies with reference to different geochemical benchmarks. The proposed statistical selector makes the choice of smoothing for geochemical data quantitative. The proposed anomaly classifiers can automatically identify the potential geochemical anomalies contained in the data and classify them as background anomaly (BA), threshold anomaly (TA), reliable anomaly (RA), or local anomaly (LA).