Author Identifier

Ebenezer Afrifa-Yamoah ORCID: https://orcid.org/0000-0003-1741-9249

Aiden Fisher ORCID: https://orcid.org/0000-0002-5826-6800

Ute Mueller ORCID: https://orcid.org/0000-0002-8670-2120

Document Type

Journal Article

Publication Title

Meteorological Applications

ISSN

1350-4827

Volume

27

Issue

1

Publisher

Wiley

School

School of Science

RAS ID

32019

Comments

Afrifa‐Yamoah, E., Mueller, U. A., Taylor, S. M., & Fisher, A. J. (2020). Missing data imputation of high‐resolution temporal climate time series data. Meteorological Applications, 27(1), e1873. https://doi.org/10.1002/met.1873

Abstract

© 2020 The Authors. Meteorological Applications published by John Wiley & Sons Ltd on behalf of the Royal Meteorological Society. Analysis of high-resolution data offers greater opportunity to understand the nature of data variability, behaviours and trends, and to detect small changes. Climate studies often require complete time series data, so when observations are missing, imputation must be undertaken. Research on the imputation of high-resolution temporal climate time series data is still at an early phase. In this study, multiple approaches to the imputation of missing values were evaluated, including a structural time series model with Kalman smoothing, an autoregressive integrated moving average (ARIMA) model with Kalman smoothing and multiple linear regression. The methods were applied to complete subsets of data from 12-month time series of hourly temperature, humidity and wind speed data from four locations along the coast of Western Australia. Assuming that observations were missing at random, artificial gaps of missing observations were studied using a five-fold cross-validation methodology with the proportion of missing data set to 10%. The techniques were compared using the pooled mean absolute error, root mean square error and symmetric mean absolute percentage error. The multiple linear regression model was generally the best model based on the pooled performance indicators, followed by the ARIMA model with Kalman smoothing. However, the low error values obtained from each of the approaches suggested that the models competed closely and imputed highly plausible values. To some extent, the performance of the models varied among locations. It can be concluded that the modelling approaches studied are suitable for imputing missing data in hourly temperature, humidity and wind speed data and are therefore recommended for application in other fields where high-resolution data with missing values are common.
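The evaluation design described in the abstract (artificial gaps at 10% missingness, five folds, pooled error measures) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration rather than the authors' implementation: the function names are hypothetical, the gaps are drawn as independent random points in each fold, linear interpolation stands in for the Kalman-smoothing and regression imputers, the sMAPE formula is one common variant, and the series is synthetic.

```python
import math
import random


def mae(actual, imputed):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, imputed)) / len(actual)


def rmse(actual, imputed):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, imputed)) / len(actual))


def smape(actual, imputed):
    """Symmetric MAPE in percent (one common variant; the paper's may differ).

    Pairs where both values are zero contribute zero error.
    """
    total = 0.0
    for a, p in zip(actual, imputed):
        denom = abs(a) + abs(p)
        if denom > 0:
            total += 2 * abs(a - p) / denom
    return 100 * total / len(actual)


def linear_interpolate(series):
    """Fill None entries linearly between the nearest observed neighbours.

    Stand-in imputer only. Assumes the first and last entries are observed.
    """
    filled = list(series)
    i = 0
    while i < len(filled):
        if filled[i] is None:
            start, end = i - 1, i
            while filled[end] is None:
                end += 1
            for k in range(i, end):
                w = (k - start) / (end - start)
                filled[k] = filled[start] + w * (filled[end] - filled[start])
            i = end
        else:
            i += 1
    return filled


def evaluate(series, n_folds=5, missing_fraction=0.10, seed=0):
    """Hide a fraction of points in each fold, impute, and pool the errors."""
    rng = random.Random(seed)
    n = len(series)
    n_missing = int(round(missing_fraction * n))
    actual, imputed = [], []
    for _ in range(n_folds):
        # Endpoints stay observed so interpolation is always defined.
        hidden = sorted(rng.sample(range(1, n - 1), n_missing))
        masked = list(series)
        for i in hidden:
            masked[i] = None
        filled = linear_interpolate(masked)
        actual.extend(series[i] for i in hidden)
        imputed.extend(filled[i] for i in hidden)
    return {
        "MAE": mae(actual, imputed),
        "RMSE": rmse(actual, imputed),
        "sMAPE": smape(actual, imputed),
    }


# Synthetic hourly "temperature" with a daily cycle (not data from the study).
hours = range(240)
temperature = [20 + 5 * math.sin(2 * math.pi * t / 24) for t in hours]
print(evaluate(temperature))
```

Pooling the hidden points from all folds before computing each measure mirrors the "pooled" indicators mentioned in the abstract. Comparing several methods would mean swapping `linear_interpolate` for each imputer while keeping the same seed, so that every method faces identical gaps.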

DOI

10.1002/met.1873

Creative Commons License

Creative Commons Attribution 4.0 License
