Imputation methods on daily PM10 data (2010-15)

Bibliographic Details
Main Authors: Abd Rani, Nurul Latiffah, Azid, Azman, Yunus, Kamaruzzaman
Format: Article
Language: English
Published: Innovative Scientific Information & Services Network 2019
Subjects:
Online Access: http://irep.iium.edu.my/76209/
http://irep.iium.edu.my/76209/1/Prof%20K-2.pdf
Description
Summary: Monitoring air pollution, and the PM10 pollutant in particular, is important, yet data from continuous ambient air quality stations (CAAQS) often contain missing values caused by machine failure, routine maintenance and human error. This study therefore evaluated three PM10 imputation methods (mean substitution, nearest neighbour, and an expectation-maximization-based algorithm (EMB)), using the coefficient of determination (R2) and the root mean square error (RMSE) to describe the goodness of fit of each method. For simulated missing-data proportions of 5%, 10%, 15%, 25% and 40%, nearest neighbour imputation gave R2 values of 0.9318, 0.8126, 0.6546, 0.5458 and 0.3946 and RMSE values of 7.47, 12.27, 16.68, 19.13 and 21.76, respectively. For the same proportions, mean imputation gave R2 values of 0.9274, 0.8117, 0.6484, 0.5400 and 0.3910 and RMSE values of 7.47, 12.36, 16.90, 19.13 and 22.07, respectively. EMB imputation gave R2 values of 0.9084, 0.8468, 0.7530, 0.5791 and 0.5004 and RMSE values of 8.58, 11.18, 14.20, 18.53 and 20.48, respectively. For every method, R2 decreased and RMSE increased as the percentage of simulated missing data increased.
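
The evaluation described in the summary can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that missing values are removed completely at random, that only mean substitution is applied, and that R2 (computed here as 1 - SS_res / SS_tot) and RMSE are both taken over the whole series after imputation. The synthetic PM10-like series and all function names are hypothetical, so the printed figures will not match those reported above.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def simulate_missing(values, fraction, rng):
    """Replace a random fraction of the values with None (assumed mechanism)."""
    n_missing = round(fraction * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def mean_impute(values):
    """Mean substitution: fill each gap with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


# Hypothetical stand-in for six years of daily PM10 readings (synthetic data).
rng = random.Random(0)
complete = [
    50 + 20 * math.sin(2 * math.pi * day / 365) + rng.gauss(0, 10)
    for day in range(2191)
]

results = {}
for fraction in (0.05, 0.10, 0.15, 0.25, 0.40):
    with_gaps = simulate_missing(complete, fraction, rng)
    filled = mean_impute(with_gaps)
    results[fraction] = (r_squared(complete, filled), rmse(complete, filled))
    r2, err = results[fraction]
    print(f"{fraction:.0%} missing: R2 = {r2:.4f}, RMSE = {err:.2f}")
```

Replacing mean_impute with a nearest neighbour or EM-based routine would allow the same loop to compare methods side by side, as the study does.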