Authors: Rebecca Salles, Patricia Mattos, Ana-Maria Dubois Iorgulescu, Eduardo Bezerra, Leonardo Lima, Eduardo Ogasawara
Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ)
Abstract: Extreme environmental events such as droughts affect millions of people around the world. Although such events cannot be prevented, predicting them over different time horizons makes it possible to mitigate the damage they cause. An important variable for identifying the occurrence of droughts is the sea surface temperature (SST). In the Tropical Atlantic Ocean, SST data are collected and provided by the Prediction and Research Moored Array in the Tropical Atlantic (PIRATA) Project, an observation network composed of sensor buoys arranged in this region. Sensors of this type, and more generally Internet of Things (IoT) sensors, commonly suffer data losses that degrade the quality of the datasets collected for fitting prediction models. In this paper, we explore the influence of temporal aggregation on step-ahead SST prediction, considering different prediction horizons and different training dataset sizes. We have conducted several experiments using data collected by the PIRATA Project. Our results identify the combinations of training dataset size and prediction horizon for which temporally aggregated SST time series may or may not be beneficial for prediction.
Acknowledgments: The authors thank CNPq and FAPERJ for partially sponsoring this research.
Experimental Evaluation:
The sea surface temperature (SST) data and the R functions used to perform the experimental evaluation are available in the following RData files. The scripts below generate two sets of CSV files: the first with the computed SST prediction errors, the second with the p-values of their statistical analysis.
Prediction Errors Generation:
# Load required packages
library("TSPred")
library("DMwR")
library("hydroGOF")

# Perform the experiment using ARIMA
invisible(capture.output(
  write.csv(PerChangeBuoysExp(SelectedBuoys, testyears = 1, RW = FALSE, plot = FALSE),
            file = "PredErrors_ARIMA.csv")
))

# Perform the experiment using Random Walk
invisible(capture.output(
  write.csv(PerChangeBuoysExp(SelectedBuoys, testyears = 1, RW = TRUE, plot = FALSE),
            file = "PredErrors_RW.csv")
))
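For readers who do not have the RData files at hand, the idea behind this experiment can be illustrated with a minimal base-R sketch. It does not reproduce PerChangeBuoysExp, whose internals are not shown here. The synthetic series, the weekly aggregation window, the fixed ARIMA order, and the helper names aggregate_mean and rmse are all assumptions made for illustration only.

```r
# Illustrative sketch only: synthetic data, not PIRATA observations.
set.seed(42)

# Hypothetical daily SST-like series: seasonal cycle plus noise (two years).
n_days <- 730
daily <- 27 + 1.5 * sin(2 * pi * seq_len(n_days) / 365) + rnorm(n_days, sd = 0.3)

# Hypothetical helper: average consecutive blocks of `width` observations,
# dropping an incomplete trailing block.
aggregate_mean <- function(x, width) {
  n_blocks <- length(x) %/% width
  blocks <- matrix(x[seq_len(n_blocks * width)], nrow = width)
  colMeans(blocks)
}

# Temporal aggregation: daily values to weekly means.
weekly <- aggregate_mean(daily, 7)

# Hold out the last `horizon` aggregated points as the test set.
horizon <- 4
train <- head(weekly, -horizon)
test <- tail(weekly, horizon)

# ARIMA forecast with a fixed, assumed order; the actual experiment
# may select model orders differently.
fit <- arima(train, order = c(0, 1, 1), method = "ML")
arima_pred <- as.numeric(predict(fit, n.ahead = horizon)$pred)

# Random walk forecast: repeat the last observed training value.
rw_pred <- rep(tail(train, 1), horizon)

# Compare both forecasts on the held-out points.
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
errors <- c(ARIMA = rmse(test, arima_pred), RW = rmse(test, rw_pred))
print(errors)
```

Changing the width passed to aggregate_mean, the length of train, or horizon gives a rough feel for the kind of scenarios the experiment varies, although the numbers produced here say nothing about real SST data.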
Statistical Analysis of the Prediction Errors:
library("nortest")

# Perform the statistical comparative analysis of each temporal aggregation approach
PredErrorsStats <- data.frame(BuoysExpStats(ErrorsBuoysPQ))[, c(1:4, 8:10)]
write.csv(PredErrorsStats, file = "PredErrorsStats.csv")

# Perform the statistical comparative analysis of each temporal aggregation approach
# against daily Random Walk SST prediction
RWPredErrorsStats <- data.frame(BuoysExpStatsRW(ErrorsBuoysPQ, ErrorsBuoysRWPQ))[, c(1:4, 8:10)]
write.csv(RWPredErrorsStats, file = "RWPredErrorsStats.csv")
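The general shape of such an analysis, a normality check followed by a comparison test, can be sketched in base R. This is not the procedure implemented by BuoysExpStats or BuoysExpStatsRW, which is not shown here. The synthetic errors, the choice of the Shapiro-Wilk, paired t, and Wilcoxon signed-rank tests, and the 0.05 threshold are assumptions for illustration.

```r
# Illustrative sketch only: synthetic errors, not the paper's results.
set.seed(1)

# Hypothetical paired prediction errors for the same 30 test cases.
errors_daily <- abs(rnorm(30, mean = 0.40, sd = 0.10))
errors_weekly <- abs(rnorm(30, mean = 0.35, sd = 0.10))

# Step 1: check normality of the paired differences (Shapiro-Wilk, base R).
diffs <- errors_daily - errors_weekly
normality_p <- shapiro.test(diffs)$p.value

# Step 2: choose a paired test according to the normality result.
alpha <- 0.05
if (normality_p > alpha) {
  comparison <- t.test(errors_daily, errors_weekly, paired = TRUE)
} else {
  comparison <- wilcox.test(errors_daily, errors_weekly, paired = TRUE)
}

# Collect the p-values in a data frame, ready to be passed to write.csv().
stats <- data.frame(normality_p = normality_p,
                    test = comparison$method,
                    comparison_p = comparison$p.value)
print(stats)
```

Testing the paired differences rather than each error sample separately is a design choice of this sketch: both paired tests operate on the differences, so that is the quantity whose distribution matters.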
Results:
- The results of the statistical tests performed for analysis of the SST time series in the experimental dataset: BuoysStatisticalAnalysisResults
- ARIMA models generated in the experimental evaluation: ARIMAModels
- The 173,040 SST prediction errors generated in the experimental evaluation: SST_PredErrors
- The normality test results for the prediction errors generated in the experimental evaluation: NormalityTestsResults
- The results of the statistical tests performed for analysis and comparison of the prediction errors generated in the experimental evaluation: ComparisonStatsTestsResults