Evaluating Temporal Aggregation for Predicting the Sea Surface Temperature of the Atlantic Ocean

Authors: Rebecca Salles, Patricia Mattos, Ana-Maria Dubois Iorgulescu, Eduardo Bezerra, Leonardo Lima, Eduardo Ogasawara

Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ)

Abstract: Extreme environmental events such as droughts affect millions of people around the world. Although events of this type cannot be prevented, predicting them over different time horizons makes it possible to mitigate the damage they may cause. An important variable for identifying occurrences of droughts is the sea surface temperature (SST). In the Tropical Atlantic Ocean, SST data are collected and provided by the Prediction and Research Moored Array in the Tropical Atlantic (PIRATA) Project, an observation network composed of sensor buoys arranged in this region. Sensors of this type, and Internet of Things (IoT) sensors more generally, are commonly subject to data losses that affect the quality of the datasets used to fit prediction models. In this paper, we explore the influence of temporal aggregation on step-ahead SST prediction, considering different prediction horizons and different training dataset sizes. We conducted several experiments using data collected by the PIRATA Project. Our results point out the combinations of training dataset and prediction horizon for which temporally aggregated SST time series may or may not be beneficial for prediction.

Acknowledgments: The authors thank CNPq and FAPERJ for partially sponsoring this research.

Experimental Evaluation:

The sea surface temperature (SST) data and R functions used to perform the experimental evaluation are available in the following RData files. The scripts below generate CSV files containing the computed SST prediction errors and the p-values of their statistical analysis, respectively.

Prediction Errors Generation:

Experiment.RData

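#Load required packages
library("TSPred")
library("DMwR")
library("hydroGOF")

#Load the data and functions provided in Experiment.RData (assumes the file is in the working directory)
load("Experiment.RData")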

#Perform the experiment using ARIMA
invisible(capture.output(write.csv(PerChangeBuoysExp(SelectedBuoys,testyears=1,RW=FALSE,plot=FALSE),file = "PredErrors_ARIMA.csv")))
#Perform the experiment using Random Walk
invisible(capture.output(write.csv(PerChangeBuoysExp(SelectedBuoys,testyears=1,RW=TRUE,plot=FALSE),file = "PredErrors_RW.csv")))
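
The aggregation and forecasting steps are performed inside PerChangeBuoysExp and are not shown on this page. As a rough illustration of the general idea only, the base-R sketch below averages a synthetic daily series into weekly values and compares the hold-out RMSE of a simple AR(1) model against a random walk forecast. The data, the helper names aggregate_series and holdout_rmse, the block width, and the model order are all assumptions made for this example, not the authors' setup.

```r
# Illustrative sketch only: synthetic data and hypothetical helper names,
# not the functions shipped in Experiment.RData.
set.seed(42)

# Synthetic "daily SST" series: annual cycle plus noise (made-up values).
n_days <- 5 * 364
day <- seq_len(n_days)
sst_daily <- 27 + 1.5 * sin(2 * pi * day / 364) + rnorm(n_days, sd = 0.3)

# Temporal aggregation: mean of non-overlapping blocks of `width` observations.
# Trailing observations that do not fill a whole block are dropped.
aggregate_series <- function(x, width) {
  n_blocks <- length(x) %/% width
  blocks <- matrix(x[seq_len(n_blocks * width)], nrow = width)
  colMeans(blocks)
}

sst_weekly <- aggregate_series(sst_daily, 7)

# Hold out the last `horizon` points, fit on the rest, and return the RMSE
# of the multi-step forecast over the held-out points.
holdout_rmse <- function(x, horizon, random_walk = FALSE) {
  n <- length(x)
  train <- x[seq_len(n - horizon)]
  test <- x[(n - horizon + 1):n]
  if (random_walk) {
    # Random walk forecast: repeat the last observed value.
    pred <- rep(train[length(train)], horizon)
  } else {
    # AR(1) with mean, chosen arbitrarily for the sketch.
    fit <- arima(train, order = c(1, 0, 0), method = "ML")
    pred <- as.numeric(predict(fit, n.ahead = horizon)$pred)
  }
  sqrt(mean((test - pred)^2))
}

# Same 28-day test window, expressed as 28 daily or 4 weekly steps.
errors <- c(
  daily_arima  = holdout_rmse(sst_daily, horizon = 28),
  daily_rw     = holdout_rmse(sst_daily, horizon = 28, random_walk = TRUE),
  weekly_arima = holdout_rmse(sst_weekly, horizon = 4),
  weekly_rw    = holdout_rmse(sst_weekly, horizon = 4, random_walk = TRUE)
)
print(round(errors, 3))
```

Note that the daily and weekly errors in this sketch are measured on different scales (daily values versus weekly means), so they are not directly comparable without first bringing the forecasts to a common resolution.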

Statistical Analysis of the Prediction Errors:

ErrorsAnalysis.RData

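#Load required packages
library("nortest")

#Load the data and functions provided in ErrorsAnalysis.RData (assumes the file is in the working directory)
load("ErrorsAnalysis.RData")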

#Perform the statistical comparative analysis of each temporal aggregation approach
PredErrorsStats <- data.frame(BuoysExpStats(ErrorsBuoysPQ))[,c(1:4,8:10)]
write.csv(PredErrorsStats,file = "PredErrorsStats.csv")

#Perform the statistical comparative analysis of each temporal aggregation approach against daily Random Walk SST prediction
RWPredErrorsStats <- data.frame(BuoysExpStatsRW(ErrorsBuoysPQ,ErrorsBuoysRWPQ))[,c(1:4,8:10)]
write.csv(RWPredErrorsStats,file = "RWPredErrorsStats.csv")
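
The tests run by BuoysExpStats and BuoysExpStatsRW are not listed on this page. As a hedged sketch of how such a comparison is commonly structured, the example below checks the normality of paired error differences and then picks a paired t-test or a Wilcoxon signed-rank test accordingly. It uses base R's shapiro.test as a stand-in for the nortest functions, synthetic error values, and a hypothetical helper named compare_errors; none of these are confirmed to match the authors' procedure.

```r
# Illustrative sketch only: synthetic errors and a hypothetical helper,
# not the analysis implemented in ErrorsAnalysis.RData.
set.seed(1)

# Made-up paired prediction errors for the same 40 test cases.
errors_daily <- rlnorm(40, meanlog = -1.0, sdlog = 0.4)
errors_aggregated <- errors_daily * rlnorm(40, meanlog = -0.1, sdlog = 0.2)

# Compare two paired error samples: test the differences for normality,
# then use a parametric or non-parametric paired test depending on the result.
compare_errors <- function(a, b, alpha = 0.05) {
  stopifnot(length(a) == length(b))
  normality_p <- shapiro.test(a - b)$p.value
  is_normal <- normality_p > alpha
  test <- if (is_normal) {
    t.test(a, b, paired = TRUE)
  } else {
    wilcox.test(a, b, paired = TRUE)
  }
  data.frame(
    normality_p = normality_p,
    test = if (is_normal) "paired t-test" else "Wilcoxon signed-rank",
    p_value = test$p.value
  )
}

result <- compare_errors(errors_daily, errors_aggregated)
print(result)
```

A small p_value here would suggest that the two approaches differ in typical error for these synthetic samples; it says nothing about the actual buoy data.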

Results:

  • The results of the statistical tests performed for analysis of the SST time series in the experimental dataset: BuoysStatisticalAnalysisResults
  • ARIMA models generated in the experimental evaluation: ARIMAModels
  • The 173,040 SST prediction errors generated in the experimental evaluation: SST_PredErrors
  • The results of the normality tests for the prediction errors generated in the experimental evaluation: NormalityTestsResults
  • The results of the statistical tests performed for analysis and comparison of the prediction errors generated in the experimental evaluation: ComparisonStatsTestsResults

Eduardo Ogasawara

I have been a Professor in the Computer Science Department of the Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ) since 2010. I hold a PhD in Systems Engineering and Computer Science from COPPE/UFRJ. Between 2000 and 2007, I worked in the Information Technology (IT) field, where I acquired extensive experience in workflows and project management. I have a solid background in databases, and my primary interest is Data Science. I currently study space-time series, parallel and distributed processing, and data preprocessing methods. I am a member of the IEEE, ACM, INNS, and SBC. Throughout my career, I have consistently published articles and had projects approved by funding agencies such as CNPq and FAPERJ. I am also a reviewer for several international journals, such as the VLDB Journal, IEEE Transactions on Services Computing, and The Journal of Systems and Software. I currently head the Post-Graduate Program in Computer Science (PPCIC) of CEFET/RJ.