Evaluating Temporal Aggregation for Predicting the Sea Surface Temperature of the Atlantic Ocean

Authors: Rebecca Salles, Patricia Mattos, Ana-Maria Dubois Iorgulescu, Eduardo Bezerra, Leonardo Lima, Eduardo Ogasawara

Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ)

Abstract: Extreme environmental events such as droughts affect millions of people around the world. Although this type of event cannot be prevented, predicting it over different time horizons makes it possible to mitigate the damage it may cause. An important variable for identifying occurrences of droughts is the sea surface temperature (SST). In the Tropical Atlantic Ocean, SST data are collected and provided by the Prediction and Research Moored Array in the Tropical Atlantic (PIRATA) Project, an observation network composed of sensor buoys arranged in this region. Sensors of this type, and Internet of Things (IoT) sensors more generally, are commonly subject to data losses that affect the quality of the datasets collected for fitting prediction models. In this paper, we explore the influence of temporal aggregation on predicting step-ahead SST, considering different prediction horizons and different training dataset sizes. We conducted several experiments using data collected by the PIRATA Project. Our results point out the combinations of training dataset size and prediction horizon for which temporally aggregated SST time series may or may not be beneficial for prediction.

Acknowledgments: The authors thank CNPq and FAPERJ for partially sponsoring this research.

Experimental Evaluation:

The sea surface temperature (SST) data and R functions used to perform the experimental evaluation are available in the following RData files. The scripts listed with each file generate CSV files containing the computed SST prediction errors and the p-values of their statistical analysis, respectively.
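Before running the scripts, it can help to check what an RData file actually contains. The following is a minimal sketch in base R; the helper name list_rdata_objects is hypothetical and not part of the published material, and the usage line assumes Experiment.RData has been downloaded to the working directory.

```r
# Hypothetical helper (not part of the published files): load an .RData file
# into its own environment and report the objects it contains, without
# touching the global workspace.
list_rdata_objects <- function(path) {
  env <- new.env()
  loaded <- load(path, envir = env)  # load() returns the names it restored
  data.frame(
    name  = loaded,
    class = vapply(loaded, function(n) class(get(n, envir = env))[1],
                   character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Usage, assuming the file has been downloaded to the working directory
if (file.exists("Experiment.RData")) {
  print(list_rdata_objects("Experiment.RData"))
}
```

Loading into a separate environment avoids overwriting objects that already exist in the session.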

Prediction Errors Generation:

Experiment.RData

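#Load required packages
library("TSPred")
library("DMwR")
library("hydroGOF")

#Load the SST data and experiment functions (assumes Experiment.RData was downloaded to the working directory)
load("Experiment.RData")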

#Perform the experiment using ARIMA
invisible(capture.output(write.csv(PerChangeBuoysExp(SelectedBuoys,testyears=1,RW=FALSE,plot=FALSE),file = "PredErrors_ARIMA.csv")))
#Perform the experiment using Random Walk
invisible(capture.output(write.csv(PerChangeBuoysExp(SelectedBuoys,testyears=1,RW=TRUE,plot=FALSE),file = "PredErrors_RW.csv")))
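The internals of PerChangeBuoysExp are not shown on this page. To illustrate the general idea behind the experiment, the sketch below aggregates a daily series with missing values into weekly means and scores a Random Walk forecast on the aggregated series. Everything in it is an assumption made for illustration: the data are synthetic (not PIRATA data), the helper names aggregate_mean, rw_forecast and mse are hypothetical, and the block-mean aggregation and mean squared error may differ from the aggregation and error measures used in the paper.

```r
# Synthetic daily series standing in for one buoy's SST (not PIRATA data)
set.seed(1)
n_days <- 28 * 6
daily <- 27 + sin(2 * pi * seq_len(n_days) / 365) + rnorm(n_days, sd = 0.2)
daily[sample(n_days, 15)] <- NA  # simulate sensor data losses

# Temporal aggregation: mean over consecutive blocks of `width` observations,
# ignoring missing values; a block with no observations stays NA
aggregate_mean <- function(x, width) {
  block <- ceiling(seq_along(x) / width)
  out <- tapply(x, block, function(v) {
    if (all(is.na(v))) NA_real_ else mean(v, na.rm = TRUE)
  })
  as.numeric(out)
}

# Random Walk (naive) forecast: every future step repeats the last observed value
rw_forecast <- function(train, h) {
  last_obs <- tail(train[!is.na(train)], 1)
  rep(last_obs, h)
}

# Mean squared error, skipping pairs with a missing value
mse <- function(actual, predicted) mean((actual - predicted)^2, na.rm = TRUE)

weekly <- aggregate_mean(daily, 7)  # 168 days -> 24 weekly means
h <- 4                              # prediction horizon: 4 weeks ahead
train <- head(weekly, -h)
test <- tail(weekly, h)
mse(test, rw_forecast(train, h))
```

Aggregating before fitting reduces the number of gaps the model sees, at the cost of a shorter training series, which is the trade-off the experiment evaluates.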

Statistical Analysis of the Prediction Errors:

ErrorsAnalysis.RData

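#Load required package
library("nortest")

#Load the prediction errors and analysis functions (assumes ErrorsAnalysis.RData was downloaded to the working directory)
load("ErrorsAnalysis.RData")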

#Perform the statistical comparative analysis of each temporal aggregation approach
PredErrorsStats <- data.frame(BuoysExpStats(ErrorsBuoysPQ))[,c(1:4,8:10)]
write.csv(PredErrorsStats,file = "PredErrorsStats.csv")

#Perform the statistical comparative analysis of each temporal aggregation approach against daily Random Walk SST prediction
RWPredErrorsStats <- data.frame(BuoysExpStatsRW(ErrorsBuoysPQ,ErrorsBuoysRWPQ))[,c(1:4,8:10)]
write.csv(RWPredErrorsStats,file = "RWPredErrorsStats.csv")
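The tests applied inside BuoysExpStats and BuoysExpStatsRW are not listed on this page. As a rough illustration of how such a comparison can be structured, the sketch below checks normality first and then picks a paired test accordingly. It is an assumption-laden example: the errors are synthetic, the helper compare_errors is hypothetical, and it uses the base R functions shapiro.test, t.test and wilcox.test so that it runs without extra packages, which may not match the tests used in the paper.

```r
set.seed(42)
# Synthetic prediction errors for two hypothetical approaches on the same cases
err_daily   <- abs(rnorm(60, mean = 0.50, sd = 0.15))
err_monthly <- abs(rnorm(60, mean = 0.40, sd = 0.15))

# Hypothetical helper: test the paired differences for normality, then apply
# a paired t-test if normality is not rejected, otherwise a Wilcoxon
# signed-rank test
compare_errors <- function(a, b, alpha = 0.05) {
  normality_p <- shapiro.test(a - b)$p.value
  cmp <- if (normality_p > alpha) {
    t.test(a, b, paired = TRUE)
  } else {
    wilcox.test(a, b, paired = TRUE)
  }
  data.frame(normality_p = normality_p,
             test = cmp$method,
             p_value = cmp$p.value,
             stringsAsFactors = FALSE)
}

compare_errors(err_daily, err_monthly)
```

A small p_value suggests that the two sets of errors differ systematically; it does not by itself say which approach is better, so the sign of the mean or median difference should be inspected as well.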

Results:

  • The results of the statistical tests performed for analysis of the SST time series in the experimental dataset: BuoysStatisticalAnalysisResults
  • ARIMA models generated in the experimental evaluation: ARIMAModels
  • The 173,040 SST prediction errors generated in the experimental evaluation: SST_PredErrors
  • The normality test results for the prediction errors generated in the experimental evaluation: NormalityTestsResults
  • The results of the statistical tests performed for analysis and comparison of the prediction errors generated in the experimental evaluation: ComparisonStatsTestsResults

Eduardo Ogasawara

Eduardo Ogasawara has been a professor at the Department of Computer Science at the Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ) since 2010. He holds a D.Sc. in Systems and Computer Engineering from COPPE/UFRJ. Between 2000 and 2007, he worked in the Information Technology (IT) sector, gaining extensive experience in workflows and project management. With a strong background in Data Science, he is currently focused on Data Mining and Time Series Analysis. He is a member of IEEE, ACM, and SBC. Throughout his career, he has authored numerous published articles and led projects funded by agencies such as CNPq and FAPERJ. Currently, he heads the Data Analytics Lab (DAL) at CEFET/RJ, where he continues to advance research in Data Science.