G-STSM

Authors: Antonio Castro, Heraldo Borges, Fabio Porto, Florent Masseglia, Esther Pacitti, Rafaelli Coutinho, and Eduardo Ogasawara

Abstract: Spatial-temporal sequential patterns bring knowledge about sequences of events displaced in time and space. Finding such patterns is computationally intensive but of great value for different domains. However, frequent sequential patterns discovered across an entire dataset might be less interesting than patterns discovered in constrained space and time, with local insights for domain specialists. Unfortunately, considering spatial or temporal locality involves dealing with many time/space combinations. This paper proposes and evaluates the G-STSM algorithm to discover relative frequent sequences constrained in space and time, along with the optimal constraints (the time and space locations that optimize the discovery of locally frequent patterns). It allows different sequence sizes and time-space ranges to be found. G-STSM was tested using two real-world spatial-temporal datasets from the health and seismic domains. It provides superior results compared to state-of-the-art methods.
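The point that a sequence can be infrequent over the whole dataset yet frequent within a constrained space-time range can be illustrated with a small sketch. This is not the G-STSM algorithm: the `support` function, the brute-force counting, and the symbol strings below are hypothetical and made up for illustration only. It assumes one string per spatial position and one character per time step.

```python
def support(series, pattern, positions, times):
    """Fraction of (position, start time) pairs in the block where `pattern` occurs.

    Hypothetical helper for illustration; not part of G-STSM.
    """
    n = len(pattern)
    hits = 0
    total = 0
    for p in positions:
        for t in times:
            # Skip start times where the pattern would run past the end of the series.
            if t + n > len(series[p]):
                continue
            total += 1
            if series[p][t:t + n] == pattern:
                hits += 1
    return hits / total if total else 0.0


# Synthetic data: 4 spatial positions, 9 time steps each.
series = [
    "ccccccccc",
    "cccabcabc",
    "cccabcabc",
    "ccccccccc",
]
pattern = "abc"

# Support over the entire dataset (all positions, all start times).
global_support = support(series, pattern, range(len(series)), range(len(series[0])))

# Support restricted to positions 1-2 and start times 3-6.
local_support = support(series, pattern, range(1, 3), range(3, 7))

print(f"global support: {global_support:.3f}")
print(f"local support:  {local_support:.3f}")
```

In this toy example the pattern covers about 14% of the candidate windows globally but 50% inside the constrained block, which is the kind of locality the abstract refers to. Searching for the best block, rather than fixing it by hand as done here, is the hard part that the paper addresses.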

Acknowledgments: The authors would like to thank CAPES, CNPq, and FAPERJ for partially funding this paper.

T401 dataset: The Netherlands seismic spatial-time series dataset, named F3 Block, was produced by the seismic reflection method in a region located in the Dutch sector of the North Sea. Seismic data are obtained by sending high-energy sound waves into the ground or seabed, as the case may be. The amplitude of the reflected sound waves is recorded; the later a reflected wave arrives, the deeper in the subsurface it was reflected.

The dataset is available in: t401.RData

As a result, the dataset is a set of time series: the observations are related to the arrival time of the sound wave, and the attributes are related to the position of the hydrophone that recorded the reflected wave.
The results presented in this work focus on the public data of inline 401.
It is composed of 951 spatial-time series with 462 observations each.

Patterns previously set by experts: The location of these patterns is of key importance for oil and gas prospects.

The file that contains the positions of the patterns is available in: horizontes

Covid-19 dataset: This dataset was obtained from the Rio de Janeiro State Health Departments. It compiles daily epidemiological bulletins, providing historical series of Covid-19 deaths by municipality.

The dataset is available in: covid.csv