Generalized Discovery of Tight Space-Time Sequences

Authors: Antonio Castro, Heraldo Borges, Ricardo Campisano, Fabio Porto, Reza Akbarinia, Florent Masseglia, Esther Pacitti, Rafaelli Coutinho and Eduardo Ogasawara

Abstract: Finding patterns is an important task in many domains. Spatio-temporal patterns bring knowledge about the time and position at which a pattern is frequent. However, not all patterns are frequent over an entire dataset; some are constrained to particular spatial positions and a particular time range. Mining tight space-time sequences aims to discover frequent sequences together with the time range and the set of positions in which these sequences are frequent.
Based on the Apriori algorithm and using the concepts of ranged group, greedy-ranged group, and solid-ranged group, this paper proposes the STSM-2S1T algorithm as a solution to the discovery of frequent sequences that are constrained in one dimension in time and in two dimensions in space. Using a real-world spatio-temporal seismic dataset, STSM-2S1T was compared with a simple approach and extensively evaluated to analyze its sensitivity. As a result, STSM-2S1T presented better performance and low variation in resource usage as the input parameters change.
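
To make the idea of a sequence that is frequent only within a time range and a set of positions more concrete, here is a minimal sketch. It is not the STSM-2S1T algorithm: the function names, the symbolic toy series, and the definition of support as "occurrences per (position, start time) slot" are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # hypothetical (x, y) coordinates of a series


def find_occurrences(series: Dict[Position, str], seq: str) -> Dict[Position, List[int]]:
    """Return, for each position, every start time at which `seq` occurs."""
    found = {}
    for pos, symbols in series.items():
        starts = [t for t in range(len(symbols) - len(seq) + 1)
                  if symbols[t:t + len(seq)] == seq]
        if starts:
            found[pos] = starts
    return found


def support(occ: Dict[Position, List[int]], positions: List[Position],
            t_start: int, t_end: int) -> float:
    """Fraction of (position, start time) slots in the range where the sequence starts.

    This ratio is an illustrative assumption, not the paper's definition.
    """
    slots = len(positions) * (t_end - t_start + 1)
    hits = sum(1 for pos in positions
               for t in occ.get(pos, []) if t_start <= t <= t_end)
    return hits / slots if slots else 0.0


# Toy data: three positions, each holding a series of discretized symbols.
series = {
    (0, 0): "abcabcxx",
    (0, 1): "xbcxbcxx",
    (1, 0): "xxxxxxbc",
}
occ = find_occurrences(series, "bc")

# Support over the whole dataset versus a constrained space-time range.
overall = support(occ, list(series), 0, 6)
constrained = support(occ, [(0, 0), (0, 1)], 1, 4)
```

In this toy example the sequence "bc" starts in 5 of the 21 possible slots overall (about 0.24), but in 4 of the 8 slots when restricted to positions (0, 0) and (0, 1) and start times 1 to 4 (0.5). A sequence that misses a support threshold globally can therefore still be frequent inside a tighter range.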

Acknowledgments: The authors would like to thank CAPES, CNPq, and FAPERJ for partially funding this paper.

T401 dataset: The Netherlands seismic spatial-time series dataset, named F3 Block, was produced by the seismic reflection method in a region located in the Dutch sector of the North Sea. Seismic data are obtained by sending high-energy sound waves into the ground or the seabed, as the case may be. The amplitude of the reflected sound waves is recorded; the later a reflected wave arrives, the deeper in the subsurface it was reflected.

The dataset is available in: dataset.RData

As a result, the dataset is a set of time series: each observation is indexed by the arrival time of the reflected sound wave, and each series is associated with the position of the hydrophone that recorded it.
The results presented in this work focus on the public data of inline 401.
It is composed of 951 spatial-time series with 462 observations each.

Patterns previously set by experts: The location of these patterns is of key importance for oil and gas prospects.

The file that contains the positions of the patterns is available in: patterns.RData

 

STMotif-Explorer

Pattern discovery is an important task in time series mining. A particular pattern that occurs a significant number of times in a time series is called a motif. Several approaches have been developed to discover motifs in time series. However, the literature shows a clear gap regarding the exploration of spatial-time series data. It is also challenging to understand and characterize what a discovered motif means in the data domain, to compare different approaches, and to analyze the quality of the results obtained.

We propose STMotif Explorer, a spatial-time motif analysis system that aims to interactively discover, analyze, and visualize spatial-time motifs in different domains, offering insight to users. STMotif Explorer enables users to use and implement different spatiotemporal motif detection techniques and then run them across various domains. In addition, STMotif Explorer offers users a set of interactive resources for visualizing and analyzing the discovered motifs and for comparing the results of different techniques. We demonstrate the features of our system with different approaches using real data.

Demonstration Video

STMotif

Spatial-Time Motifs Discovery

Authors: Heraldo Borges, Murillo Dutra, Rafaelli Coutinho, Fábio Perosi, Amin Bazaz, Florent Masseglia, Esther Pacitti, Fábio Porto, Eduardo Ogasawara
Abstract: Discovering motifs in time series data has been widely explored, and various techniques have been developed to tackle this problem. However, when it comes to spatial-time series, the literature shows a clear gap. This paper addresses this gap by presenting an approach to discover and rank motifs in spatial-time series, called the Combined Series Approach (CSA). CSA is based on partitioning the spatial-time series into blocks. Inside each block, subsequences of the spatial-time series are combined so that a hash-based motif discovery algorithm can be applied. Motifs are validated according to both temporal and spatial constraints. Motifs are then ranked according to their entropy, their number of occurrences, and the proximity of their occurrences. The approach was evaluated using both synthetic and seismic datasets. CSA outperforms traditional methods designed only for time series. CSA was also able to prioritize motifs that were meaningful both in the context of synthetic data and according to seismic specialists.
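
As an illustration of the entropy criterion used in ranking, the sketch below computes the Shannon entropy of the symbols in a discretized motif word. This is one plausible reading, assuming motifs are represented as strings of symbols; the function name is hypothetical, and how entropy is combined with the number of occurrences and their proximity is not specified here.

```python
import math
from collections import Counter


def word_entropy(word: str) -> float:
    """Shannon entropy (in bits) of the symbol distribution in a motif word."""
    counts = Counter(word)
    probs = [c / len(word) for c in counts.values()]
    return sum(-p * math.log2(p) for p in probs)


# A flat word carries no information; a varied word carries more.
flat = word_entropy("aaaa")
alternating = word_entropy("abab")
varied = word_entropy("abcd")
```

Under this reading, a constant word such as "aaaa" scores 0 bits, "abab" scores 1 bit, and "abcd" scores 2 bits, so a ranking that rewards entropy would favor motifs with richer shapes over flat stretches of signal.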
Synthetic dataset:
An example with 12 spatial-time series. Using a traditional approach, only a single motif, in ST3, is found.

The CSA approach creates combined series from all the time series, which enables the motif discovery algorithm to find candidate motifs that exploit both the spatial and the temporal properties of the data.
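
The block-and-hash idea can be sketched as follows. This is a simplified illustration, not the published CSA implementation: it assumes the series are already discretized into symbols, and the parameter names (`w` for word length, `sb` and `tb` for spatial and temporal block sizes, `min_count`) are hypothetical.

```python
from collections import defaultdict


def blocks(n_series, n_times, sb, tb):
    """Yield (series range, time range) for each spatial-temporal block."""
    for s0 in range(0, n_series, sb):
        for t0 in range(0, n_times, tb):
            yield (range(s0, min(s0 + sb, n_series)),
                   range(t0, min(t0 + tb, n_times)))


def candidate_motifs(series, w, sb, tb, min_count=2):
    """Hash every length-w word inside each block.

    A word is kept when it occurs at least `min_count` times within one block,
    counting occurrences across all series in that block together.
    """
    n_series, n_times = len(series), len(series[0])
    result = {}
    for s_rng, t_rng in blocks(n_series, n_times, sb, tb):
        table = defaultdict(list)  # word -> [(series index, start time), ...]
        for s in s_rng:
            for t in range(t_rng.start, t_rng.stop - w + 1):
                table[series[s][t:t + w]].append((s, t))
        for word, occ in table.items():
            if len(occ) >= min_count:
                result.setdefault(word, []).extend(occ)
    return result


# Four neighbouring series of four symbols each; no series repeats a word on its own.
series = ["abcc", "cabd", "dcba", "abdd"]
found = candidate_motifs(series, w=2, sb=2, tb=4)
```

Here no individual series contains a repeated word, so a per-series search finds nothing. Pooling the two series of the first block reveals "ab" at (series 0, time 0) and (series 1, time 1). The "ab" in the last series is not added because it falls in a different block, where it occurs only once.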

The discovered motifs are mapped back onto the original time series and checked to confirm that they are, in fact, spatial-time motifs.
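
A validation check of this kind might look like the sketch below. The two thresholds are assumptions made for illustration (a minimum number of occurrences as the temporal constraint, and a minimum number of distinct series as the spatial constraint); the actual constraints used by CSA are not detailed here.

```python
def is_spatial_time_motif(occurrences, min_occurrences, min_series):
    """Check a candidate against a temporal and a spatial constraint.

    `occurrences` is a list of (series index, start time) pairs found in one block.
    Both thresholds are hypothetical parameters.
    """
    distinct_series = {s for s, _ in occurrences}
    return len(occurrences) >= min_occurrences and len(distinct_series) >= min_series


# Occurs in two neighbouring series: accepted as a spatial-time motif.
spread = is_spatial_time_motif([(0, 0), (1, 1)], min_occurrences=2, min_series=2)

# Occurs three times but in a single series: a time series motif only.
single = is_spatial_time_motif([(3, 0), (3, 5), (3, 9)], min_occurrences=2, min_series=2)
```

The second candidate is rejected even though it occurs more often, because all of its occurrences lie in one series and it therefore shows no spatial extent.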

Seismic dataset:
Top motifs discovered according to CSA ranking function.
Top motifs discovered according to the number of occurrences.
Code repository on GitHub: https://github.com/eogasawara/CSA
Acknowledgments: The authors thank CAPES, CNPq, and FAPERJ for partially sponsoring this work.