CCA: A contextual compositional approach to discover associations between health determinants and health indicators in situations of anomalies

Team: Lais Baroni (CEFET/RJ), Lucas Scoralick (CEFET/RJ), Augusto Reis (CEFET/RJ), Kele Belloze (CEFET/RJ), Marcel Pedroso (Fiocruz), Ronaldo Alves (Fiocruz), Cristiano Boccolini (Fiocruz), Patricia Boccolini (UNIFASE), Eduardo Ogasawara (CEFET/RJ)

Abstract: Epidemiology is important in public health because it studies the health-disease-care process in human populations. One of its main goals is to identify the determinant factors of a population's health situation, since health-related anomalies are not randomly distributed among people. This makes it necessary to consider the particularities of each place and to observe the regularity of diseases in a population context. In this work, we present the Contextual Compositional Approach (CCA) for discovering associations between health indicators (HI) and health determinants (HD) in situations of anomalies in the HI. CCA uses time series concepts, anomaly detection, and data distribution between classes to study HD under expected conditions and compare them with the anomalous conditions indicated by anomaly detection in the HI. CCA is evaluated on a neonatal mortality database from health facilities in Rio de Janeiro (RJ, Brazil). The results show that CCA can reveal important associations between the health condition and the social, economic, and cultural characteristics of the population at different scales.
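The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the boxplot (IQR) rule used as the anomaly detector, the function names, and the toy data are all assumptions made for illustration. The idea is to flag anomalous periods in an indicator series and then compare how the classes of one determinant are distributed in expected versus anomalous periods.

```python
from collections import Counter
from statistics import quantiles

def iqr_anomalies(series, k=1.5):
    """Return indices whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(series, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return {i for i, v in enumerate(series) if v < lo or v > hi}

def class_shares(labels):
    """Relative frequency of each class in a list of labels."""
    counts = Counter(labels)
    return {c: n / len(labels) for c, n in counts.items()}

def compare_determinant(indicator, determinant):
    """Class shares of a determinant in expected vs anomalous periods."""
    anomalous = iqr_anomalies(indicator)
    expected_labels = [d for i, d in enumerate(determinant) if i not in anomalous]
    anomaly_labels = [d for i, d in enumerate(determinant) if i in anomalous]
    return class_shares(expected_labels), class_shares(anomaly_labels)

# Hypothetical monthly indicator and the dominant class of one determinant per month
indicator = [10, 11, 9, 10, 12, 10, 30, 11, 10, 9, 28, 10]
determinant = ["high", "high", "high", "low", "high", "high",
               "low", "high", "high", "high", "low", "high"]
expected, anomalous = compare_determinant(indicator, determinant)
```

A class that is rare under expected conditions but dominant in anomalous periods is a candidate association worth inspecting further.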

Acknowledgments: The authors thank CNPq, CAPES, and FAPERJ for partially sponsoring this research.

Experimental Evaluation:

The data and code used in the experimental evaluation are available at https://github.com/cefet-rj-dal/cca

Harbinger

Team: Rebecca Salles (CEFET/RJ), Janio Lima (CEFET/RJ), Lais Baroni (CEFET/RJ), Antonio Castor Jr (CEFET/RJ), Leonardo Carvalho (CEFET/RJ), Heraldo Borges (CEFET/RJ), Diego Carvalho (CEFET/RJ), Rafaelli Coutinho (CEFET/RJ), Eduardo Bezerra (CEFET/RJ), Esther Pacitti (INRIA & University of Montpellier), Fabio Porto (LNCC), Eduardo Ogasawara (CEFET/RJ). 

Harbinger is a framework for event detection in time series. It provides an integrated environment for anomaly detection, change point detection, and motif discovery, with a broad range of detection methods and functions for plotting and evaluating the detected events.

In the anomaly part, methods are based on machine learning model deviation (Conv1D, ELM, MLP, LSTM, Random Regression Forest, SVM), machine learning classification models (Decision Tree, KNN, MLP, Naive Bayes, Random Forest, SVM), clustering (k-means and DTW), and statistical methods (ARIMA, FBIAD, GARCH).
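The model deviation idea can be sketched as follows: fit a model to the series, then flag the observations whose residuals are unusually large. This is not Harbinger's API (Harbinger is distributed as its own package). The moving-average model, the boxplot threshold on residuals, and all names are assumptions chosen to keep the example self-contained.

```python
from statistics import mean, quantiles

def moving_average_fit(series, window=3):
    """Predict each point as the mean of the previous `window` observations."""
    return [mean(series[i - window:i]) for i in range(window, len(series))]

def deviation_anomalies(series, window=3, k=1.5):
    """Flag points whose residual falls outside the boxplot fences of all residuals."""
    predicted = moving_average_fit(series, window)
    residuals = [obs - pred for obs, pred in zip(series[window:], predicted)]
    q1, _, q3 = quantiles(residuals, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    # Shift by `window` so the result indexes the original series
    return [i + window for i, r in enumerate(residuals) if r < lo or r > hi]

# Hypothetical series with a single spike at index 6
series = [1.0, 1.1, 0.9, 1.0, 1.2, 1.0, 5.0, 1.1, 0.9, 1.0, 1.1, 1.0]
flagged = deviation_anomalies(series)
```

Swapping the moving average for a stronger regressor changes only `moving_average_fit`; the residual thresholding stays the same.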

In the change points part, methods are based on linear regression, ARIMA, ETS, and GARCH. In the motifs part, methods are based on Hash and Matrix Profile. There are specific methods for multivariate series. The evaluation of detections includes both traditional and soft computing metrics.
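One way to read the difference between traditional and soft evaluation is that a traditional metric only credits detections at the exact time of a true event, while a soft metric gives partial credit to detections that are close. The linear decay within a tolerance window below is an assumption for illustration and not necessarily the scoring Harbinger implements. Function names are hypothetical.

```python
def soft_score(detection, event, tolerance):
    """Credit decays linearly from 1 (exact hit) to 0 (tolerance or more away)."""
    return max(0.0, 1.0 - abs(detection - event) / tolerance)

def soft_true_positives(detections, events, tolerance=4):
    """Sum, over true events, the best credit earned by any detection."""
    return sum(
        max((soft_score(d, e, tolerance) for d in detections), default=0.0)
        for e in events
    )

def hard_true_positives(detections, events):
    """Traditional count: only exact matches are credited."""
    return len(set(detections) & set(events))

# Hypothetical true events and detections (time indices)
events = [10, 50]
detections = [11, 50, 80]
hard = hard_true_positives(detections, events)
soft = soft_true_positives(detections, events, tolerance=4)
```

In this simplified version one detection may earn credit for more than one event; a stricter scheme would assign each detection to at most one event.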

The Harbinger architecture is based on Experiment Lines and is built on top of the DAL Toolbox. This organization makes it easy to customize the framework and add novel methods to it.

The framework and examples are made available at https://cefet-rj-dal.github.io/harbinger.

Multi-Scale Event Detection (MSED)

Team: Diego Silva de Salles (CEFET/RJ), Eduardo Ogasawara (CEFET/RJ), Eduardo Bezerra (CEFET/RJ), Rafaelli Coutinho (CEFET/RJ), Carlos E. Mello (UNIRIO), Cristiane Gea (CEFET/RJ). 

Abstract: Information published in the communication media, such as government transitions, economic crises, or corruption scandals, is an external factor associated with financial time series. These factors can be related to events of increased uncertainty in the time series. External factors can have different cycles of fluctuations, affecting a time series over months or years. In particular, these external factors can raise the perceived financial risk and manifest as two main events in the time series: anomalies and change points. Discovering these events in financial time series is challenging but can help minimize investment risk. This paper presents Multi-Scale Event Detection (MSED), a technique for detecting events in financial time series. It compares the events found by the detection methods in the Intrinsic Mode Function (IMF) components with the external-factor labels obtained through the Economic Policy Uncertainty (EPU) index. Our results identified a correlation between the uncertainty variations present in the EPU and the events detected in a financial time series. Using the proposed approach, it is possible to determine the predominant nature of events based on the uncertainty variations presented in the EPU series. This information makes it possible to select a set of time series in which the influence of uncertainty generates acceptable events for a given investment profile, thus mitigating the risk to which the investment is exposed.

Acknowledgments: The authors thank CNPq, CAPES, and FAPERJ for partially sponsoring this research.

Experimental Evaluation:

The data and code used in the experimental evaluation are available at https://github.com/cefet-rj-dal/msed

Leveraging Experiment Lines to Data Analytics

Authors: Eduardo Ogasawara, Antonio Castro, Cristiane Gea, Heraldo Borges, Diego Carvalho, Joel Santos, Eduardo Bezerra, Rafaelli Coutinho

Abstract: The natural increase in the complexity of current research experiments and data demands better tools to enhance productivity. This paper introduces the DAL Toolbox (DALT), a framework designed to address the modern challenges in data analytics workflows. DALT is inspired by Experiment Line concepts and aims to provide seamless support for users in developing their data mining workflows by offering a uniform data model and method API. It enables the integration of various data mining activities, including data preprocessing, classification, regression, clustering, and time series prediction. It also offers options for hyperparameter tuning and supports integration with existing libraries and languages. Overall, DALT provides researchers with a comprehensive set of functionalities for data science, promoting ease of use, extensibility, and integration with various tools and libraries.
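The benefit of a uniform method API can be shown with a small sketch in which every step exposes the same `fit` and `transform` or `predict` calls, so steps can be swapped without changing the workflow. The DAL Toolbox is its own package with its own functions. The classes below are hypothetical stand-ins written only to illustrate the design idea.

```python
from statistics import mean

class MinMax:
    """Preprocessing step: rescale values to [0, 1] using bounds learned in fit."""
    def fit(self, data):
        self.lo, self.hi = min(data), max(data)
        return self

    def transform(self, data):
        span = (self.hi - self.lo) or 1.0  # avoid division by zero on constant data
        return [(v - self.lo) / span for v in data]

class MeanRegressor:
    """Toy model: always predicts the mean of the training target."""
    def fit(self, x, y):
        self.value = mean(y)
        return self

    def predict(self, x):
        return [self.value for _ in x]

# Hypothetical data; the same three calls work for any step with this interface
train_x, train_y, test_x = [2.0, 4.0, 6.0], [1.0, 2.0, 3.0], [3.0, 5.0]
scaler = MinMax().fit(train_x)
model = MeanRegressor().fit(scaler.transform(train_x), train_y)
predictions = model.predict(scaler.transform(test_x))
```

Because the preprocessing parameters are learned in `fit` and reused in `transform`, the test data is scaled with the training bounds, which avoids leaking information from the test set.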

Page that contains information and links about the article of the same name.

Example of the paper (full version): https://nbviewer.org/github/cefet-rj-dal/daltoolbox-examples/blob/main/timeseries/ts_tune.ipynb

Video presenting the DAL Toolbox Package:

Home Page of DAL Toolbox Package: https://cefet-rj-dal.github.io/daltoolbox/

Source code of DAL Toolbox Package (GitHub): https://github.com/cefet-rj-dal/daltoolbox

Examples: https://nbviewer.org/github/cefet-rj-dal/daltoolbox-examples/tree/main/

Time Series Prediction with Integrated Tuning

Team: Rebecca Salles (CEFET/RJ), Carla Pacheco (CEFET/RJ & PUC-Rio), Eduardo Bezerra (CEFET/RJ), Esther Pacitti (INRIA & University of Montpellier), Fabio Porto (LNCC), Eduardo Ogasawara (CEFET/RJ)

Description: The Time Series Prediction with Integrated Tuning (TSPredIT) is built on the DAL Toolbox and integrates hyperparameter optimization that covers both machine learning models and data preprocessing. It also contains time series outlier removal, data augmentation, ensemble models, and a more flexible workflow design for Data Analytics tasks.
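Integrated tuning means that preprocessing choices and model hyperparameters are searched together rather than one after the other. The sketch below is an assumption-laden illustration, not TSPredIT code: it uses an exhaustive grid search, a window-mean forecaster, first differencing as the only preprocessing option, and hypothetical function names.

```python
from itertools import product
from statistics import mean

def predict_next(window, use_diff):
    """Window-mean forecast, optionally computed on first differences."""
    if use_diff:
        steps = [b - a for a, b in zip(window, window[1:])]
        return window[-1] + mean(steps)
    return mean(window)

def validation_error(series, size, use_diff, holdout=4):
    """Mean absolute one-step-ahead error over the last `holdout` points."""
    errors = []
    for t in range(len(series) - holdout, len(series)):
        window = series[t - size:t]
        errors.append(abs(series[t] - predict_next(window, use_diff)))
    return mean(errors)

def tune(series, sizes=(2, 3, 4), diff_options=(False, True)):
    """Search window size and preprocessing jointly; return the best pair."""
    return min(
        product(sizes, diff_options),
        key=lambda cfg: validation_error(series, cfg[0], cfg[1]),
    )

# Hypothetical series with a linear trend
series = [float(v) for v in range(1, 13)]
best_size, best_diff = tune(series)
```

On a trending series the search selects differencing, a choice that tuning the model alone (with preprocessing fixed in advance) could not make.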

It adopts the inherited model of the DAL Toolbox.

Code repository at GitHub: https://cefet-rj-dal.github.io/tspredit/

An example project showing its usage is available at: https://github.com/cefet-rj-dal-dev/tlkds

Acknowledgments: The authors thank FAPERJ, CAPES, and CNPq for partially sponsoring this work.

G-STSM

Authors: Antonio Castro, Heraldo Borges, Fabio Porto, Florent Masseglia, Esther Pacitti, Rafaelli Coutinho, and Eduardo Ogasawara

Abstract: Spatial-temporal sequential patterns bring knowledge about sequences of events displaced in time and space. Finding such patterns is computationally intensive but of great value for different domains. However, frequent sequential patterns discovered across an entire dataset might be less interesting than patterns discovered in constrained space and time, with local insights for domain specialists. Unfortunately, considering spatial or temporal locality involves dealing with many time/space combinations. This paper proposes and evaluates the G-STSM algorithm to discover relative frequent sequences constrained in space and time, along with the optimal constraints (the time and space locations that optimize the discovery of locally frequent patterns). It allows different sequence sizes and time-space ranges to be found. G-STSM was tested using two real-world spatial-temporal datasets from the health and seismic domains. It provides superior results compared to state-of-the-art methods.
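The motivation for constraining patterns in space and time can be made concrete with a toy example: a sequence that is only moderately frequent over the whole dataset may be fully frequent inside a local range. The sketch below is not the G-STSM algorithm. It does not search for the optimal ranges, and its definition of support (the share of series in a spatial range that contain the sequence starting within a time range) is a simplification assumed for illustration.

```python
def occurrences(grid, pattern):
    """Yield (series_index, start_time) wherever `pattern` occurs in a series."""
    for s, series in enumerate(grid):
        for t in range(len(series) - len(pattern) + 1):
            if series[t:t + len(pattern)] == pattern:
                yield s, t

def support(grid, pattern, space, time):
    """Share of series in the space range holding the pattern within the time range."""
    (s_lo, s_hi), (t_lo, t_hi) = space, time
    hits = {s for s, t in occurrences(grid, pattern)
            if s_lo <= s <= s_hi and t_lo <= t <= t_hi}
    return len(hits) / (s_hi - s_lo + 1)

# Hypothetical symbolic dataset: one string per spatial position, one symbol per time step
grid = [
    "aaaaaa",
    "aaaaaa",
    "abcaaa",
    "aabcaa",
    "abcaaa",
    "aaaaaa",
]
global_support = support(grid, "abc", space=(0, 5), time=(0, 5))
local_support = support(grid, "abc", space=(2, 4), time=(0, 1))
```

Enumerating every candidate space and time range this way grows quickly with dataset size, which is the combinatorial difficulty the abstract refers to.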

Acknowledgments: The authors would like to thank CAPES, CNPq, and FAPERJ for partially funding this paper.

T401 dataset: The Netherlands seismic spatial-time series dataset, named F3 Block, was produced by the seismic reflection method in a region located in the Dutch sector of the North Sea. The seismic data is obtained by sending high-energy sound waves into the ground or seabed, as the case may be. The amplitude of the reflected sound waves is recorded; the later a reflected wave arrives, the deeper in the soil it was reflected.

The dataset is available in: t401.RData

As a result, the dataset is a set of time series: observations are related to the arrival time of the sound wave, and attributes are related to the position of the hydrophone that registered the reflected wave.
The results presented in this work focus on the public data of inline 401.
It is composed of 951 spatial-time series with 462 observations.

Patterns previously set by experts: The location of these patterns is of key importance for oil and gas prospects.

The file that contains the positions of the patterns is available in: horizontes

Covid-19 dataset: obtained from the Rio de Janeiro State Health Department. It compiles daily epidemiological bulletins providing historical series of Covid-19 deaths by municipality.

The dataset is available in: covid.csv

Detection of uncertainty events in the Brazilian economic and financial time series

Authors: Cristiane Gea, Luciano Vereda, Eduardo Ogasawara

Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ)

Abstract: Economic policy uncertainty shocks change how the economy behaves, moving it away from its pattern. Therefore, these effects can be understood as an event. Given this, the problem of event detection becomes particularly relevant for a more accurate understanding of how uncertainty affects the behavior of economic and financial time series. Thus, the present work aims to answer the following questions: (i) What events do economic policy uncertainty shocks cause in the economic and financial time series? (ii) What is the most suitable method for detecting such events? (iii) Does applying the ensemble methodology contribute to a more accurate detection? To answer these questions, we studied a broad range of Brazilian financial time series. The findings indicate that (i) the trend anomaly and the change point are the most prominent types of events for the Brazilian case; (ii) in most cases analyzed, the group of financial series presents the highest values observed in the metrics used to evaluate event detection methods; and (iii) the application of the ensemble methodology contributes to more accurate event detection, compared to the performance of individual methods.
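The ensemble question (iii) can be illustrated with a simple voting scheme in which an event is kept only when several detection methods agree on it. The voting rule, the threshold, and the function name are assumptions for illustration. The abstract does not specify how the ensemble in the paper combines its methods.

```python
from collections import Counter

def ensemble_vote(detections_by_method, min_votes):
    """Keep time indices flagged by at least `min_votes` methods."""
    votes = Counter(
        i for detected in detections_by_method for i in set(detected)
    )
    return sorted(i for i, n in votes.items() if n >= min_votes)

# Hypothetical detections (time indices) from three different methods
detections_by_method = [
    [5, 20, 41],
    [5, 41],
    [7, 41],
]
agreed = ensemble_vote(detections_by_method, min_votes=2)
```

Raising `min_votes` trades recall for precision: fewer events are reported, but each one is backed by more methods.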

Acknowledgments: The authors thank CNPq, CAPES, and FAPERJ for partially funding this research.

Experimental Evaluation:

The data and codes used to perform the experimental evaluation are available in the following zip file.

Experiment