CCA: A contextual compositional approach to discover associations between health determinants and health indicators in situations of anomalies

Team: Lais Baroni (CEFET/RJ), Lucas Scoralick (CEFET/RJ), Augusto Reis (CEFET/RJ), Kele Belloze (CEFET/RJ), Marcel Pedroso (Fiocruz), Ronaldo Alves (Fiocruz), Cristiano Boccolini (Fiocruz), Patricia Boccolini (UNIFASE), Eduardo Ogasawara (CEFET/RJ)

Abstract: Epidemiology is important in public health because it studies the health-disease-care process in human populations. One of the main focuses of this science is to identify the determinant factors of the health situation of populations, since health-related anomalies are understood not to be randomly distributed among people. This understanding raises the need to consider the particularities of each place and to observe the regularity of diseases in a population context. In this work, we present the Contextual Compositional Approach (CCA) for discovering associations between health indicators (HI) and health determinants (HD) in situations of anomalies in the HI. CCA uses time series concepts, anomaly detection, and the distribution of data between classes to study HD under expected conditions and compare them with the anomalous conditions indicated by anomaly detection in the HI. CCA is evaluated on a neonatal mortality database from health facilities in Rio de Janeiro (RJ, Brazil). The results show that CCA can reveal important associations between health conditions and the social, economic, and cultural characteristics of the population at different scales.
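The comparison between expected and anomalous conditions can be illustrated with a small sketch. It is not the CCA implementation. A simple z-score rule stands in for the anomaly detector on the HI series, and the HD is assumed to be a categorical variable. The function names, the threshold k = 2, and the toy data are hypothetical.

```r
# Hypothetical sketch, not the CCA implementation: a z-score rule stands in
# for the anomaly detector applied to the health indicator (HI).
flag_anomalies <- function(hi, k = 2) {
  z <- (hi - mean(hi)) / sd(hi)   # standardize the HI series
  abs(z) > k                      # TRUE where the HI deviates strongly
}

# Distribution of a categorical health determinant (HD) within each
# condition: rows are "expected" and "anomaly", columns are HD classes.
hd_distribution <- function(hd, anomalous) {
  condition <- factor(ifelse(anomalous, "anomaly", "expected"),
                      levels = c("expected", "anomaly"))
  prop.table(table(condition = condition, hd = hd), margin = 1)
}

# Toy data (hypothetical): an HI series and one HD class per time step
hi <- c(10, 11, 9, 10, 30, 10, 12, 11, 9, 10)
hd <- c("low", "high", "high", "low", "low", "high", "low", "high", "high", "low")
anomalous <- flag_anomalies(hi)
dist <- hd_distribution(hd, anomalous)
print(dist)
```

Here only the fifth observation is flagged. The HD class proportions in the "anomaly" row can then be contrasted with those in the "expected" row.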

Acknowledgments: The authors thank CNPq, CAPES, and FAPERJ for partially sponsoring this research.

Experimental Evaluation:

The data and code used to perform the experimental evaluation are available at https://github.com/cefet-rj-dal/cca

Harbinger

Team: Rebecca Salles (CEFET/RJ), Janio Lima (CEFET/RJ), Lais Baroni (CEFET/RJ), Antonio Castor Jr (CEFET/RJ), Leonardo Carvalho (CEFET/RJ), Heraldo Borges (CEFET/RJ), Diego Carvalho (CEFET/RJ), Rafaelli Coutinho (CEFET/RJ), Eduardo Bezerra (CEFET/RJ), Esther Pacitti (INRIA & University of Montpellier), Fabio Porto (LNCC), Eduardo Ogasawara (CEFET/RJ). 

Harbinger is a framework for event detection in time series. It provides an integrated environment for time series anomaly detection, change points, and motif discovery. It provides a broad range of event detection methods and functions for plotting and evaluating event detections.

In the anomaly part, methods are based on machine learning model deviation (Conv1D, ELM, MLP, LSTM, Random Regression Forest, SVM), machine learning classification models (Decision Tree, KNN, MLP, Naive Bayes, Random Forest, SVM), clustering (k-means and DTW), and statistical methods (ARIMA, FBIAD, GARCH).

In the change point part, methods are based on linear regression, ARIMA, ETS, and GARCH. In the motif part, methods are based on Hash and Matrix Profile. There are also specific methods for multivariate series. The evaluation of detections includes both traditional and soft evaluation metrics.
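The idea behind "model deviation" anomaly detection can be sketched in base R. It does not use Harbinger's own functions. A moving-average predictor stands in for the machine learning models listed above, and points whose prediction residual lies far from the typical residual are flagged. The window w, the threshold k, and the function name are hypothetical choices.

```r
# Hypothetical sketch, not Harbinger's API: a moving-average "model" predicts
# each point from the previous w observations; points whose prediction
# residual is far from the typical residual (measured with the robust MAD)
# are flagged as anomalies.
detect_deviation <- function(x, w = 3, k = 3) {
  n <- length(x)
  pred <- rep(NA_real_, n)
  for (i in (w + 1):n) {
    pred[i] <- mean(x[(i - w):(i - 1)])
  }
  res <- x - pred
  center <- median(res, na.rm = TRUE)
  spread <- mad(res, na.rm = TRUE)
  flagged <- abs(res - center) > k * spread
  which(flagged)   # which() drops the NA positions of the warm-up window
}

x <- c(1, 2, 1, 2, 1, 2, 10, 2, 1, 2, 1, 2)
detect_deviation(x)
```

The spike at position 7 is flagged. The larger residuals right after it stay below the threshold because the median and MAD are robust to that single outlier.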

Harbinger architecture is based on Experiment Lines and is built on top of the DAL Toolbox. Such an organization makes it easy to customize and add novel methods to the framework.

The framework and examples are made available at https://cefet-rj-dal.github.io/harbinger.

Multi-Scale Event Detection (MSED)

Team: Diego Silva de Salles (CEFET/RJ), Eduardo Ogasawara (CEFET/RJ), Eduardo Bezerra (CEFET/RJ), Rafaelli Coutinho (CEFET/RJ), Carlos E. Mello (UNIRIO), Cristiane Gea (CEFET/RJ). 

Abstract: Information published in the media, such as government transitions, economic crises, or corruption scandals, is an external factor associated with financial time series. These factors can be related to events of increased uncertainty in the time series. External factors can have different cycles of fluctuation, affecting a time series over months or years. In particular, these external factors can raise the perceived financial risk and manifest as two main types of events in the time series: anomalies and change points. Discovering these events in financial time series is challenging but can help minimize investment risk. This paper presents Multi-Scale Event Detection (MSED), a technique for detecting events in financial time series. It compares the events found by the detection methods in the Intrinsic Mode Function (IMF) components with external-factor labels obtained from the Economic Policy Uncertainty (EPU) index. Our results identified a correlation between the uncertainty variations present in the EPU and the events detected in a financial time series. Using the proposed approach, it is possible to determine the predominant nature of events based on the uncertainty variations in the EPU series. This information makes it possible to specify a set of time series in which the influence of uncertainty generates acceptable events for a given investment profile, thus mitigating the risk of the investments to which one intends to be exposed.
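Comparing detected events with EPU-derived labels can be sketched as follows. This is a simplified illustration and not the MSED implementation, which detects events on IMF components. High-uncertainty periods are labeled by thresholding the EPU series at a quantile, and a binary vector of detected events is scored against those labels. The 0.8 quantile, the function names, and the toy series are hypothetical.

```r
# Hypothetical sketch, not the MSED implementation (which detects events on
# IMF components): periods of high economic policy uncertainty are labeled by
# thresholding the EPU series at a quantile, and a binary vector of detected
# events is scored against those labels.
epu_labels <- function(epu, q = 0.8) {
  epu > quantile(epu, q, names = FALSE)
}

event_agreement <- function(detected, labels) {
  tp <- sum(detected & labels)          # detections that coincide with high EPU
  c(precision = tp / sum(detected),     # share of detections matching labels
    recall    = tp / sum(labels))       # share of labeled periods detected
}

epu <- c(100, 110, 105, 300, 120, 115, 280, 100, 95, 110)
detected <- c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)
labels <- epu_labels(epu)
event_agreement(detected, labels)
```

The labels mark positions 4 and 7. Only the detection at position 4 matches, so precision and recall are both 0.5. A real comparison would likely tolerate small time offsets between an event and its label.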

Acknowledgments: The authors thank CNPq, CAPES, and FAPERJ for partially sponsoring this research.

Experimental Evaluation:

The data and code used to perform the experimental evaluation are available at https://github.com/cefet-rj-dal/msed

G-STSM

Authors: Antonio Castro, Heraldo Borges, Fabio Porto, Florent Masseglia, Esther Pacitti, Rafaelli Coutinho, and Eduardo Ogasawara

Abstract: Spatial-temporal sequential patterns bring knowledge about sequences of events distributed in time and space. Finding such patterns is computationally intensive but of great value for different domains. However, frequent sequential patterns discovered across an entire dataset might be less interesting than patterns discovered within constrained space and time, which offer local insights for domain specialists. Unfortunately, considering spatial or temporal locality involves dealing with many time/space combinations. This paper proposes and evaluates the G-STSM algorithm to discover relatively frequent sequences constrained in space and time, along with the optimal constraints (the time and space locations that optimize the discovery of locally frequent patterns). It supports different sequence sizes and can find time-space ranges of different extents. G-STSM was tested on two real-world spatial-temporal datasets from the health and seismic domains, and it provides superior results compared to state-of-the-art methods.
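The notion of support within a space-time constraint can be illustrated with a small sketch. It is not the G-STSM algorithm, which also searches for the optimal constraints. Rows of a symbol matrix are assumed to be time steps and columns spatial positions. A position supports a pattern if its symbols inside the time window contain the pattern as an ordered, possibly gapped, subsequence. The function names and toy data are hypothetical.

```r
# Hypothetical sketch, not the G-STSM algorithm: the support of a sequential
# pattern inside one space-time block. Rows of `data` are time steps, columns
# are spatial positions.
contains_subsequence <- function(s, pattern) {
  j <- 1
  for (sym in s) {
    if (j <= length(pattern) && sym == pattern[j]) j <- j + 1
  }
  j > length(pattern)   # TRUE if every pattern symbol was matched in order
}

block_support <- function(data, pattern, rows, cols) {
  hits <- vapply(cols,
                 function(col) contains_subsequence(data[rows, col], pattern),
                 logical(1))
  mean(hits)   # fraction of positions in the block that contain the pattern
}

data <- matrix(c("a", "b", "c", "a",
                 "a", "c", "b", "b",
                 "b", "a", "b", "c"), nrow = 4)
block_support(data, c("a", "b"), rows = 1:4, cols = 1:3)  # whole dataset
block_support(data, c("a", "b"), rows = 1:2, cols = 1:3)  # first two time steps
```

The same pattern has support 1 over the whole dataset but only 1/3 within the first two time steps. This shows why the choice of space-time constraint changes which patterns count as frequent.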

Acknowledgments: The authors would like to thank CAPES, CNPq, and FAPERJ for partially funding this paper.

T401 dataset: The Netherlands seismic spatial-time series dataset, named F3 Block, was produced by the seismic reflection method in a region located in the Dutch sector of the North Sea. Seismic data is obtained by sending high-energy sound waves into the ground or the seabed, as the case may be, and recording the amplitude of the reflected waves. The later a reflected wave arrives, the deeper the layer at which it was reflected.

The dataset is available in: t401.RData

As a result, this dataset is a set of time series: observations correspond to the arrival time of the sound wave, and attributes correspond to the position of the hydrophone that recorded the reflected wave.
The results presented in this work focus on the public data of inline 401.
It is composed of 951 spatial-time series, each with 462 observations.
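The relation between arrival time and depth can be made concrete under a strong simplification. Assume a constant sound-wave velocity, so that a wave recorded at two-way travel time t came from depth v · t / 2, because it travels down to the reflector and back up. The 2000 m/s default is a hypothetical value, since real subsurface velocities vary with depth.

```r
# Sketch assuming a constant sound-wave velocity (2000 m/s is hypothetical):
# a reflection recorded at two-way travel time t came from depth v * t / 2.
twt_to_depth <- function(twt_ms, velocity_m_s = 2000) {
  velocity_m_s * (twt_ms / 1000) / 2   # convert ms to s, halve the round trip
}

twt_to_depth(c(500, 1000, 1500))  # later arrivals map to deeper reflectors
```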

Patterns previously identified by experts: the location of these patterns is of key importance for oil and gas prospecting.

The file that contains the positions of the patterns is available in: horizontes

Covid-19 dataset: obtained from the Rio de Janeiro State Health Department. It compiles daily epidemiological bulletins, providing historical series of Covid-19 deaths by municipality.

The dataset is available in: covid.csv

Detection of uncertainty events in the Brazilian economic and financial time series

Authors: Cristiane Gea, Luciano Vereda, Eduardo Ogasawara

Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ)

Abstract: Economic policy uncertainty shocks change how the economy behaves, moving it away from its usual pattern. Therefore, these effects can be understood as events. Given this, the problem of event detection becomes particularly relevant for a more accurate understanding of how uncertainty affects the behavior of economic and financial time series. Thus, the present work aims to answer the following questions: (i) What events do economic policy uncertainty shocks cause in economic and financial time series? (ii) What is the most suitable method for detecting such events? (iii) Does applying the ensemble methodology contribute to more accurate detection? To answer these questions, we studied a broad range of Brazilian financial time series. The findings indicate that (i) the trend anomaly and the change point are the most prominent types of events in the Brazilian case; (ii) in most of the cases analyzed, the group of financial series presents the highest values of the metrics used to evaluate event detection methods; and (iii) the application of the ensemble methodology contributes to more accurate event detection compared to the performance of individual methods.
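One common way to combine event detectors is a majority vote over their binary outputs. The paper's ensemble methodology may combine methods differently. In this sketch, each column holds one detector's output over time, and a time point becomes an event when a strict majority of detectors flag it. The detector names are hypothetical.

```r
# Hypothetical sketch of a majority-vote ensemble (the paper's ensemble
# methodology may combine detectors differently): each column of
# `detections` is one detector's binary output over time.
ensemble_vote <- function(detections) {
  min_votes <- floor(ncol(detections) / 2) + 1   # strict majority
  rowSums(detections) >= min_votes
}

detections <- cbind(
  arima = c(TRUE,  FALSE, TRUE,  FALSE, FALSE),
  garch = c(TRUE,  FALSE, FALSE, TRUE,  FALSE),
  cpd   = c(FALSE, FALSE, TRUE,  TRUE,  FALSE)
)
ensemble_vote(detections)
```

Time points 1, 3, and 4 receive two of three votes and become ensemble events. This simple version requires detectors to agree on the exact time step. A tolerance window would be a natural extension.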

Acknowledgments: The authors thank CNPq, CAPES, and FAPERJ for partially funding this research.

Experimental Evaluation:

The data and codes used to perform the experimental evaluation are available in the following zip file.

Experiment

ArangoDB NoSQL Solution Tutorial

The ArangoDB NoSQL solution tutorial developed by Janio de Souza Lima (PPCIC) attracted the interest of the tool's makers. The student was invited to publish his code (Jupyter Notebook) in the tutorials area of the official ArangoDB community repository.

The code is available at https://github.com/arangodb/interactive_tutorials/blob/master/community_notebooks/BD_g01_ArangoDB.ipynb. It was developed as part of the activities of the Database course taught by Professor Jorge Soares (PPCIC).

Janio Lima also produced complementary video content explaining the tutorial and how to implement the ArangoDB NoSQL solution through its Python API. The videos are available on the student's YouTube channel “Python DS”:

A Mixed Graph Framework to Evaluate the Complementarity of Communication Tools

Authors: Leonardo Carvalho, Eduardo Bezerra, Gustavo Guedes, Laura Assis, Leonardo Lima, Rafael Barbastefano, Artur Ziviani, Fabio Porto, Eduardo Ogasawara

Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ)

National Laboratory for Scientific Computing (LNCC)

Abstract: Due to constant innovations in communication tools, many organizations are continually evaluating the adoption of new communication tools (NCT) with respect to current ones. In particular, many organizations are interested in checking whether an NCT really brings benefits to their production process. An important problem arising from this interest is how to identify when an NCT provides a significantly different complementary communication flow with respect to the current communication tools (CCT). This paper presents the Mixed Graph Framework (MGF) to address the problem of measuring the complementarity of an NCT in a scenario where some CCT is already established. We evaluated MGF using synthetic data that represents an enterprise social network (ESN) in the context of a well-established e-mail communication tool. Our experiments showed that MGF was able to identify whether an NCT produces significant changes in the overall communications according to some centrality measures.
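How a centrality measure can reveal the complementarity of an NCT is illustrated below. This is a minimal sketch and not the MGF implementation, and MGF's statistical significance testing is not reproduced. Both tools are assumed to be 0/1 adjacency matrices over the same people. For simplicity, links are symmetrized so that an edge in either direction counts once. The toy e-mail and ESN links are hypothetical.

```r
# Hypothetical sketch, not the MGF implementation: normalized degree
# centrality before and after adding an NCT. Links are symmetrized, so an
# edge in either direction counts once.
degree_centrality <- function(adj) {
  und <- (adj + t(adj)) > 0
  diag(und) <- FALSE
  rowSums(und) / (nrow(adj) - 1)   # normalized by the maximum possible degree
}

n <- 4
email <- matrix(0, n, n)                 # CCT: directed e-mail links
email[1, 2] <- 1
email[2, 3] <- 1
esn <- matrix(0, n, n)                   # NCT: undirected ESN links
esn[1, 4] <- esn[4, 1] <- 1
esn[3, 4] <- esn[4, 3] <- 1

before <- degree_centrality(email)
after  <- degree_centrality((email + esn) > 0)
after - before   # per-person change in centrality brought by the NCT
```

Person 4 is isolated in the e-mail graph but gains the largest increase once the ESN links are added. This is the kind of complementary flow that a centrality comparison can expose.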

Acknowledgments: The authors thank CNPq, CAPES, and FAPERJ for partially sponsoring this research.

Experimental Evaluation:

We evaluated the proposed MGF on measuring whether an NCT brings complementarity to a CCT in scenarios of Small and Medium Enterprises (SME). We used synthetic data to simulate both CCT and NCT usage to explore MGF under different group configurations and enterprise scales. Both MGF and the experimental evaluation are available at https://github.com/eogasawara/mgf.

Preprint at https://peerj.com/preprints/3114v1/

Evaluating Temporal Aggregation for Predicting the Sea Surface Temperature of the Atlantic Ocean

Authors: Rebecca Salles, Patricia Mattos, Ana-Maria Dubois Iorgulescu, Eduardo Bezerra, Leonardo Lima, Eduardo Ogasawara

Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ)

Abstract: Extreme environmental events such as droughts affect millions of people all around the world. Although it is not possible to prevent this type of event, predicting it over different time horizons enables the mitigation of the damage it may cause. An important variable for identifying occurrences of droughts is the sea surface temperature (SST). In the Tropical Atlantic Ocean, SST data are collected and provided by the Prediction and Research Moored Array in the Tropical Atlantic (PIRATA) Project, an observation network composed of sensor buoys arranged in this region. Sensors of this type, and Internet of Things (IoT) sensors more generally, commonly suffer data losses that affect the quality of the datasets collected for fitting prediction models. In this paper, we explore the influence of temporal aggregation on step-ahead SST prediction, considering different prediction horizons and different training dataset sizes. We conducted several experiments using data collected by the PIRATA Project. Our results point out scenarios for training datasets and prediction horizons that indicate whether or not temporally aggregated SST time series may be beneficial for prediction.
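Temporal aggregation of a sensor series with gaps can be sketched in base R. A daily SST series is averaged over non-overlapping k-day blocks, ignoring missing days, so a block is missing only if all of its days are missing. The block length k, the function name, and the toy values are assumptions. The paper evaluates its own aggregation levels.

```r
# Sketch of temporal aggregation (block length k is an assumption): daily
# SST values are averaged over non-overlapping k-day blocks, skipping NAs.
aggregate_blocks <- function(x, k) {
  groups <- ceiling(seq_along(x) / k)   # block index for each day
  block_mean <- function(v) if (all(is.na(v))) NA_real_ else mean(v, na.rm = TRUE)
  as.numeric(tapply(x, groups, block_mean))
}

daily_sst <- c(26.0, 27.0, NA, 28.0, NA, NA, 25.0, 27.0)   # toy values, degrees C
aggregate_blocks(daily_sst, k = 2)
```

Aggregation turns three of the four missing days into usable block values. Only the block with no observations at all remains missing, at the cost of coarser time resolution.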

Acknowledgments: The authors thank CNPq and FAPERJ for partially sponsoring this research.

Experimental Evaluation:

The sea surface temperature (SST) data and R functions used to perform the experimental evaluation are available in the following RData files. We also present scripts for generating CSV files with the computed SST prediction errors and with the p-values of their statistical analysis.

Prediction Error Generation:

Experiment.RData

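```r
# Load the SST data and R functions distributed in Experiment.RData
# (assumed to define SelectedBuoys and PerChangeBuoysExp)
load("Experiment.RData")

# Load required packages
library(TSPred)
library(DMwR)
library(hydroGOF)

# Perform the experiment using ARIMA models (RW = FALSE)
PerChangeBuoysExp(SelectedBuoys, testyears = 1, RW = FALSE, plot = FALSE)
# Perform the experiment using the Random Walk baseline (RW = TRUE)
PerChangeBuoysExp(SelectedBuoys, testyears = 1, RW = TRUE, plot = FALSE)
```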

Statistical Analysis of the Prediction Errors:

ErrorsAnalysis.RData

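```r
# Load the prediction errors and R functions distributed in ErrorsAnalysis.RData
# (assumed to define ErrorsBuoysPQ, ErrorsBuoysRWPQ, BuoysExpStats, BuoysExpStatsRW)
load("ErrorsAnalysis.RData")

# Normality tests used in the analysis of the prediction errors
library(nortest)

# Statistical comparative analysis of each temporal aggregation approach
PredErrorsStats <- data.frame(BuoysExpStats(ErrorsBuoysPQ))[, c(1:4, 8:10)]

# Statistical comparative analysis of each temporal aggregation approach
# against daily Random Walk SST prediction
RWPredErrorsStats <- data.frame(BuoysExpStatsRW(ErrorsBuoysPQ, ErrorsBuoysRWPQ))[, c(1:4, 8:10)]
```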

Results:

  • The results of the statistical tests performed for analysis of the SST time series in the experimental dataset: BuoysStatisticalAnalysisResults
  • ARIMA models generated in the experimental evaluation: ARIMAModels
  • The 173,040 SST prediction errors generated in the experimental evaluation: SST_PredErrors
  • The normality test results for the prediction errors generated in the experimental evaluation: NormalityTestsResults
  • The results of the statistical tests performed for analysis and comparison of the prediction errors generated in the experimental evaluation: ComparisonStatsTestsResults