Rebecca Salles

Contact:

rebeccapsalles@acm.org

Rio de Janeiro, Brazil

About

PhD student, data scientist and researcher with a focus on predictive analysis, data mining, data pre-processing, time series and event detection. CAPES scholarship grantee and ACM member.

PhD in Production Engineering and Systems (CEFET/RJ), ongoing (2023)
Advisor: Eduardo Ogasawara, Co-advisor: Fabio Porto
Master of Computer Science (CEFET/RJ), 2019
Bachelor of Computer Science (CEFET/RJ), 2016
Technician in Industrial Informatics (CEFET/RJ), 2010

Publications

Artefacts

TSPred

TSPred Package for R (version 4.0): Functions for Benchmarking Time Series Prediction. 2018.

Harbinger

Framework for integration and analysis of time series event detection methods

Ongoing research

Harbinger: Framework for integration and analysis of time series event detection methods
Estimation of COVID-19 underreporting in Brazilian States through SARI
Dataset of Monthly Historical Record of Neonatal Mortality Rates in Brazilian Municipalities
Ensemble classifier method for detection of events in time series
Araima: Autoregressive Adaptively Integrated Moving Average
Exploration of online event detection methods
TSPred: An R Package for Time Series Prediction
Hyperparameter Optimization with Spectral Clustering

Research projects participation

The Big Data phenomenon, produced by science, companies and governments, is one of the great challenges for today's knowledge society. The need for knowledge extraction in Data Science is growing significantly and relies on Data Mining methods for prediction, classification and pattern discovery. Many of the phenomena studied occur in non-stationary environments, often associated with time and space, which makes Data Mining much more complex. This research project aims to increase the efficiency and effectiveness of approaches in these environments, focusing on both data management and data analysis. It is therefore organized around three fronts: (i) data management, (ii) pattern discovery methods, and (iii) prediction methods. The research combines the applied study of spatio-temporal series with basic research in data management and analysis to understand in depth the circumstances under which such approaches can be refined to support non-stationarity.

Institution: Federal Center of Technological Education Celso Suckow da Fonseca (CEFET/RJ)
Coordinator: Eduardo Soares Ogasawara (CEFET/RJ)

The adoption of prediction models by the oil industry reflects their applicability in several areas, including the detection and prediction of failures in the equipment and processes involved in the construction of offshore wells. In this context, learning-based prediction techniques, such as deep neural networks, use observed data from the phenomena of interest as the basis for model training. Model training involves choosing the training, validation and testing sets. Different models can be obtained by varying the training data, including the observation periods and even the observed quantities. As a result, companies quickly find themselves managing a growing number of information assets: prediction models; training, testing and validation data; data from the training process; etc. This project aims to investigate the management of the data and models involved in predictive processes, focusing on the detection and prediction of failures in the equipment and processes involved in the construction of offshore wells. Drawing a parallel with database management systems, whose focus is on ensuring efficient data access and sharing across a corporation, the project intends to produce a prototype of a system that stores, publishes, executes and shares the assets involved in the prediction process. It also intends to provide a framework of techniques and algorithms for analyzing this data, yielding information and predictions about ongoing processes.

Institution: Leopoldo Américo Miguez de Mello Research and Development Center (CENPES/PETROBRÁS)
Coordinator: Fabio Andre Machado Porto (National Laboratory for Scientific Computing – LNCC)

Research on the analysis, monitoring and prediction of events (cases) and of health and disease conditions in the population, as well as their association with socio-environmental determinants. Storage, management and analysis of large volumes of data for researchers, teachers and students at teaching and research institutions.

Institution: Oswaldo Cruz Foundation (Fiocruz – RJ)
Coordinator: Marcel de Moraes Pedroso (Fiocruz)

Mixed epidemiological studies combining ecological (maternity units as units of analysis), individual (neonates) and ‘big data’ approaches, which will use data from the National Information Systems (SINASC, SIM, SIH and others) to evaluate all 62,950,321 live births in Brazil from 1996 to 2016. These data will be correlated, as a historical series, with the implementation of hospital breastfeeding policies (Baby Friendly Hospital Initiative, Kangaroo Mother Method or Human Milk Bank) in all Brazilian hospitals, to assess the impact of one or more of these three initiatives on neonatal morbidity and mortality. The project is supported by the Grand Challenges Explorations – Brazil: Data Science Approaches to Improve Maternal and Child Health in Brazil (2018), by the Bill and Melinda Gates Foundation.

Institution: Oswaldo Cruz Foundation (Fiocruz – RJ)
Coordinator: Cristiano Siqueira Boccolini (Fiocruz)

Analysis of dominant and influencing themes through the automatic capture of pandemic mentions in digital media and social networks using Natural Language Processing (NLP) algorithms.

Use of data science and artificial intelligence techniques, through Natural Language Processing (NLP) algorithms, for the massive, automated search for mentions (posts, news, comments, etc.) of the COVID-19 pandemic in digital media and social networks, in order to analyze dominant and influential themes. The proposal's motivation is to offer Fiocruz employees involved in the Contingency Plan an open, robust and flexible tool for building analysis panels through thematic filters (of interest to employees) over the more than 11 million mentions collected from Instagram, Facebook, Twitter and YouTube, among others; from 100,000 blogs; and from 1,000 of the main national news portals.

Institution: Oswaldo Cruz Foundation (Fiocruz – RJ)
Coordinator: Marcel de Moraes Pedroso (Fiocruz)

Awards

2019: Master Honors: Mention of Honor in Scientific Production, CEFET/RJ.
2019: Best paper (category: short, vision, industry), Brazilian Database Symposium (SBBD).
2017: Third place in the 36th Competition of Scientific Initiation Works, Congress of the Brazilian Computer Society (CSBC).
2016: Graduation Honors: Summa Cum Laude, CEFET/RJ.
2010: Future Generation Project, Government of the State of Rio de Janeiro.
2009: Future Generation Project, Government of the State of Rio de Janeiro.

© Rebecca Salles, August 2020.