Defenses – PPCIC – Programa de Pós-graduação em Ciência da Computação

Dissertation (May 11, 2026): Vanessa Santos Soares

Student: Vanessa Santos Soares

Title: Avaliação de modelos de aprendizado de máquina para a correção automática de redações segundo as competências do ENEM

Advisors: Eduardo Bezerra da Silva (advisor) and Gustavo Paiva Guedes e Silva (co-advisor)

Committee: Eduardo Bezerra da Silva (Cefet/RJ), Gustavo Paiva Guedes e Silva (Cefet/RJ), Geraldo Bonorino Xexéo (UFRJ) and Diego Moreira de Araújo Carvalho (Cefet/RJ)

Day/Hour: May 11, 2026 / 9 a.m.

Room: https://teams.microsoft.com/meet/23323520592245?p=fnhxrL9byDTK3YLLbi

Abstract: With the growth of remote education and the implementation of large-scale exams such as ENEM, the automation of essay grading has become an increasing necessity. This work investigates different machine learning strategies for the automatic evaluation of essays written in Portuguese, based on the five assessment competencies defined by ENEM. A total of 9,599 essays were analyzed, collected from the Vestibular Brasil Escola portal, covering 102 topics published between 2009 and 2024. Two main approaches are compared: (i) traditional methods based on TF-IDF and linguistically engineered features extracted from the texts, and (ii) pre-trained language models with fine-tuning (XLM-RoBERTa with LoRA). Model performance is evaluated using the Quadratic Weighted Kappa (QWK) metric, which measures agreement with human raters. The study aims to demonstrate that pre-trained models provide significant improvements in robustness and reliability, outperforming feature-engineering-based approaches. This research contributes to the advancement of Automatic Essay Scoring (AES) in Portuguese by offering a benchmark and comparative analysis that can support future studies and educational applications.
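The QWK metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes scores have already been mapped to consecutive integer levels between `min_score` and `max_score`; the function name and arguments are illustrative and do not come from the dissertation.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Agreement between two integer score lists; 1.0 means perfect agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n_levels = max_score - min_score + 1
    if n_levels < 2:
        raise ValueError("at least two score levels are required")
    n_items = len(rater_a)

    # Observed joint counts and the marginal histogram of each rater.
    observed = Counter(zip(rater_a, rater_b))
    hist_a = Counter(rater_a)
    hist_b = Counter(rater_b)

    numerator = 0.0
    denominator = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            # Quadratic penalty: disagreements far apart cost more.
            weight = ((i - j) ** 2) / ((n_levels - 1) ** 2)
            # Count expected if the two raters were independent.
            expected = hist_a[i] * hist_b[j] / n_items
            numerator += weight * observed[(i, j)]
            denominator += weight * expected

    if denominator == 0:
        return 1.0  # both raters assigned one identical score to every item
    return 1.0 - numerator / denominator
```

A value of 1.0 indicates perfect agreement with the human rater, 0.0 indicates chance-level agreement, and negative values indicate systematic disagreement.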

Dissertation (May 12, 2026): Gustavo Melo

Student: Gustavo Melo

Title: Reconhecimento de Entidades Nomeadas em Relatos Criminais Informais com Apoio de Metadados Estruturados

Advisors: Eduardo Bezerra da Silva and Karla Figueiredo

Committee: Eduardo Bezerra da Silva (Cefet/RJ), Karla Figueiredo (UERJ), Gustavo Paiva Guedes (Cefet/RJ), Kele Teixeira Belloze (Cefet/RJ) and Ronaldo Ribeiro Goldschmidt (IME/RJ)

Day/Hour: May 12, 2026 / 9 a.m.

Room: https://teams.microsoft.com/l/meetup-join/19%3ameeting_OWQ3ZTg5YzEtZjRmZC00NjkwLTg0MWUtYTBhODdkYTQwYTA0%40thread.v2/0?context=%7b%22Tid%22%3a%228eeca404-a47d-4555-a2d4-0f3619041c9c%22%2c%22Oid%22%3a%22049f2b6a-7ad4-4096-a3ed-db846537c488%22%7d

Abstract: This work investigates the problem of named entity recognition in informal crime reports recorded by the Disque Denúncia service. These reports, often marked by colloquial language, spelling mistakes, and free text structure, pose significant challenges to the use of traditional Natural Language Processing models. In addition to free text, the reports are accompanied by structured metadata, such as type of occurrence, location, and date, which can provide additional relevant context for the task. In this study, we propose an approach based on the fine-tuning of large language models, using a manually annotated corpus with entities of the types Person, Location, and Organization. To overcome the scarcity of labeled data, the methodology includes the application of pseudo-labeling on a second, significantly larger corpus, thus expanding the training base in a semi-supervised manner. Additionally, the metadata from the reports are incorporated as a source of context both in preprocessing and in the evaluation and refinement processes of the models. The experiments were conducted with the GLiNER model and indicate that the use of metadata and pseudo-labeling can contribute to improving model performance on informal corpora, with positive impacts on automated information extraction in public security contexts. The results reinforce the potential of hybrid and domain-sensitive approaches for real-world Natural Language Processing applications in environments with scarce labeled data.
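The pseudo-labeling step can be sketched generically as follows. This is only an illustration under stated assumptions: `predict` is a hypothetical callable standing in for any trained NER model, the `(span, label, confidence)` tuple format and the 0.9 threshold are invented for the example, and none of this reflects the actual pipeline of the dissertation.

```python
def pseudo_label(unlabeled_texts, predict, min_confidence=0.9):
    """Keep only confident predictions on unlabeled texts as new training data.

    `predict` is a hypothetical callable returning a list of
    (span, label, confidence) tuples for one text.
    """
    accepted = []
    for text in unlabeled_texts:
        confident = [
            (span, label)
            for span, label, confidence in predict(text)
            if confidence >= min_confidence
        ]
        # Texts without any confident entity are left out of the new corpus.
        if confident:
            accepted.append((text, confident))
    return accepted
```

The accepted pairs would then be merged with the manually annotated corpus before another round of fine-tuning.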

Dissertation (March 18, 2026): Nathália Carvalho Tito

Student: Nathália Carvalho Tito

Title: Análise de Desempenho e Características de Corredores para Predição de Resultados e Geração de Feedbacks Personalizados

Advisors: Glauco Fiorott Amorim (Advisor) and Eduardo Bezerra da Silva (Co-advisor)

Committee: Glauco Fiorott Amorim (Cefet/RJ), Eduardo Bezerra da Silva (Cefet/RJ), Diego Nunes Brandão (Cefet/RJ) and Cláudio Miceli de Farias (COPPE/UFRJ)

Day/Hour: March 18, 2026 / 8 a.m.

Room: https://teams.microsoft.com/meet/22404162370488?p=2x3lfHhn8JsNjYtQEL

Abstract: The increasing number of recreational runners has intensified the demand for solutions capable of providing individualized training support, especially among amateur athletes who often lack continuous professional guidance. In this context, this study proposes an integrated performance analysis model based on machine learning techniques, aiming to understand training patterns, predict runners’ performance, and generate personalized recommendations based on controllable variables. Data were obtained from a questionnaire and activity records exported from a running application, involving 26 athletes with different levels of experience, allowing a multidimensional view of training habits and sports experience. The methodology was structured in three main phases. In the first phase, a clustering analysis was performed using the classical algorithms K-means, DBSCAN, and Agglomerative Clustering, applied to principal components explaining 80% of the data variability. The selected model identified three distinct profiles: “Young Experienced,” “Less Experienced,” and “Veteran Experienced,” differentiated by age, sports maturity, and training patterns. In the second phase, a predictive model based on gradient boosting was developed using the XGBoost algorithm, both in a general configuration and in versions specific to each cluster. Linear regression models were also tested as a reference approach; however, XGBoost achieved superior performance across all evaluated scenarios, demonstrating a greater ability to capture nonlinear relationships and complex interactions among variables. The results also indicated that each group responded differently to training variables, reinforcing that segmentation substantially improves predictive performance and the adequacy of personalized recommendations. Model interpretability was investigated using SHAP values, which enabled the identification of the most influential variables in the predictions.
In general, variables directly related to training structure and performance stood out, such as minimum and maximum speed, distance covered, pace variability, and recent performance history, as well as contextual factors such as temperature. The segmented analysis revealed distinct patterns across groups, indicating that different runner profiles present specific sensitivities to aspects such as intensity, regularity, and training consistency, further supporting the importance of personalized recommendations. In the third phase, an optimization algorithm was applied to identify combinations of controllable variables capable of maximizing the predicted performance for each profile. The evaluation of results was based on comparisons with baseline scenarios, in which decision variables were not adjusted, allowing objective quantification of the gains achieved through optimization. The resulting recommendations showed internal coherence and alignment with cluster characteristics: greater emphasis on intensity and training variety for young experienced runners; focus on regularity, consistency, and strength training for less experienced runners; and strategies centered on maintenance, balance, and load control for veteran runners. These findings demonstrate that the integration of clustering, predictive modeling, and optimization provides a consistent and promising approach for developing data-driven intelligent sports recommendation systems. Despite limitations related to sample size and the absence of more granular physiological indicators, the study provides initial evidence that computational models can effectively support personalized training in an efficient, accessible, and scalable manner. Future research may expand the dataset, incorporate additional informational dimensions, validate the model in larger populations, and explore its implementation in digital platforms. 
In conclusion, the combination of data science techniques and optimization methods significantly contributes to understanding running performance and to developing individualized recommendations that promote improvement, adherence, and safety in sports practice.
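The first phase keeps the principal components that together explain 80% of the data variability. A minimal sketch of that selection rule, assuming the component variances (eigenvalues of the covariance matrix) are already available; the function name is illustrative:

```python
def components_for_variance(eigenvalues, target=0.80):
    """Smallest number of leading components whose variance share reaches `target`."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    # Components are taken from the largest variance to the smallest.
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= target:
            return count
    return len(eigenvalues)
```

For example, with variances `[5, 3, 1, 1]` the first two components already account for 8 of 10 units of variance, so two components would be kept.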

Dissertation (February 18, 2026): Matheus dos Santos Moura

Student: Matheus dos Santos Moura

Title: Hybrid Anomaly and Change Point Detection for Pump-and-Dump Schemes in Centralized Cryptocurrency Exchanges

Advisor: Diogo Silveira Mendonça

Committee: Diogo Silveira Mendonça (Cefet/RJ), Eduardo Soares Ogasawara (Cefet/RJ) and Igor Machado Coelho (UFF)

Day/Hour: February 18, 2026 / 3 p.m.

Room: https://teams.microsoft.com/l/meetup-join/19%3ameeting_ZGFkNGM1MzMtOGMzNi00OWU5LTkzYjUtY2JhNGQxZmQzZjBl%40thread.v2/0?context=%7b%22Tid%22%3a%228eeca404-a47d-4555-a2d4-0f3619041c9c%22%2c%22Oid%22%3a%226821740b-ed93-4582-b3a3-b3bfbff6624e%22%7d

Abstract: The rapid growth of cryptocurrency markets has intensified concerns regarding market manipulation practices, particularly pump-and-dump schemes. Detecting such schemes remains challenging due to the high volatility of cryptocurrencies and the limited availability of reliable ground-truth data. Prior work has predominantly relied on anomaly detection techniques, which often exhibit limited precision and adaptability. In this work, we propose two offline statistical methods that explore a hybrid framework combining anomaly detection and change point detection for pump-and-dump detection. The first method, HD Pump, integrates volatility anomaly detection in price time series with change point detection applied to trading volume. The second method, HD Pump Plus, extends this approach by replacing the price time series with a rush-order-based time series. Experimental evaluation on a dataset of 178 confirmed pump-and-dump events from the Binance exchange shows that HD Pump Plus outperforms prior statistical approaches, achieving a precision of 96.4%, recall of 89.3%, and F1-score of 92.7%. These results demonstrate the effectiveness of hybrid detection strategies in advancing the state of the art while maintaining methodological simplicity.
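The idea of combining an anomaly detector on price with a change point detector on volume can be sketched as below. This is not HD Pump: the z-score rule, the before/after mean ratio, and all thresholds are simple stand-ins chosen for illustration, and prices are assumed to be strictly positive.

```python
from statistics import mean, pstdev


def volatility_anomalies(prices, z_threshold=3.0):
    """Indices whose absolute return is far above the typical absolute return."""
    returns = [abs(prices[t] / prices[t - 1] - 1.0) for t in range(1, len(prices))]
    mu, sigma = mean(returns), pstdev(returns)
    if sigma == 0:
        return set()
    # returns[t] describes the move into prices[t + 1].
    return {t + 1 for t, r in enumerate(returns) if (r - mu) / sigma > z_threshold}


def volume_change_points(volumes, window=5, ratio=3.0):
    """Indices where the mean volume after t is at least `ratio` times the mean before t."""
    points = set()
    for t in range(window, len(volumes) - window + 1):
        before = mean(volumes[t - window:t])
        after = mean(volumes[t:t + window])
        if before > 0 and after / before >= ratio:
            points.add(t)
    return points


def hybrid_alerts(prices, volumes, tolerance=2):
    """Report a price anomaly only if a volume change point lies within `tolerance` steps."""
    anomalies = volatility_anomalies(prices)
    changes = volume_change_points(volumes)
    return sorted(a for a in anomalies if any(abs(a - c) <= tolerance for c in changes))
```

Requiring both signals to agree is what makes the approach hybrid: a price spike without a matching shift in volume is discarded, which is one plausible way to trade a little recall for higher precision.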

Dissertation (January 21, 2026): Edson Paulo da Silva Pinto Sobrinho

Student: Edson Paulo da Silva Pinto Sobrinho

Title: Fine-tuning detection criteria to enhance anomaly identification in time series

Advisors: Eduardo Soares Ogasawara (advisor) and Kele Teixeira Belloze (co-advisor)

Committee: Eduardo Soares Ogasawara (CEFET/RJ), Kele Teixeira Belloze (CEFET/RJ), Rafaelli de Carvalho Coutinho (CEFET/RJ) and Esther Pacitti (INRIA / University of Montpellier)

Day/Hour: January 10, 2026 / 10:30 a.m.

Room: https://teams.microsoft.com/meet/24688731415374?p=YVHyDYwIW66rCysKq0

Abstract: Anomaly Detection (AD) is the problem of identifying observations that do not conform to typical ones in a time series. Detection methods implicitly define detection criteria, such as deviation measures, filter thresholds, and candidate anomaly selection strategies. Choosing inappropriate criteria results in inaccurate outputs, generating spurious alerts or missing events. Adjusting these criteria is essential for monitoring systems. To address this challenge, this study explores the fine-tuning of deviation measures, filter thresholds, and candidate selection strategies. Experimental results show that the proper choice of criteria significantly improves AD performance, often with greater impact than changing the detection methods.
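The three kinds of criteria named in the abstract (deviation measure, filter threshold, candidate selection strategy) can be made explicit as interchangeable parts of one detector. The sketch below is an assumption about how such a decomposition might look, with hypothetical function names; it is not the framework evaluated in the dissertation.

```python
from statistics import mean, median, pstdev


def abs_dev_from_median(series):
    """Deviation measure: absolute distance of each point from the median."""
    center = median(series)
    return [abs(x - center) for x in series]


def k_sigma(k):
    """Filter threshold: mean of the deviation scores plus k standard deviations."""
    return lambda scores: mean(scores) + k * pstdev(scores)


def first_of_run(run, scores):
    """Candidate selection: report the first index of a flagged run."""
    return run[0]


def peak_of_run(run, scores):
    """Candidate selection: report the index with the largest deviation in a run."""
    return max(run, key=lambda i: scores[i])


def detect(series, deviation, threshold, select):
    """Apply the three criteria in sequence and return one index per flagged run."""
    scores = deviation(series)
    limit = threshold(scores)
    flagged = [i for i, s in enumerate(scores) if s > limit]
    # Group consecutive flagged indices so each run yields a single anomaly.
    runs, current = [], []
    for i in flagged:
        if current and i != current[-1] + 1:
            runs.append(current)
            current = []
        current.append(i)
    if current:
        runs.append(current)
    return [select(run, scores) for run in runs]
```

Swapping only `select` from `first_of_run` to `peak_of_run` changes which timestamp is reported for the same flagged run, which illustrates how criteria alone can shift the output without changing the underlying method.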

Dissertation (January 08, 2026): Luiz Cláudio Lemos de Oliveira

Student: Luiz Cláudio Lemos de Oliveira

Title: Motif Detection in Time Series Using Autoencoders: An Analysis of Their Application to ECG Data

Advisor: Eduardo Soares Ogasawara

Committee: Eduardo Soares Ogasawara (CEFET/RJ), Laura Silva de Assis (CEFET/RJ), Helga Dolorico Balbi (CEFET/RJ) and Rebecca Pontes Salles (INRIA/FRA)

Day/Hour: January 08, 2026 / 10 a.m.

Room: https://teams.microsoft.com/meet/2899124353229?p=ykqgR5NeTJfPfXOhSF

Abstract: The discovery of motifs in biomedical time series, such as electrocardiograms (ECGs), involves identifying recurrent patterns that may contain valuable diagnostic information. Traditional methods, such as SAX, are limited by strong statistical assumptions, which are particularly inadequate for complex physiological signals. In parallel, autoencoders have demonstrated superior ability to learn nonlinear representations, but their application to motif discovery in ECG data remains unexplored, constituting a significant methodological gap. This work proposes a framework that replaces SAX discretization with neural encoding while preserving the discovery pipeline based on Shannon’s entropy and frequency of occurrence. The methodology was developed in three stages: (i) validation of the autoencoder’s reconstruction ability, (ii) training models with data from the MIT-BIH Arrhythmia Database, and (iii) systematic experimental comparison with the traditional SAX method through detection experiments, parametric sensitivity analysis, and evaluation of generalization capacity. It is concluded that replacing traditional discretization with neural encoding is feasible and provides quantitative and qualitative gains in motif discovery in ECG signals, establishing a methodological basis for developing automated biomedical signal analysis tools.
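For context, the SAX baseline discretizes each window into a short word, and candidate motifs are then filtered by frequency and entropy. The sketch below assumes a four-letter alphabet, windows whose length is a multiple of the number of segments, and an entropy computed over the symbols of each word; these choices and the thresholds are illustrative, not taken from the dissertation.

```python
from collections import Counter
from math import log2
from statistics import NormalDist, mean, pstdev


def sax_word(window, n_segments, alphabet="abcd"):
    """Z-normalize, average over equal segments (PAA), then map to symbols."""
    mu, sigma = mean(window), pstdev(window)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in window]
    seg_len = len(z) // n_segments  # trailing samples are dropped if not divisible
    paa = [mean(z[k * seg_len:(k + 1) * seg_len]) for k in range(n_segments)]
    # Breakpoints split the standard normal into equiprobable regions.
    size = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(k / size) for k in range(1, size)]
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


def shannon_entropy(word):
    """Entropy in bits of the symbol distribution inside one word."""
    counts = Counter(word)
    return -sum(c / len(word) * log2(c / len(word)) for c in counts.values())


def candidate_motifs(series, window_size, n_segments, min_count=2, min_entropy=1.0):
    """Words that recur at least `min_count` times and are not nearly constant."""
    words = Counter(
        sax_word(series[s:s + window_size], n_segments)
        for s in range(len(series) - window_size + 1)
    )
    return {w: c for w, c in words.items()
            if c >= min_count and shannon_entropy(w) >= min_entropy}
```

The Gaussian breakpoints are the strong statistical assumption the abstract refers to: they presume z-normalized values are normally distributed, which a learned encoding does not require.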

Dissertation (December 18, 2025): Michel Siqueira Reis

Student: Michel Siqueira Reis

Title: Matching Detections to Events in Time Series with Computational Efficiency and Guaranteed Optimality

Advisors: Rafaelli Coutinho (Advisor) and Eduardo Ogasawara (Co-advisor)

Committee: Rafaelli Coutinho (Cefet/RJ), Eduardo Ogasawara (Cefet/RJ), Laura Assis (Cefet/RJ) and Rebecca Salles (INRIA)

Day/Hour: December 18, 2025 / 1 p.m.

Room: https://teams.microsoft.com/l/meetup-join/19%3ae20c8697654543fc9dd1e9924de5c2c0%40thread.tacv2/1763159776096?context=%7b%22Tid%22%3a%228eeca404-a47d-4555-a2d4-0f3619041c9c%22%2c%22Oid%22%3a%2254af42a0-5f30-4905-ac8d-10b96c6db26b%22%7d

Abstract: This work presents SmartSoftED, an optimized metric for evaluating the detection of point events in time series. The original metric, SoftED, introduces a “soft” evaluation based on temporal tolerance, assigning gradual scores to detections that occur near actual events. However, its current formulation relies on a greedy approach that does not guarantee optimality in all cases and incurs a quadratic computational cost, limiting its applicability in large-scale or real-time processing environments. SmartSoftED overcomes these limitations by introducing a strategy that decomposes the problem into manageable, disjoint subproblems: some can be solved efficiently without loss of optimality, while others are modeled as maximum-weighted matching problems on unbalanced bipartite graphs. This approach preserves the optimality of correspondences between detections and events while significantly reducing computational cost. In practice, the method achieves an average speedup of two orders of magnitude, making it suitable for certain large-scale applications and systems with strict temporal constraints.
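A small example shows why a greedy pairing can miss the best assignment. The sketch assumes a linear soft score that decays from 1 to 0 over the tolerance window and uses one possible greedy rule (always take the highest-scoring free pair). The brute-force search is only a reference for tiny inputs; it is not the decomposition or the bipartite matching algorithm used by SmartSoftED.

```python
from itertools import permutations


def soft_score(detection, event, tolerance):
    """Assumed linear score: 1 at the event, 0 at `tolerance` steps away or more."""
    return max(0.0, 1.0 - abs(detection - event) / tolerance)


def greedy_total(detections, events, tolerance):
    """Repeatedly match the best-scoring pair whose detection and event are both free."""
    pairs = sorted(
        ((soft_score(d, e, tolerance), i, j)
         for i, d in enumerate(detections)
         for j, e in enumerate(events)),
        reverse=True,
    )
    used_d, used_e, total = set(), set(), 0.0
    for score, i, j in pairs:
        if i not in used_d and j not in used_e:
            used_d.add(i)
            used_e.add(j)
            total += score
    return total


def optimal_total(detections, events, tolerance):
    """Exhaustive maximum-weight matching; factorial cost, for tiny inputs only."""
    small, large = sorted((detections, events), key=len)
    best = 0.0
    # Try every one-to-one assignment of the smaller side into the larger side.
    for assignment in permutations(range(len(large)), len(small)):
        total = sum(soft_score(small[i], large[j], tolerance)
                    for i, j in enumerate(assignment))
        best = max(best, total)
    return best
```

With detections `[11, 8]`, events `[10, 13]`, and a tolerance of 5, the greedy rule pairs 11 with 10 (score 0.8) and leaves 8 with 13 (score 0), while the optimal assignment pairs 11 with 13 and 8 with 10 for a total of 1.2.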

Dissertation (December 8, 2025): Fernando Henrique de Jesus Fraga da Silva

Student: Fernando Henrique de Jesus Fraga da Silva

Title: Aprendizado por Reforço Profundo Aplicado à Negociação Intradiária de Múltiplas Ações

Advisors: Eduardo Bezerra da Silva (advisor) and Pedro Henrique González Silva (co-advisor)

Committee: Eduardo Bezerra da Silva (Cefet/RJ), Pedro Henrique González Silva (UFRJ), Aline Marins Paes Carvalho (UFF) and Glauco Fiorott Amorim (Cefet/RJ)

Day/Hour: December 8, 2025 / 3 p.m.

Room: https://teams.microsoft.com/v2/?meetingjoin=true#/l/meetup-join/19:PKOJTuK7mfHSDE6QkCWQCYp71f0xOMNoRgSUj4wjMKc1@thread.tacv2/1763760050816?context=%7b%22Tid%22%3a%228eeca404-a47d-4555-a2d4-0f3619041c9c%22%2c%22Oid%22%3a%22c03d6068-4733-48a6-bbb4-aa78f351d9cf%22%7d&anon=true&deeplinkId=91733be2-9804-4f09-ac6a-f1a362e67de8

Abstract: The stock market is a dynamic and volatile environment in which publicly traded companies negotiate fractions of their value, subject to continuous price fluctuations influenced by economic, political, and social factors. Anticipating these fluctuations is a complex task, especially in the context of intraday trading, where buy and sell decisions must be made within very short time intervals based on rapidly changing data. In this scenario, Reinforcement Learning (RL) emerges as a promising paradigm capable of developing adaptive strategies through the continuous interaction between agent and environment. This dissertation investigates the use of Deep Reinforcement Learning (DRL) techniques in financial trading, focusing on intraday scenarios involving multiple stocks. It proposes a DRL-based approach to estimate buy and sell actions simultaneously across various assets, using high-granularity market data to better approximate real trading conditions. Experimental analyses were conducted using the Proximal Policy Optimization (PPO) algorithm. The results indicate that the proposed agent outperformed traditional benchmark strategies, achieving gains exceeding 10 percentage points in certain cases.
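PPO limits how far each policy update can move by clipping the probability ratio between the new and old policies. A minimal sketch of the standard clipped surrogate objective follows; the default `epsilon=0.2` is a commonly used value assumed here, not a setting reported in the dissertation.

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """Per-sample PPO objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    return min(ratio * advantage, clipped_ratio * advantage)


def ppo_objective(ratios, advantages, epsilon=0.2):
    """Average clipped objective over a batch; training maximizes this value."""
    terms = [clipped_surrogate(r, a, epsilon) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)
```

Here `ratio` is the probability of the chosen action under the updated policy divided by its probability under the policy that collected the data, and `advantage` estimates how much better the action was than expected. In a trading setting, an action could be a buy, sell, or hold decision for one asset.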
