Rafaelli Coutinho – PPCIC – Programa de Pós-graduação em Ciência da Computação

Dissertation (May 15, 2026): Edson Landim de Almeida

Student: Edson Landim de Almeida

Title: Estimation of Soil Organic Carbon Using Quantile Approaches and Explainable Artificial Intelligence

Advisors: Diego Brandão (advisor) and Jorge Soares (co-advisor)

Committee: Diego Brandão (Cefet/RJ), Jorge Soares (Cefet/RJ), Kele Belloze (Cefet/RJ) and Marcos Ceddia (UFRRJ)

Day/Hour: May 15, 2026 / 9 a.m.

Room: https://teams.microsoft.com/meet/272684253086940?p=V1Gq8L6jg3r6sPZIuq

Abstract: Soil organic carbon (SOC) plays a central role in biogeochemical cycles, soil fertility, and climate change mitigation, constituting one of the largest carbon reservoirs in the terrestrial system. Although direct quantification of SOC using laboratory methods is accurate, it is costly and time-consuming and limits sampling density in large-scale surveys. In this context, statistical and machine learning techniques have been used to estimate carbon content from more easily obtained edaphic and environmental attributes. This study investigates the estimation of soil organic carbon using pedological data provided by the Brazilian Agricultural Research Corporation (Embrapa), integrating multiple linear regression, Random Forest, Quantile Regression Forests (QRF), and Explainable Artificial Intelligence techniques based on SHAP. Multiple linear regression was used as an interpretable baseline, while Random Forest was employed to capture nonlinear relationships and interactions among soil attributes. The quantile-based random forest approach enabled the analysis of different regions of the conditional distribution of SOC, considering the 0.1, 0.5, and 0.9 quantiles. Results showed that SOC estimation is characterized by high variability, asymmetry, and heteroscedasticity, with relevant differences among the evaluated land uses. The conventional Random Forest model captured general trends in the data but exhibited greater error dispersion in the higher-carbon ranges. The QRF model expanded the analysis by enabling the representation of lower and upper bounds on the response, providing a more informative interpretation of the uncertainty associated with the predictions. The SHAP analysis indicated the relevance of attributes related to soil color, depth, and texture, in agreement with pedological knowledge. 
Overall, the results indicate that integrating tree-based models, quantile approaches, and explainability techniques constitutes a promising strategy for estimating and interpreting soil organic carbon, thereby advancing methodologies for digital soil mapping and environmental monitoring.

Dissertation (May 13, 2026): Ana Gabriela Viana de Araújo

Student: Ana Gabriela Viana de Araújo

Title: Application of concept drift-based methods for predicting goals in professional football

Advisor: Jorge de Abreu Soares

Committee: Jorge de Abreu Soares (Cefet/RJ), Glauco Fiorott Amorim (Cefet/RJ), Pedro Henrique González Silva (UFRJ), Carlos Eduardo Ribeiro de Mello (Unirio)

Day/Hour: May 13, 2026 / 9 p.m.

Room: https://teams.microsoft.com/meet/235075217075845?p=ezgpJ8dNRKnMa8dTfM

Abstract: This work investigates the application of concept drift detection techniques for the early identification of goals in soccer matches, based on intra-match event data. The approach treats the problem as monitoring changes in the intra-match pass distribution, using operationally virtual drift, that is, detection based exclusively on P(X) without labels in real time, with the premise that these changes precede changes in the probability of a goal. The robustness of the results is verified by temporal division with 190 training matches and 190 test matches. Data from the 2015/2016 La Liga season were used: 380 matches, aggregated in one-minute intervals, with analysis of both offensive and defensive behavior. Three drift detectors were evaluated (Page-Hinkley, KSWIN, and ADWIN) in comparison with deterministic and stochastic baselines, using moving averages of pass frequency as input signal. The evaluation adopts an asymmetric variant of the SoftED evaluation, which penalizes late alarms through a decreasing linear scoring function in the [t-K,t] window, with K=10 minutes. The results indicate that Page-Hinkley obtained the highest MCC among the detectors evaluated, surpassing both baselines; Page-Hinkley and KSWIN presented equivalent F1 values, with a marginal advantage for KSWIN. Comparison with supervised approaches from the literature shows that the proposed method, although simpler and without the need for labeled data, achieves competitive performance from the first match. Limitations of the approach are discussed, including the use of passes as the only proxy signal and the restriction to a single season, as well as perspectives for future work with multivariate variables and longitudinal analysis.
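As a rough illustration of how a detector such as Page-Hinkley can raise an alarm on an unlabeled signal, the sketch below implements the textbook form of the test for an upward shift in the mean. It is not the implementation used in the dissertation: the function name `page_hinkley`, the parameters `delta` and `threshold`, and the synthetic `passes_per_minute` signal are all assumptions made for this example.

```python
def page_hinkley(stream, delta=0.05, threshold=5.0):
    """Return the index of the first alarm for an upward mean shift, or None.

    Textbook Page-Hinkley test; delta (tolerated drift) and threshold
    (alarm level) are illustrative values, not tuned ones.
    """
    mean = 0.0        # running mean of the observations seen so far
    cumulative = 0.0  # cumulative deviation from the running mean
    minimum = 0.0     # smallest cumulative deviation seen so far
    for n, x in enumerate(stream, start=1):
        mean += (x - mean) / n
        cumulative += x - mean - delta
        minimum = min(minimum, cumulative)
        # Alarm when the cumulative deviation rises far above its minimum.
        if cumulative - minimum > threshold:
            return n - 1
    return None


# Synthetic signal (assumption): a stable level followed by a higher level.
passes_per_minute = [1.0] * 50 + [3.0] * 50
alarm = page_hinkley(passes_per_minute)
stable_alarm = page_hinkley([1.0] * 100)
```

On this synthetic signal the alarm is raised shortly after the level change at index 50, while a constant signal produces no alarm. A larger `threshold` trades detection delay for fewer false alarms.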

Dissertation (May 11, 2026): Daiane de Ascenção Cardoso

Student: Daiane de Ascenção Cardoso

Title: Method for Assessing Accessibility in Biomedical Ontologies Visualizations

Advisors: Kele Teixeira Belloze (advisor) and Felipe da Rocha Henriques (co-advisor)

Committee: Kele Teixeira Belloze (Cefet/RJ), Felipe da Rocha Henriques (Cefet/RJ), Ingrid Monteiro (UFC), Glauco Amorim (Cefet/RJ) and Luis Carlos dos Santos Coutinho Retondaro (Cefet/RJ)

Day/Hour: May 11, 2026 / 1:30 p.m.

Room: https://teams.microsoft.com/meet/218701639098481?p=ZeSHhNfSekBnHMS8F8

Abstract: Biomedical ontologies play a fundamental role in structuring and communicating scientific knowledge, yet representing them accessibly remains a challenge. This dissertation proposes a systematic and automatable method for evaluating the accessibility of ontology visualizations. A qualitative analysis through methodological triangulation (usability, communicability, and accessibility) revealed the absence of objective criteria in existing tools. Subsequently, 3,048 parameterized visualizations of the Hemoglobin class from the Sickle Cell Disease Ontology (SCDO) were generated under three contrast configurations. A sample was manually labeled using the PCHRD model (Perceivable, Comprehensible, Hierarchically clear, Reliably labeled, and Distinguishable), adapted from the Chartability framework, revealing a relevant combinatorial asymmetry: there are more ways for a visualization to be inaccessible than accessible. A Random Forest was applied in two iterations (a probe of accessibility prevalence, which confirmed the labeling finding, and a proof of concept with mPCHRD-driven sample curation), indicating that visual attributes carry discriminative signal. To verify real-world applicability, a corpus of images was built from articles in the Journal of Biomedical Semantics, classified by multimodal models (Claude Haiku 4.5 and GPT-4o-mini) and manually evaluated. Finally, the Chain-of-Thought technique was adopted to inspect the models’ reasoning when applying the PCHRD criteria, comparing automatic classifications with human judgments. The results indicate that the method predicts accessibility levels from graphical features and that explicit reasoning makes the models more restrictive in identifying perceptual barriers, potentially supporting inclusive design practices in biomedical ontology visualization.
This work offers a replicable protocol, formalizes PCHRD as an instrument for assessment and retrieval, and fills a gap in the quantitative measurement of accessibility in information visualization.

Dissertation (May 06, 2026): Jorge Nelson de Souza Pavão

Student: Jorge Nelson de Souza Pavão

Title: Large Language Models for Detecting Collusion in Bidding Processes

Advisors: Kele Teixeira Belloze (Advisor) and Diego Nunes Brandão (Co-advisor)

Committee: Kele Teixeira Belloze (CEFET/RJ), Diego Nunes Brandão (CEFET/RJ), Raissa dos Santos Barcellos (UERJ), Eduardo Bezerra da Silva (CEFET/RJ)

Day/Hour: May 06, 2026 / 4:30 p.m.

Room: E-520 (Block E, 5th floor, Cefet/Maracanã)

Abstract: The practice of collusion in public procurement, characterized by illicit agreements between competing firms, generates significant economic losses, reduces the efficiency of public contracting, and undermines trust in institutions. Several studies have employed machine learning algorithms and statistical variables to identify indications of this type of fraud; however, challenges such as the scarcity of labeled data, difficulties in generalization, and inconsistent performance across different datasets still persist. In this context, this dissertation investigates the potential of Large Language Models (LLMs) for detecting indications of collusion based on numerical data from procurement processes. Models from different providers were evaluated using two strategies: prompt engineering and supervised fine-tuning. The experiments were conducted on three real-world datasets — Brazil, the United States, and Switzerland — and the results were compared with those of traditional algorithms reported in the literature. The findings show that prompt engineering did not produce satisfactory performance. With fine-tuning, the LLMs outperformed traditional algorithms in only one of the three evaluated scenarios. Considering performance, cost, and implementation complexity jointly, it is concluded that, at the present time, traditional algorithms offer a better cost-benefit ratio for the evaluated task, while LLMs may serve as a complementary alternative in specific scenarios where classical methods fail to achieve satisfactory results or in situations involving textual data.

Dissertation (May 04, 2026): Balthazar da Silva Cunha Paixão

Student: Balthazar da Silva Cunha Paixão

Title: Analyzing the Robustness of Football Passing Networks Using Complex Networks

Advisor: Glauco Fiorott Amorim

Committee: Glauco Fiorott Amorim (PPCIC-Cefet/RJ), Pedro Henrique Gonzalez Silva (PPCIC-Cefet/RJ) and Claudio Miceli de Farias (COPPE/UFRJ)

Day/Hour: May 04, 2026 / 7 a.m.

Room: https://teams.microsoft.com/meet/258717362035218?p=8XGFKontHejSUpL3Ou

Alternative link: https://meet.google.com/icf-miyw-pzi

Abstract: This work investigates the relationship between the structure of passing networks in football and the competitive success of teams over a season. Starting from the hypothesis that structural properties of these networks are associated with competitive success, operationalized by final league standing, football is modeled as a complex system of collective interactions. Passing networks are represented as directed and weighted graphs, in which nodes correspond to players and edges represent passes exchanged between them. The IHG metric, grounded in the interquartile range of structural importance measures, is proposed to quantify the heterogeneity in players’ participation within the passing network. The empirical analysis considers five major European leagues (La Liga, Premier League, Ligue 1, Serie A, and Bundesliga) in the 2015/16 season. For each team and match, passing networks are constructed in their baseline state, from which classical structural metrics are computed, followed by the application of progressive perturbations through node and edge removal until a minimum structural threshold is reached. The data suggest that IHG constitutes a stable indicator for associating network organization with competitive success. The results indicate that multiple topological properties are associated with teams’ final standings, although with different patterns across leagues. In particular, metrics related to global connectivity, structural cohesion, and reciprocity show stronger associations in leagues such as Serie A and Bundesliga. In contrast, the Premier League exhibits lower sensitivity to traditional global metrics, with IHG emerging as the measure most consistently associated with performance, a result interpreted in light of the atypical nature of the 2015/16 season.
The analysis under perturbation reveals that structural robustness is not uniformly distributed across metrics, as only a subset — particularly those associated with structural cohesion and, to a lesser extent, transport efficiency — preserves discriminative capacity after network degradation. The findings support the research hypothesis, indicating that competitive success does not depend on a single structural property, but rather on a set of topological characteristics whose relevance varies according to the competitive context. Finally, this work establishes an integrated framework for structural and robustness analysis in passing networks, enabling the evaluation not only of teams’ collective organization but also of their ability to preserve functional properties under perturbations.

Dissertation (May 11, 2026): Vanessa Santos Soares

Student: Vanessa Santos Soares

Title: Evaluation of Machine Learning Models for Automatic Essay Grading According to the ENEM Competencies

Advisors: Eduardo Bezerra da Silva (advisor) and Gustavo Paiva Guedes e Silva (co-advisor)

Committee: Eduardo Bezerra da Silva (Cefet/RJ), Gustavo Paiva Guedes e Silva (Cefet/RJ), Geraldo Bonorino Xexéo (UFRJ) and Diego Moreira de Araújo Carvalho (Cefet/RJ)

Day/Hour: May 11, 2026 / 9 a.m.

Room: https://teams.microsoft.com/meet/23323520592245?p=fnhxrL9byDTK3YLLbi

Abstract: With the growth of remote education and the implementation of large-scale exams such as ENEM, the automation of essay grading has become an increasing necessity. This work investigates different machine learning strategies for the automatic evaluation of essays written in Portuguese, based on the five assessment competencies defined by ENEM. A total of 9,599 essays were analyzed, collected from the Vestibular Brasil Escola portal, covering 102 topics published between 2009 and 2024. Two main approaches are compared: (i) traditional methods based on TF-IDF and linguistically engineered features extracted from the texts, and (ii) pre-trained language models with fine-tuning (XLM-RoBERTa with LoRA). Model performance is evaluated using the Quadratic Weighted Kappa (QWK) metric, which measures agreement with human raters. The study aims to demonstrate that pre-trained models provide significant improvements in robustness and reliability, outperforming feature-engineering-based approaches. This research contributes to the advancement of Automatic Essay Scoring (AES) in Portuguese by offering a benchmark and comparative analysis that can support future studies and educational applications.
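For readers unfamiliar with the Quadratic Weighted Kappa, the sketch below computes it from two lists of ordinal scores using the standard definition: one minus the ratio of observed to chance-expected disagreement, where disagreement is weighted by the squared distance between score levels. The function name, the integer encoding of score levels (0 to `num_levels - 1`), and the sample ratings are assumptions made for this example and are not taken from the dissertation.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_levels):
    """Agreement between two raters on ordinal levels 0..num_levels-1."""
    # Confusion matrix: observed[i][j] counts items scored i by A and j by B.
    observed = [[0] * num_levels for _ in range(num_levels)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    total = len(rater_a)
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_levels))
              for j in range(num_levels)]

    disagreement_observed = 0.0
    disagreement_expected = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            # Quadratic weight: distant levels are penalized more heavily.
            weight = (i - j) ** 2 / (num_levels - 1) ** 2
            disagreement_observed += weight * observed[i][j]
            disagreement_expected += weight * hist_a[i] * hist_b[j] / total
    if disagreement_expected == 0:
        # Assumed convention: both raters used a single identical level.
        return 1.0
    return 1.0 - disagreement_observed / disagreement_expected


# Hypothetical scores on six ordinal levels (illustrative data only).
human = [0, 1, 2, 3, 4, 5]
perfect = quadratic_weighted_kappa(human, human, num_levels=6)
chance = quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], num_levels=2)
```

A value of 1 means perfect agreement and a value near 0 means agreement no better than chance. Libraries such as scikit-learn provide an equivalent computation, so a hand-written version like this is mainly useful for understanding the metric.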

Dissertation (May 12, 2026): Gustavo Melo

Student: Gustavo Melo

Title: Named Entity Recognition in Informal Crime Reports Supported by Structured Metadata

Advisors: Eduardo Bezerra da Silva and Karla Figueiredo

Committee: Eduardo Bezerra da Silva (Cefet/RJ), Karla Figueiredo (UERJ), Gustavo Paiva Guedes (Cefet/RJ), Kele Teixeira Belloze (Cefet/RJ) and Ronaldo Ribeiro Goldschmidt (IME/RJ)

Day/Hour: May 12, 2026 / 9 a.m.

Room: https://teams.microsoft.com/l/meetup-join/19%3ameeting_OWQ3ZTg5YzEtZjRmZC00NjkwLTg0MWUtYTBhODdkYTQwYTA0%40thread.v2/0?context=%7b%22Tid%22%3a%228eeca404-a47d-4555-a2d4-0f3619041c9c%22%2c%22Oid%22%3a%22049f2b6a-7ad4-4096-a3ed-db846537c488%22%7d

Abstract: This work investigates the problem of named entity recognition in informal crime reports recorded by the Disque Denúncia service. These reports, often marked by colloquial language, spelling mistakes, and free-text structure, pose significant challenges to the use of traditional Natural Language Processing models. In addition to free text, the reports are accompanied by structured metadata, such as type of occurrence, location, and date, which can provide additional relevant context for the task. In this study, we propose an approach based on the fine-tuning of large language models, using a manually annotated corpus with entities of the types Person, Location, and Organization. To overcome the scarcity of labeled data, the methodology includes the application of pseudo-labeling on a second, significantly larger corpus, thus expanding the training base in a semi-supervised manner. Additionally, the metadata from the reports are incorporated as a source of context both in preprocessing and in the evaluation and refinement processes of the models. The experiments were conducted with the GLiNER model and indicate that the use of metadata and pseudo-labeling can contribute to improving model performance on informal corpora, with positive impacts on automated information extraction in public security contexts. The results reinforce the potential of hybrid and domain-sensitive approaches for real-world Natural Language Processing applications in environments with scarce labeled data.

Dissertation (March 18, 2026): Nathália Carvalho Tito

Student: Nathália Carvalho Tito

Title: Analysis of Runner Performance and Characteristics for Result Prediction and Personalized Feedback Generation

Advisors: Glauco Fiorott Amorim (Advisor) and Eduardo Bezerra da Silva (Co-advisor)

Committee: Glauco Fiorott Amorim (Cefet/RJ), Eduardo Bezerra da Silva (Cefet/RJ), Diego Nunes Brandão (Cefet/RJ) and Cláudio Miceli de Farias (COPPE/UFRJ)

Day/Hour: March 18, 2026 / 8 a.m.

Room: https://teams.microsoft.com/meet/22404162370488?p=2x3lfHhn8JsNjYtQEL

Abstract: The increasing number of recreational runners has intensified the demand for solutions capable of providing individualized training support, especially among amateur athletes who often lack continuous professional guidance. In this context, this study proposes an integrated performance analysis model based on machine learning techniques, aiming to understand training patterns, predict runners’ performance, and generate personalized recommendations based on controllable variables. Data were obtained from a questionnaire and activity records exported from a running application, involving 26 athletes with different levels of experience, allowing a multidimensional view of training habits and sports experience. The methodology was structured in three main phases. In the first phase, a clustering analysis was performed using classical algorithms (K-means, DBSCAN, and Agglomerative Clustering) applied to principal components explaining 80% of the data variability. The selected model identified three distinct profiles: “Young Experienced,” “Less Experienced,” and “Veteran Experienced,” differentiated by age, sports maturity, and training patterns. In the second phase, a predictive model based on gradient boosting was developed using the XGBoost algorithm, both in a general configuration and in versions specific to each cluster. Linear regression models were also tested as a reference approach; however, XGBoost achieved superior performance across all evaluated scenarios, demonstrating a greater ability to capture nonlinear relationships and complex interactions among variables. The results also indicated that each group responded differently to training variables, reinforcing that segmentation substantially improves predictive performance and the adequacy of personalized recommendations. Model interpretability was investigated using SHAP values, which enabled the identification of the most influential variables in the predictions.
In general, variables directly related to training structure and performance stood out, such as minimum and maximum speed, distance covered, pace variability, and recent performance history, as well as contextual factors such as temperature. The segmented analysis revealed distinct patterns across groups, indicating that different runner profiles present specific sensitivities to aspects such as intensity, regularity, and training consistency, further supporting the importance of personalized recommendations. In the third phase, an optimization algorithm was applied to identify combinations of controllable variables capable of maximizing the predicted performance for each profile. The evaluation of results was based on comparisons with baseline scenarios, in which decision variables were not adjusted, allowing objective quantification of the gains achieved through optimization. The resulting recommendations showed internal coherence and alignment with cluster characteristics: greater emphasis on intensity and training variety for young experienced runners; focus on regularity, consistency, and strength training for less experienced runners; and strategies centered on maintenance, balance, and load control for veteran runners. These findings demonstrate that the integration of clustering, predictive modeling, and optimization provides a consistent and promising approach for developing data-driven intelligent sports recommendation systems. Despite limitations related to sample size and the absence of more granular physiological indicators, the study provides initial evidence that computational models can effectively support personalized training in an efficient, accessible, and scalable manner. Future research may expand the dataset, incorporate additional informational dimensions, validate the model in larger populations, and explore its implementation in digital platforms. 
In conclusion, the combination of data science techniques and optimization methods significantly contributes to understanding running performance and to developing individualized recommendations that promote improvement, adherence, and safety in sports practice.

Dissertation (February 18, 2026): Matheus dos Santos Moura

Student: Matheus dos Santos Moura

Title: Hybrid Anomaly and Change Point Detection for Pump-and-Dump Schemes in Centralized Cryptocurrency Exchanges

Advisor: Diogo Silveira Mendonça

Committee: Diogo Silveira Mendonça (Cefet/RJ), Eduardo Soares Ogasawara (Cefet/RJ) and Igor Machado Coelho (UFF)

Day/Hour: February 18, 2026 / 3 p.m.

Room: https://teams.microsoft.com/l/meetup-join/19%3ameeting_ZGFkNGM1MzMtOGMzNi00OWU5LTkzYjUtY2JhNGQxZmQzZjBl%40thread.v2/0?context=%7b%22Tid%22%3a%228eeca404-a47d-4555-a2d4-0f3619041c9c%22%2c%22Oid%22%3a%226821740b-ed93-4582-b3a3-b3bfbff6624e%22%7d

Abstract: The rapid growth of cryptocurrency markets has intensified concerns regarding market manipulation practices, particularly pump-and-dump schemes. Detecting such schemes remains challenging due to the high volatility of cryptocurrencies and the limited availability of reliable ground-truth data. Prior work has predominantly relied on anomaly detection techniques, which often exhibit limited precision and adaptability. In this work, we propose two offline statistical methods that explore a hybrid framework combining anomaly detection and change point detection for pump-and-dump detection. The first method, HD Pump, integrates volatility anomaly detection in price time series with change point detection applied to trading volume. The second method, HD Pump Plus, extends this approach by replacing the price time series with a rush-order-based time series. Experimental evaluation on a dataset of 178 confirmed pump-and-dump events from the Binance exchange shows that HD Pump Plus outperforms prior statistical approaches, achieving a precision of 96.4%, recall of 89.3%, and F1-score of 92.7%. These results demonstrate the effectiveness of hybrid detection strategies in advancing the state of the art while maintaining methodological simplicity.
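The reported F1-score can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. The short sketch below does that arithmetic; the function name `f1_score` is chosen for this example, and only the three percentages come from the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (96.4% and 89.3%).
reported_f1 = f1_score(0.964, 0.893)
rounded_f1 = round(reported_f1, 3)
```

The result rounds to 0.927, which agrees with the 92.7% stated in the abstract.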

Dissertation (January 21, 2026): Edson Paulo da Silva Pinto Sobrinho

Student: Edson Paulo da Silva Pinto Sobrinho

Title: Fine-tuning detection criteria to enhance anomaly identification in time series

Advisors: Eduardo Soares Ogasawara (advisor) and Kele Teixeira Belloze (co-advisor)

Committee: Eduardo Soares Ogasawara (CEFET/RJ), Kele Teixeira Belloze (CEFET/RJ), Rafaelli de Carvalho Coutinho (CEFET/RJ) and Esther Pacitti (INRIA / University of Montpellier)

Day/Hour: January 10, 2026 / 10:30 a.m.

Room: https://teams.microsoft.com/meet/24688731415374?p=YVHyDYwIW66rCysKq0

Abstract: Anomaly Detection (AD) is the problem of identifying observations that do not conform to typical ones in a time series. Detection methods implicitly define detection criteria, such as deviation measures, filter thresholds, and candidate anomaly selection strategies. Choosing inappropriate criteria results in inaccurate outputs, generating spurious alerts or missing events. Adjusting these criteria is essential for monitoring systems. To address this challenge, this study explores the fine-tuning of deviation measures, filter thresholds, and candidate selection strategies. Experimental results show that the proper choice of criteria significantly improves AD performance, often with greater impact than changing the detection methods.

Dissertation (January 08, 2026): Luiz Cláudio Lemos de Oliveira

Student: Luiz Cláudio Lemos de Oliveira

Title: Motif Detection in Time Series Using Autoencoders: An Analysis of Their Application to ECG Data

Advisor: Eduardo Soares Ogasawara

Committee: Eduardo Soares Ogasawara (CEFET/RJ), Laura Silva de Assis (CEFET/RJ), Helga Dolorico Balbi (CEFET/RJ) and Rebecca Pontes Salles (INRIA/FRA)

Day/Hour: January 08, 2026 / 10 a.m.

Room: https://teams.microsoft.com/meet/2899124353229?p=ykqgR5NeTJfPfXOhSF

Abstract: The discovery of motifs in biomedical time series, such as electrocardiograms (ECGs), involves identifying recurrent patterns that may contain valuable diagnostic information. Traditional methods, such as SAX, are limited by strong statistical assumptions, which are particularly inadequate for complex physiological signals. In parallel, autoencoders have demonstrated superior ability to learn nonlinear representations, but their application to motif discovery in ECG data remains unexplored, constituting a significant methodological gap. This work proposes a framework that replaces SAX discretization with neural encoding while preserving the discovery pipeline based on Shannon’s entropy and frequency of occurrence. The methodology was developed in three stages: (i) validation of the autoencoder’s reconstruction ability, (ii) training models with data from the MIT-BIH Arrhythmia Database, and (iii) systematic experimental comparison with the traditional SAX method through detection experiments, parametric sensitivity analysis, and evaluation of generalization capacity. It is concluded that replacing traditional discretization with neural encoding is feasible and provides quantitative and qualitative gains in motif discovery in ECG signals, establishing a methodological basis for developing automated biomedical signal analysis tools.

Dissertation (December 18, 2025): Michel Siqueira Reis

Student: Michel Siqueira Reis

Title: Matching Detections to Events in Time Series with Computational Efficiency and Guaranteed Optimality

Advisors: Rafaelli Coutinho (Advisor) and Eduardo Ogasawara (Co-advisor)

Committee: Rafaelli Coutinho (Cefet/RJ), Eduardo Ogasawara (Cefet/RJ), Laura Assis (Cefet/RJ) and Rebecca Salles (INRIA)

Day/Hour: December 18, 2025 / 1 p.m.

Room: https://teams.microsoft.com/l/meetup-join/19%3ae20c8697654543fc9dd1e9924de5c2c0%40thread.tacv2/1763159776096?context=%7b%22Tid%22%3a%228eeca404-a47d-4555-a2d4-0f3619041c9c%22%2c%22Oid%22%3a%2254af42a0-5f30-4905-ac8d-10b96c6db26b%22%7d

Abstract: This work presents SmartSoftED, an optimized metric for evaluating the detection of point events in time series. The original metric, SoftED, introduces a “soft” evaluation based on temporal tolerance, assigning gradual scores to detections that occur near actual events. However, its current formulation relies on a greedy approach that does not guarantee optimality in all cases and incurs a quadratic computational cost, limiting its applicability in large-scale or real-time processing environments. SmartSoftED overcomes these limitations by introducing a strategy that decomposes the problem into manageable, disjoint subproblems: some can be solved efficiently without loss of optimality, while others are modeled as maximum-weighted matching problems on unbalanced bipartite graphs. This approach preserves the optimality of correspondences between detections and events while significantly reducing computational cost. In practice, the method achieves an average speedup of two orders of magnitude, making it suitable for certain large-scale applications and systems with strict temporal constraints.
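To make the matching problem concrete, the sketch below scores each detection-event pair with a linearly decaying soft score and then finds the one-to-one assignment with the highest total score by exhaustive search. This is only an illustration of the maximum-weight matching objective on a tiny instance, not the SmartSoftED algorithm, which the abstract describes as far more efficient. The scoring formula, the function names, and the sample time points are assumptions made for this example.

```python
from itertools import permutations


def soft_score(detection, event, tolerance):
    """Assumed soft score: 1 at the event, decaying linearly to 0 at tolerance."""
    return max(0.0, 1.0 - abs(detection - event) / tolerance)


def best_matching(detections, events, tolerance):
    """Exhaustive maximum-weight one-to-one matching; returns (total, pairs).

    Only practical for very small inputs, since it enumerates every assignment.
    """
    # Enumerate assignments of the smaller side onto the larger side.
    swap = len(detections) > len(events)
    small, large = (events, detections) if swap else (detections, events)
    best_total, best_pairs = 0.0, []
    for chosen in permutations(range(len(large)), len(small)):
        total, pairs = 0.0, []
        for i, j in enumerate(chosen):
            d, e = (large[j], small[i]) if swap else (small[i], large[j])
            score = soft_score(d, e, tolerance)
            if score > 0:  # pairs outside the tolerance stay unmatched
                pairs.append((d, e))
                total += score
        if total > best_total:
            best_total, best_pairs = total, pairs
    return best_total, best_pairs


# Hypothetical time points: two detections, two events, tolerance of 5 steps.
total, pairs = best_matching([12, 15], [10, 14], tolerance=5)
```

Here the detection at 12 is equally close to both events, so its assignment matters: pairing it with the event at 10 leaves the event at 14 free for the detection at 15, which gives the highest total score. For larger inputs, a polynomial-time assignment solver would replace the exhaustive search.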

Dissertation (December 8, 2025): Fernando Henrique de Jesus Fraga da Silva

Student: Fernando Henrique de Jesus Fraga da Silva

Title: Deep Reinforcement Learning Applied to Intraday Trading of Multiple Stocks

Advisors: Eduardo Bezerra da Silva (advisor) and Pedro Henrique González Silva (co-advisor)

Committee: Eduardo Bezerra da Silva (Cefet/RJ), Pedro Henrique González Silva (UFRJ), Aline Marins Paes Carvalho (UFF) and Glauco Fiorott Amorim (Cefet/RJ)

Day/Hour: December 8, 2025 / 3 p.m.

Room: https://teams.microsoft.com/v2/?meetingjoin=true#/l/meetup-join/19:PKOJTuK7mfHSDE6QkCWQCYp71f0xOMNoRgSUj4wjMKc1@thread.tacv2/1763760050816?context=%7b%22Tid%22%3a%228eeca404-a47d-4555-a2d4-0f3619041c9c%22%2c%22Oid%22%3a%22c03d6068-4733-48a6-bbb4-aa78f351d9cf%22%7d&anon=true&deeplinkId=91733be2-9804-4f09-ac6a-f1a362e67de8

Abstract: The stock market is a dynamic and volatile environment in which publicly traded companies negotiate fractions of their value, subject to continuous price fluctuations influenced by economic, political, and social factors. Anticipating these fluctuations is a complex task, especially in the context of intraday trading, where buy and sell decisions must be made within very short time intervals based on rapidly changing data. In this scenario, Reinforcement Learning (RL) emerges as a promising paradigm capable of developing adaptive strategies through the continuous interaction between agent and environment. This dissertation investigates the use of Deep Reinforcement Learning (DRL) techniques in financial trading, focusing on intraday scenarios involving multiple stocks. It proposes a DRL-based approach to estimate buy and sell actions simultaneously across various assets, using high-granularity market data to better approximate real trading conditions. Experimental analyses were conducted using the Proximal Policy Optimization (PPO) algorithm. The results indicate that the proposed agent outperformed traditional benchmark strategies, achieving gains exceeding 10 percentage points in certain cases.

Algorithms and Graph-Based Models

The field of Graph Theory studies the relationships between elements, called nodes, and their connections, known as edges. This area encompasses models ranging from technological networks to social and air transportation networks. Its main subfields include Network Science, which analyzes interactions in complex systems, and Computer Networks, which provide the technological infrastructure for global communication.

Network Science investigates how the structure and dynamics of connections influence the global behavior of a network. Topics such as centrality, robustness, and structural patterns are analyzed to better understand social, economic, and biological networks. The growth of technology and the explosion of data in recent decades have further increased the relevance of this field.
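As a small illustration of two of these topics, the sketch below computes degree centrality on a toy undirected network and measures robustness as the size of the largest connected component after a node is removed. The network, the function names, and the choice of these two particular measures are assumptions made for this example.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def largest_component_size(graph, removed=frozenset()):
    """Size of the largest connected component, ignoring removed nodes."""
    seen = set(removed)
    best = 0
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search over the component containing `start`.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


# Hypothetical network: hub "A" linked to B, C, and D; D also linked to E.
network = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A", "E"},
    "E": {"D"},
}
centrality = degree_centrality(network)
hub = max(centrality, key=centrality.get)
size_before = largest_component_size(network)
size_after = largest_component_size(network, removed={hub})
```

Removing the most central node splits this network into small fragments, which is the kind of structural fragility that robustness analysis tries to quantify.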

In Computer Networks, defining the network topology is essential for efficient monitoring. This process can be modeled as an optimization problem or analyzed as a Complex Network, using graph-theoretic concepts to study its properties and performance. Moreover, infrastructure management and data communication rely on specific protocols tailored to different applications, such as environmental monitoring, mobile networks, and biomedical systems. The efficiency of these protocols is evaluated using metrics such as packet delivery rate, network throughput, and energy consumption.

This project aims to develop graph-based applications across various domains, combining computational simulation with practical experiments. It also seeks to improve the design and communication within these graph structures, exploring new protocols to make information transmission more efficient and resilient.

Faculty Members Involved:

Diego Nunes Brandão (coordinator)
Felipe da Rocha Henriques
Glauco Fiorott Amorim
Helga Dolorico Balbi
Laura Silva de Assis

Smart Applications

Intelligent Applications have become essential for optimizing processes and enabling informed decision-making. Their integration with Robotics, Multimedia, and the Internet of Things (IoT) drives significant innovation across multiple domains.

In Robotics, intelligent applications enhance machine autonomy and interaction, enabling solutions that range from personal assistant robots to advanced surgical systems. A special focus is placed on educational robotics, which combines state-of-the-art technology with playful, interactive approaches to develop intelligent embedded systems and perception algorithms. These solutions are often tested in technology competitions to refine their performance before being applied in educational contexts.

Multimedia has transformed the way information is consumed by integrating video, audio, images, and text with intelligent algorithms. This enables personalized user experiences, speech and image recognition, and immersive virtual reality environments, resulting in more intuitive and multisensory interactions.

In IoT, Artificial Intelligence allows everyday objects to collect and analyze data to create more efficient and secure environments. The convergence of IoT and AI gives rise to AIoT (Artificial Intelligence of Things), which incorporates advanced learning and decision-making capabilities into connected devices.

This research project explores how these technologies can transform teaching and learning, synchronize multisensory effects, and support environmental monitoring, enabling the development of more autonomous and efficient systems.

Faculty Members Involved:

Joel André Ferreira dos Santos (coordinator)
João Roberto de Toledo Quadros
Glauco Fiorott Amorim
Diego Nunes Brandão

Data Analysis

Data Analysis is a multidisciplinary field focused on interpreting large volumes of information to support decision-making, strategy development, and innovation. Statistical and machine learning techniques are applied to structured, semi-structured, and unstructured data to identify patterns and forecast future events.

For structured data, the main challenges involve analyzing time series and spatiotemporal data, including prediction, pattern discovery, and adaptation to data drift. Methods such as filtering and decomposition are used to build robust models for forecasting. The detection of events in time series, such as anomalies and regime changes, is relevant for both retrospective and real-time analysis.

When dealing with semi-structured and unstructured data, challenges include text mining and natural language processing (NLP). Text mining aims to uncover patterns and trends through statistical learning and text vectorization, supporting applications such as sentiment analysis and affective computing, which studies emotions in texts and human interactions. In this project, text mining is closely linked to affective computing and behavioral analysis, also encompassing image and video processing.

Behavioral analysis examines individuals within social networks, using graph-based models to identify communities and understand interaction dynamics. Applications include targeted marketing and information diffusion, providing insights into collective and emotional patterns within human interactions.

Faculty Members Involved:

Eduardo Soares Ogasawara (coordinator)
Eduardo Bezerra da Silva
Gustavo Paiva Guedes e Silva
Jorge de Abreu Soares
Kele Teixeira Belloze

Software Engineering

Software Engineering is the field that studies and applies scientific and technological methods to the software life cycle, ensuring systematic and disciplined approaches to development. With the growing reliance on software in smartphones, computers, and wearable devices, the quality and security of these systems have become fundamental. Furthermore, emerging technologies such as Artificial Intelligence, the Internet of Things (IoT), Blockchain, and Virtual Reality impose new challenges on software engineering.

This research project investigates how software engineering can be applied to these technologies to maximize their societal benefits. In the context of Blockchain, for instance, smart contracts enable innovative services, but code vulnerabilities can lead to million-dollar losses, making security a critical concern. In IoT, security is equally essential, as failures can compromise hardware or even endanger human lives. Developing secure, scalable, and reliable systems thus becomes a central challenge within Software Engineering.

Educational games are another important application, supporting learning through exploration within the game environment. The use of data provenance makes it possible to analyze player actions, revealing their behavior and strategies.

This project also welcomes additional investigations into emerging technologies and their societal impact, exploring innovative approaches to software development.

Faculty Members Involved:

Diogo Silveira Mendonça (coordinator)
Joel André Ferreira dos Santos

Machine Learning and Optimization

Machine Learning (ML) is a branch of Artificial Intelligence dedicated to developing new algorithms and methodologies capable of identifying patterns and making decisions without explicit programming. Beyond practical applications, progress in this field depends on creating novel theoretical and computational approaches that enhance the efficiency, interpretability, and generalization capacity of models.

This research project investigates advanced ML methods, ranging from traditional techniques, such as deep neural networks and probabilistic models, to emerging approaches, including self-supervised learning, generative models, federated learning, and reinforcement learning. Additionally, the project aims to improve strategies for explainability and interpretability to make models more transparent and trustworthy, especially in critical applications.

A second fundamental pillar of this project is Optimization, a field that integrates with ML to improve model performance and solve complex problems across different domains. The project focuses on the design and application of methods for solving problems using linear, nonlinear, integer, and mixed-integer programming (through exact and/or heuristic methods), as well as bio-inspired metaheuristics such as ant colony optimization, genetic algorithms, and particle swarm optimization. Optimization techniques are applied to tasks such as tuning machine learning model parameters, feature selection, and neural network architecture design.

Finally, Affective Computing explores how ML algorithms can interpret, process, and respond to human emotional states. This includes investigating new methods for fusing physiological and emotional signals. The goal is to advance the development of systems capable of adapting their responses in more natural and empathetic ways, with applications ranging from conversational interfaces to interactive robotics.

Faculty Members Involved:

Eduardo Bezerra da Silva (coordinator)
Gustavo Paiva Guedes e Silva
Diogo Silveira Mendonça
Diego Moreira de Araújo Carvalho
Laura Silva de Assis

Database Management and Administration

The growing volume of data requires organizations to develop strategies for extracting valuable insights and gaining competitive advantage. This process involves the collection, storage, integration, and analysis of structured, semi-structured, and unstructured data. The research investigates methodologies for managing and transforming these data into useful knowledge to support decision-making.

The focus lies on data-centric artificial intelligence (Data-Centric AI) for data preparation and on large-scale processing techniques. One of the challenges addressed is the parallel and distributed processing of massive volumes of heterogeneous data, common in fields such as bioinformatics, astronomy, and engineering. Scientific workflows are essential for these experiments and are frequently executed on clusters, supercomputers, and cloud environments.

The project also explores frameworks such as Apache Spark, optimizing workflows for large-scale data analysis and management. In addition, it investigates conceptual modeling techniques, ontologies, preprocessing, indexing, and querying in Big Data systems. The research considers approaches based on distributed storage (HDFS), NoSQL databases, NewSQL systems, and object-relational databases, aiming to enhance the efficiency of data handling and analysis.

Faculty Members Involved:

Rafaelli de Carvalho Coutinho (coordinator)
Eduardo Soares Ogasawara
Diego Moreira de Araújo Carvalho
Jorge de Abreu Soares
Kele Teixeira Belloze

Author: Rafaelli Coutinho