Dissertation (March 11, 2026): Nathália Carvalho Tito

Student: Nathália Carvalho Tito

Title: Análise de Desempenho e Características de Corredores para Predição de Resultados e Geração de Feedbacks Personalizados

Advisors: Glauco Fiorott Amorim (Advisor) and Eduardo Bezerra da Silva (Co-advisor)

Committee: Glauco Fiorott Amorim (Cefet/RJ), Eduardo Bezerra da Silva (Cefet/RJ), Diego Nunes Brandão (Cefet/RJ) and Pedro Henrique Gonzalez Silva (COPPE/UFRJ).

Date/Time: March 11, 2026 / 10 a.m.

Room: https://teams.microsoft.com/meet/22404162370488?p=2x3lfHhn8JsNjYtQEL

Abstract: “The increasing number of recreational runners has intensified the demand for solutions capable of providing individualized training support, especially among amateur athletes who often lack continuous professional guidance. In this context, this study proposes an integrated performance analysis model based on machine learning techniques, aiming to understand training patterns, predict runners’ performance, and generate personalized recommendations based on controllable variables. Data were obtained from a questionnaire and activity records exported from a running application, involving 26 athletes with different levels of experience, allowing a multidimensional view of training habits and sports experience. The methodology was structured in three main phases. In the first phase, a clustering analysis was performed using classical algorithms K-means, DBSCAN, and Agglomerative Clustering applied to principal components explaining 80% of the data variability. The selected model identified three distinct profiles: “Young Experienced,” “Less Experienced,” and “Veteran Experienced,” differentiated by age, sports maturity, and training patterns. In the second phase, a predictive model based on gradient boosting was developed using the XGBoost algorithm, both in a general configuration and in versions specific to each cluster. Linear regression models were also tested as a reference approach; however, XGBoost achieved superior performance across all evaluated scenarios, demonstrating a greater ability to capture nonlinear relationships and complex interactions among variables. The results also indicated that each group responded differently to training variables, reinforcing that segmentation substantially improves predictive performance and the adequacy of personalized recommendations. Model interpretability was investigated using SHAP values, which enabled the identification of the most influential variables in the predictions.
In general, variables directly related to training structure and performance stood out, such as minimum and maximum speed, distance covered, pace variability, and recent performance history, as well as contextual factors such as temperature. The segmented analysis revealed distinct patterns across groups, indicating that different runner profiles present specific sensitivities to aspects such as intensity, regularity, and training consistency, further supporting the importance of personalized recommendations. In the third phase, an optimization algorithm was applied to identify combinations of controllable variables capable of maximizing the predicted performance for each profile. The evaluation of results was based on comparisons with baseline scenarios, in which decision variables were not adjusted, allowing objective quantification of the gains achieved through optimization. The resulting recommendations showed internal coherence and alignment with cluster characteristics: greater emphasis on intensity and training variety for young experienced runners; focus on regularity, consistency, and strength training for less experienced runners; and strategies centered on maintenance, balance, and load control for veteran runners. These findings demonstrate that the integration of clustering, predictive modeling, and optimization provides a consistent and promising approach for developing data-driven intelligent sports recommendation systems. Despite limitations related to sample size and the absence of more granular physiological indicators, the study provides initial evidence that computational models can effectively support personalized training in an efficient, accessible, and scalable manner. Future research may expand the dataset, incorporate additional informational dimensions, validate the model in larger populations, and explore its implementation in digital platforms. 
In conclusion, the combination of data science techniques and optimization methods significantly contributes to understanding running performance and to developing individualized recommendations that promote improvement, adherence, and safety in sports practice.”