Dissertation (May 13, 2026): Ana Gabriela Viana de Araújo

Student: Ana Gabriela Viana de Araújo

Title: Application of concept drift-based methods for predicting goals in professional football

Advisor: Jorge de Abreu Soares

Committee: Jorge de Abreu Soares (Cefet/RJ), Glauco Fiorott Amorim (Cefet/RJ), Pedro Henrique González Silva (UFRJ), Carlos Eduardo Ribeiro de Mello (Unirio)

Day/Hour: May 13, 2026 / 9 p.m.

Room: https://teams.microsoft.com/meet/235075217075845?p=ezgpJ8dNRKnMa8dTfM

Abstract: This work investigates the application of concept drift detection techniques to the early identification of goals in football matches, based on intra-match event data. The approach frames the problem as monitoring changes in the intra-match pass distribution. It relies on virtual drift in the operational sense, that is, detection based exclusively on P(X), in real time and without labels, under the premise that these changes precede changes in the probability of a goal. Data from the 2015/2016 La Liga season were used: 380 matches, aggregated in one-minute intervals, with analysis of both offensive and defensive behavior. The robustness of the results is verified through a temporal split with 190 training matches and 190 test matches. Three drift detectors (Page-Hinkley, KSWIN, and ADWIN) were evaluated against deterministic and stochastic baselines, using moving averages of pass frequency as the input signal. The evaluation adopts an asymmetric variant of the SoftED metric, which penalizes late alarms through a linearly decreasing scoring function over the window [t-K, t], with K = 10 minutes. The results indicate that Page-Hinkley obtained the highest MCC among the detectors evaluated, surpassing both baselines; Page-Hinkley and KSWIN presented nearly equivalent F1 values, with a marginal advantage for KSWIN. Comparison with supervised approaches from the literature shows that the proposed method, although simpler and without the need for labeled data, achieves competitive performance from the first match. Limitations of the approach are discussed, including the use of passes as the only proxy signal and the restriction to a single season, as well as perspectives for future work with multivariate inputs and longitudinal analysis.
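
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function names, the moving-average window, and the Page-Hinkley parameters (delta, threshold) are hypothetical choices made only for illustration. The scoring function is one possible reading of the asymmetric rule, in which an alarm raised K minutes before the goal scores 1, the score falls linearly to 0 as the alarm approaches the goal minute t, and alarms outside [t-K, t] score 0.

```python
from collections import deque


def moving_average(values, window=5):
    """Trailing moving average; early entries average over what is available."""
    buf, out = deque(maxlen=window), []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / len(buf))
    return out


def page_hinkley(signal, delta=0.05, threshold=5.0):
    """Return the indices where a Page-Hinkley test flags an upward mean shift.

    delta and threshold are illustrative values, not the ones used in the work.
    """
    alarms = []
    n, mean, cum, cum_min = 0, 0.0, 0.0, 0.0
    for i, x in enumerate(signal):
        n += 1
        mean += (x - mean) / n          # running mean of the signal so far
        cum += x - mean - delta         # cumulative deviation from the mean
        cum_min = min(cum_min, cum)     # lowest cumulative deviation seen
        if cum - cum_min > threshold:   # deviation grew too far: raise alarm
            alarms.append(i)
            n, mean, cum, cum_min = 0, 0.0, 0.0, 0.0  # restart after alarm
    return alarms


def asymmetric_score(alarm_minute, goal_minute, k=10):
    """Assumed linear score: 1 at t-K, falling to 0 at t, 0 outside [t-K, t]."""
    lead = goal_minute - alarm_minute
    if lead < 0 or lead > k:
        return 0.0
    return lead / k


# Synthetic passes per minute: a stable phase followed by a more intense one.
passes_per_minute = [5] * 30 + [12] * 30
alarms = page_hinkley(moving_average(passes_per_minute))
```

Under this reading, asymmetric_score(55, 60) gives 0.5, since the alarm comes 5 of the 10 allowed minutes before the goal, while an alarm at minute 61 scores 0.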