Dissertation defense (December 19, 2024): Tarsila Gomes Bello Tavares

Student: Tarsila Gomes Bello Tavares

Title: Cascade Imputation in the Context of Data-Centric Artificial Intelligence

Advisor: Jorge de Abreu Soares

Committee: Jorge de Abreu Soares (Cefet/RJ), Diego Nunes Brandão (Cefet/RJ), Carlos Eduardo Ribeiro de Mello (Unirio)

Day/Time: December 19, 2024 / 2 p.m.

Room: Bloco E, 5th floor, room E-518

Abstract: As the global volume of data grows, datasets with missing values are increasingly common, which calls for imputation techniques. Traditionally, these methods address univariate scenarios, handling missing values in a single column. This study proposes a cascade imputation approach that handles missing values across multiple columns: imputed values are reintegrated into the dataset before the next attribute is imputed, so that they can be reused. The study also investigates whether imputation efficiency can be improved by binarizing the data according to their patterns of missingness and grouping records with similar patterns before imputation, and identifies which clustering algorithms yield the most promising results for different dataset characteristics. The aim is thus to evaluate and compare the performance of multivariate imputation using the cascade approach with a pre-clustering phase, applying various classes of algorithms, such as K-modes, Agglomerative Clustering, DBSCAN, and the SOM neural network.
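The cascade idea described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function name `cascade_impute`, the k-nearest-neighbour mean used as the per-column imputer, the column ordering (fewest missing values first), and the toy data are all assumptions made for illustration, and the pre-clustering phase is omitted. The point it shows is only the reintegration step: once a column has been imputed, its filled values are available when the next column is imputed.

```python
import math


def cascade_impute(rows, k=2):
    """Impute None values column by column, reusing earlier imputations.

    Hypothetical sketch: the per-column imputer is the mean of the k nearest
    donor rows, with distance measured on unscaled numeric columns.
    """
    data = [list(r) for r in rows]  # work on a copy; the input is not modified
    n_cols = len(data[0])

    # Assumption: columns are processed from fewest to most missing values.
    missing = {c: sum(r[c] is None for r in data) for c in range(n_cols)}
    order = sorted((c for c in range(n_cols) if missing[c] > 0),
                   key=lambda c: missing[c])

    for target in order:
        donors = [r for r in data if r[target] is not None]
        if not donors:
            raise ValueError(f"column {target} has no observed values")

        # Collect all estimates first, so rows filled in this pass do not
        # act as donors for the same column.
        estimates = {}
        for i, row in enumerate(data):
            if row[target] is not None:
                continue

            def distance(donor, row=row):
                # Compare only on columns known in both rows; this includes
                # values imputed in earlier cascade steps.
                shared = [c for c in range(n_cols)
                          if c != target
                          and row[c] is not None and donor[c] is not None]
                if not shared:
                    return math.inf
                return math.sqrt(
                    sum((row[c] - donor[c]) ** 2 for c in shared) / len(shared))

            nearest = sorted(donors, key=distance)[:k]
            estimates[i] = sum(d[target] for d in nearest) / len(nearest)

        # Reintegrate the imputed values before moving to the next column.
        for i, value in estimates.items():
            data[i][target] = value

    return data


# Toy data (made up for this example): two columns each miss one value.
rows = [
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 200.0],
    [3.0, None, 300.0],
    [4.0, 40.0, None],
    [5.0, 50.0, 500.0],
]
filled = cascade_impute(rows, k=2)
print(filled[2], filled[3])
```

In this toy run the second column is imputed first, and the value filled into the third row then takes part in the distance computation when the third column is imputed. A realistic setup would also scale the attributes and handle categorical data, which this sketch does not attempt.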