Student: Thiago da Silva Pereira
Title: Hot-Deck Data Imputation: a comparison among ensemble methods
Advisors: Jorge de Abreu Soares (advisor), Eduardo Bezerra da Silva (CEFET/RJ) (co-advisor).
Committee: Jorge de Abreu Soares (president), Eduardo Bezerra da Silva (CEFET/RJ), Diego Nunes Brandão (CEFET/RJ), Ronaldo Ribeiro Goldschmidt (IME)
Day/Time: August 14, 2020 / 15h.
Data preprocessing faces an important question: how to deal with missing data. One possible solution is hot-deck imputation. This technique has two steps: grouping similar records and then performing the imputation. Selecting the best algorithm for the imputation step is a challenge. Several machine learning algorithms have been studied for data imputation; however, few studies compare ensemble methods for the imputation stage.
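The two steps of hot-deck imputation can be illustrated with a minimal sketch. It is not the method used in this study: the names `hot_deck_impute`, `region`, and `income` are hypothetical, "similar" is assumed to mean sharing one categorical attribute, and the donor mean stands in for the learned regressor that the study places in the imputation step.

```python
from collections import defaultdict
from statistics import mean


def hot_deck_impute(records, group_key, target):
    """Fill missing `target` values (None) using donors from the same group."""
    # Step 1: group similar records. Assumption: records are "similar"
    # when they share the same value of `group_key`.
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors[rec[group_key]].append(rec[target])

    # Step 2: impute each missing value from the donors of its group.
    # Assumption: the donor mean is used as a simple placeholder imputer.
    imputed = []
    for rec in records:
        filled = dict(rec)  # copy, so the input records are left untouched
        if filled[target] is None and donors[filled[group_key]]:
            filled[target] = mean(donors[filled[group_key]])
        imputed.append(filled)
    return imputed


# Toy data, invented for illustration only.
records = [
    {"region": "north", "income": 10.0},
    {"region": "north", "income": 14.0},
    {"region": "north", "income": None},
    {"region": "south", "income": 30.0},
    {"region": "south", "income": None},
]
result = hot_deck_impute(records, "region", "income")
```

A record whose group has no donors is left missing, which keeps the sketch from borrowing values across dissimilar groups.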
This study proposes a solution based on hot-deck imputation and compares four ensemble regressors: Bagging, AdaBoost, Gradient Boosting, and Stacked Generalization. To assess effectiveness, we used three datasets, varying the missing rate from 10% to 30%. Results measuring the precision of the data imputed by both techniques indicate that Gradient Boosting achieves better precision in a reasonable processing time.
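A comparison of this kind could be set up roughly as follows with scikit-learn. This is a sketch under several assumptions, not the experimental protocol of the study: the data are synthetic, the grouping step is omitted, all hyperparameters are library defaults, the base learners of the stacked model are chosen arbitrarily, and root mean squared error is assumed as the error measure.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostRegressor, BaggingRegressor,
                              GradientBoostingRegressor, StackingRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumption): three complete attributes X and one
# attribute y in which values will be hidden and then imputed.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)


def make_models():
    """Build one fresh instance of each ensemble regressor."""
    return {
        "bagging": BaggingRegressor(random_state=0),
        "adaboost": AdaBoostRegressor(random_state=0),
        "gradient_boosting": GradientBoostingRegressor(random_state=0),
        "stacking": StackingRegressor(
            estimators=[("tree", DecisionTreeRegressor(random_state=0)),
                        ("linear", LinearRegression())],
            final_estimator=LinearRegression()),
    }


def evaluate(missing_rate):
    """Hide a fraction of y, impute it with each model, return RMSE per model."""
    n_missing = int(round(missing_rate * len(y)))
    hidden = np.zeros(len(y), dtype=bool)
    hidden[rng.choice(len(y), size=n_missing, replace=False)] = True

    errors = {}
    for name, model in make_models().items():
        model.fit(X[~hidden], y[~hidden])   # train on complete records
        predicted = model.predict(X[hidden])  # impute the hidden values
        errors[name] = float(np.sqrt(np.mean((predicted - y[hidden]) ** 2)))
    return errors


results = {rate: evaluate(rate) for rate in (0.10, 0.20, 0.30)}
```

Here `results` maps each missing rate to a dictionary of errors per regressor. Timing each `fit` and `predict` call would be one way to add the processing-time dimension mentioned above.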