Dissertation defense (August 14, 2020): Thiago da Silva Pereira

Student:  Thiago da Silva Pereira

Title: Hot-Deck Data Imputation: a comparison among ensemble methods

Advisors:  Jorge de Abreu Soares (advisor), Eduardo Bezerra da Silva (CEFET/RJ) (co-advisor).

Committee: Jorge de Abreu Soares (president), Eduardo Bezerra da Silva (CEFET/RJ), Diego Nunes Brandão (CEFET/RJ), Ronaldo Ribeiro Goldschmidt (IME)

Day/Time: August 14, 2020 / 15h.

Room: meet.google.com/mtr-vmkq-wrw

Abstract:

Handling missing data is an important problem in data preprocessing. One possible solution is hot-deck imputation, a technique with two steps: grouping similar records and then performing the imputation. Selecting the best algorithm for the imputation step is a challenge. Several machine learning algorithms have been studied for data imputation; however, few studies compare ensemble methods for the imputation stage.
This study proposes a solution based on hot-deck imputation and compares four ensemble regressors: Bagging, AdaBoost, Gradient Boosting, and Stacked Generalization. To assess effectiveness, we used three datasets, varying the missing rate from 10% to 30%. Results measuring the precision of the data imputed by both techniques indicate that Gradient Boosting achieves better precision in reasonable processing time.
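
The two hot-deck steps described above can be illustrated with a minimal sketch. It is not the implementation used in the dissertation: the abstract does not specify how records are grouped or how the regressors are configured, so this example assumes nearest-neighbor grouping on a single numeric attribute and a hand-written bagging ensemble of least-squares lines. All names (`nearest_donors`, `fit_line`, `bagging_predict`, `hot_deck_impute`, the fields `x` and `y`) and the toy data are hypothetical.

```python
import random
from statistics import mean


def nearest_donors(target, donors, k):
    """Step 1 (assumed grouping): the k complete records closest to `target` on x."""
    return sorted(donors, key=lambda r: abs(r["x"] - target["x"]))[:k]


def fit_line(points):
    """Least-squares line y = a + b*x; falls back to the mean of y if x is constant."""
    xs = [p["x"] for p in points]
    ys = [p["y"] for p in points]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def bagging_predict(group, x, n_estimators=25, seed=0):
    """Step 2: average the predictions of lines fitted on bootstrap resamples."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_estimators):
        sample = [rng.choice(group) for _ in group]  # draw with replacement
        a, b = fit_line(sample)
        preds.append(a + b * x)
    return mean(preds)


def hot_deck_impute(records, k=4):
    """Return a copy of `records` with every missing y replaced by an imputed value."""
    donors = [r for r in records if r["y"] is not None]
    imputed = []
    for r in records:
        if r["y"] is None:
            group = nearest_donors(r, donors, k)
            r = {**r, "y": bagging_predict(group, r["x"])}
        imputed.append(r)
    return imputed


# Toy data (hypothetical): y = 2x + 1, with one value removed.
records = [{"x": float(i), "y": 2.0 * i + 1.0} for i in range(10)]
records[5] = {"x": 5.0, "y": None}
imputed = hot_deck_impute(records)
```

Only the body of `bagging_predict` depends on the choice of ensemble, so a boosting or stacking regressor could be substituted there without changing the grouping step, which is the kind of comparison the study describes.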

Dissertation