Student: Gustavo Alexandre Sousa Santos
Title: EvolveDTree: An Educational Data Mining System Based on Decision Trees and a Genetic Algorithm to Classify Dropout in Higher Education
Advisors: Diego Nunes Brandão (advisor), Luis Domingues Tomé Jardim Tarrataca (CEFET/RJ) (co-advisor)
Committee: Diego Nunes Brandão (president), Luis Domingues Tomé Jardim Tarrataca (CEFET/RJ), Diego Barreto Haddad (CEFET/RJ), Eduardo Bezerra (CEFET/RJ), Alexandre Plastino de Carvalho (UFF)
Day/Time: May 5, 2020 / 14h
Education is one of the foundations of a country's economic and social development. Ensuring that investments in education are made efficiently is a significant challenge for society as a whole. In this regard, one of the major problems in public higher education is dropout: students leaving the institution without completing the course in which they were enrolled.
As a result, the resources invested in training these students are lost, which represents a significant financial waste. Developing tools that help reduce the number of dropout cases is therefore essential. This work proposes a system for evaluating different data mining techniques that classify whether a student is likely to drop out of or graduate from the course in which they are enrolled. The system seeks to identify characteristics that signal dropout before it occurs, so that action can be taken to minimize it. To this end, an Educational Data Warehouse (EDW) was developed to integrate educational data from a higher education institution.
The results show that the developed EDW is robust enough to support a range of analyses by academic management. Several classification models were evaluated using different metrics; of these, the decision tree-based strategy proved the most promising. A dimensionality reduction technique based on a genetic algorithm was also evaluated. This strategy reduced the processing time of the training phase for all of the classification models assessed. However, the total time of the proposed approach increased when the preprocessing and training phases were measured together.
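The abstract does not specify how the genetic algorithm reduces dimensionality. One common approach is wrapper-style feature selection. In that approach, each individual is a binary mask over the attributes, and its fitness is the accuracy of a classifier trained only on the selected attributes. The following minimal, dependency-free Python sketch illustrates that idea. It is not the EvolveDTree implementation. The toy dataset, the helper names (`make_toy_data`, `build_tree`, `fitness`, `genetic_feature_selection`), the per-feature penalty, and the GA parameters are all hypothetical. For brevity, the sketch scores masks on a single held-out split; a real evaluation would use a separate validation set or cross-validation.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, features, depth=0, max_depth=3):
    """Minimal CART-style tree: leaves are labels, internal nodes are
    (feature, threshold, left, right) tuples; only `features` may be split on."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1 or not features:
        return majority
    best = None
    for f in features:
        for t in sorted(set(r[f] for r in rows)):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            # Weighted Gini impurity of the two child nodes
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no threshold separates the rows
        return majority
    _, f, t, left, right = best
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left],
                       features, depth + 1, max_depth),
            build_tree([rows[i] for i in right], [labels[i] for i in right],
                       features, depth + 1, max_depth))


def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def fitness(mask, train, test, penalty=0.01):
    """Held-out accuracy of a tree restricted to the features switched on in
    `mask`, minus a small (assumed) penalty per selected feature."""
    features = [i for i, bit in enumerate(mask) if bit]
    if not features:
        return 0.0
    (X_tr, y_tr), (X_te, y_te) = train, test
    tree = build_tree(X_tr, y_tr, features)
    accuracy = sum(predict(tree, x) == y for x, y in zip(X_te, y_te)) / len(y_te)
    return accuracy - penalty * len(features)


def genetic_feature_selection(train, test, n_features, pop_size=16,
                              generations=10, p_mut=0.1, seed=0):
    """Evolve binary feature masks; return the best mask and its fitness."""
    rng = random.Random(seed)
    cache = {}  # memoize fitness so repeated masks are not retrained

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, train, test)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=score, reverse=True)[: pop_size // 2]  # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g  # bit-flip mutation
                     for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = elite + children
    best = max(pop, key=score)
    return best, score(best)


def make_toy_data(n=120, n_features=6, seed=1):
    """Hypothetical toy data: only features 0 and 1 drive the label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.randint(0, 9) for _ in range(n_features)]
        X.append(row)
        y.append(1 if row[0] + row[1] > 9 else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    train, test = (X[:80], y[:80]), (X[80:], y[80:])
    mask, best_fitness = genetic_feature_selection(train, test, n_features=6)
    print("selected features:", [i for i, b in enumerate(mask) if b],
          "fitness:", round(best_fitness, 3))
```

Every fitness evaluation in this wrapper trains a new tree. The search itself can therefore be costly even though the final model trains on fewer attributes. This illustrates how a GA-based preprocessing step can offset the training-time savings it produces.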