Student: Leonardo da Silva Moreira
Title: Evaluation of Data Preprocessing Methods for Predicting Brazilian Flight Delays
Advisors: Jorge de Abreu Soares (advisor)
Committee: Jorge de Abreu Soares (president), Eduardo Soares Ogasawara (CEFET/RJ), Eduardo Bezerra da Silva (CEFET/RJ), Leonardo Gresta Paulino Murta (IC/UFF)
Day/Time: November 13, 2019 / 14h
Room: Auditorium 5
In 2016, Brazil's air services sector reached a record revenue of R$ 35.59 billion and transported 109.6 million passengers, according to a survey by the National Civil Aviation Agency (ANAC). In this scenario, flight delays cause several inconveniences to airlines, airports, and passengers: between 2009 and 2015, about 22% of domestic flights in Brazil were delayed by more than 15 minutes. Predicting these delays is critical to mitigating their occurrence and optimizing the decision-making process of an air transport system. In particular, airlines, airports, and users may be more interested in knowing when delays are likely to occur than in knowing when they will not occur. In this context, this research presents an experimental evaluation of data preprocessing methods for machine learning classification models that predict flight delays, in order to identify which methods, and which combinations of them, can improve the classifier's predictions under an unbalanced distribution of delay classes. To this end, the methodology includes the integration of flight and meteorological data, preprocessing steps (cleaning, transformation, reduction), and finally a comparison of the predictions obtained with these different preprocessing methods. In particular, this research contributes an analysis of a spectrum of data preprocessing methods in comparison with those found in the literature review, focusing especially on the distribution of delay classes. Among the objectives of this work is a more detailed examination of the classifier's attributes, normalization, and discretization, mainly with respect to the range of filter parameters.
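As a rough illustration of the kind of preprocessing steps the abstract names (normalization, discretization, and handling of an unbalanced delay class), the sketch below uses only the Python standard library. It is not the method evaluated in the thesis: the function names, the sample values, and the choice of random undersampling as the balancing technique are all assumptions made for this example. The only detail taken from the abstract is the 15-minute threshold used to label a flight as delayed.

```python
import random


def min_max_normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Discretize numeric values into n_bins equal-width intervals (0-based)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def undersample(rows, labels, seed=0):
    """Randomly drop rows so that every class keeps as many rows as the rarest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, n):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical sample: departure delay in minutes and wind speed per flight.
delays_min = [0, 5, 40, 10, 90, 3, 12, 0, 20, 7]
wind_speed = [4.0, 6.5, 18.0, 7.0, 22.0, 5.0, 9.0, 3.0, 15.0, 8.0]

# A flight counts as delayed when it is more than 15 minutes late.
labels = [1 if d > 15 else 0 for d in delays_min]

# Transformation: normalize, then discretize the meteorological attribute.
wind_norm = min_max_normalize(wind_speed)
wind_bins = equal_width_bins(wind_speed, n_bins=3)

# Balancing: equalize the number of delayed and on-time flights.
rows = list(zip(wind_norm, wind_bins))
balanced_rows, balanced_labels = undersample(rows, labels)
```

In an actual experiment, each combination of such steps would be applied to the training data before fitting the classifier, and the resulting predictions compared across combinations.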