Evaluating Data Preprocessing Methods for Machine Learning Models for Flight Delays

Venue: IJCNN 2018

Date: July 2018

Location: Rio de Janeiro, RJ, Brazil


Flight delays cause various inconveniences for airlines, airports, and passengers. According to data provided by the Brazilian National Civil Aviation Agency (ANAC), about 22% of domestic flights in Brazil between 2009 and 2015 were delayed by more than 15 minutes. Predicting these delays is fundamental to mitigating their occurrence and optimizing decision-making in an air transport system. In particular, airlines, airports, and users may be more interested in knowing when delays are likely to occur than in accurately predicting the absence of delays. This paper focuses on the imbalanced distribution of the delay classes (presence and absence) by experimentally evaluating several preprocessing methods for building machine-learning flight delay classification models. The models were built from a dataset that integrates national flight operations with the meteorological conditions at airports. Our results indicate that the models that applied balancing techniques performed much better at predicting the occurrence of delays, reaching a hit rate of about 60%.
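
To make the class-imbalance point concrete, here is a minimal sketch in Python. It uses random oversampling purely as an illustrative balancing technique; the abstract does not say which balancing methods the paper evaluated, so this choice, the toy data, and the helper names `random_oversample` and `recall` are all assumptions. The toy labels mirror the roughly 22% delay rate quoted above and show why plain accuracy can hide a model that never predicts a delay.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many examples as the largest one (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive examples that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data (assumed, not from the paper): 22 delayed (1) and 78 on-time (0) flights.
labels = [1] * 22 + [0] * 78
rows = [[i] for i in range(100)]

# A trivial "model" that always predicts on-time looks accurate but finds no delays.
always_on_time = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_on_time)) / len(labels)
delay_recall = recall(labels, always_on_time)

# After oversampling, both classes are equally represented in the training data.
balanced_rows, balanced_labels = random_oversample(rows, labels)
balanced_counts = Counter(balanced_labels)
```

In this sketch the always-on-time baseline scores high accuracy while detecting none of the delays, which is the situation balancing techniques are meant to counter. Oversampling should be applied to the training split only, so that evaluation still reflects the real class distribution.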


I have been a Professor in the Computer Science Department of the Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ) since 2010. I hold a PhD in Systems Engineering and Computer Science from COPPE/UFRJ. Between 2000 and 2007 I worked in the Information Technology (IT) field, where I acquired extensive experience in workflows and project management. I have a solid background in databases, and my primary interest is Data Science. I currently study spatio-temporal series, parallel and distributed processing, and data preprocessing methods. I am a member of the IEEE, ACM, INNS, and SBC. Throughout my career I have maintained a consistent record of published articles and of projects approved by funding agencies such as CNPq and FAPERJ. I am also a reviewer for several international journals, such as the VLDB Journal, IEEE Transactions on Services Computing, and The Journal of Systems and Software. I currently head the Post-Graduate Program in Computer Science (PPCIC) of CEFET/RJ.

