Archive | Advisee defenses

Dissertation defense (26/08/2021): Lucas Giusti Tavares

Student: Lucas Giusti Tavares

Title: Analyzing Flight Delay Prediction Under Concept Drift

Advisors: Jorge de Abreu Soares (advisor) and Eduardo Soares Ogasawara (CEFET/RJ) (co-advisor).

Committee: Jorge de Abreu Soares (chair), Eduardo Soares Ogasawara (CEFET/RJ), Rafaelli de Carvalho Coutinho (CEFET/RJ), and Antônio Tadeu Azevedo Gomes (LNCC)

Date/time: 26 August 2021, at 13:30.

Remote room: https://meet.google.com/zow-fgxq-fte

Abstract:

Delay is one of the most critical indicators for flight transportation systems, and flight delays pose a challenge to any such system. In this context, predicting delayed flights may be an essential tool for addressing the problem effectively. This dissertation investigates the prediction performance of different drift handling strategies in aviation at two scales: system-based (SB) and airport-based (AB). In SB, all airports in the flight system are considered together. Conversely, in AB, each airport is studied separately. Specifically, this work poses and answers two research questions: (i) How do drift handling strategies influence the prediction performance of delays? and (ii) Do different scales change the results of drift handling strategies? Drift handling strategies were observed to be relevant, and their impact varies with the scale used. The experimental evaluation used a dataset that integrates weather and flight data from the Brazilian system. Moreover, the passive and active strategies showed better recall scores. For F1 scores, the strategies had similar results, with the passive strategy slightly ahead. This may be related to the high prevalence of drifts: in this case, strategies that always retrain machine learning models offer better results than those that train only once. However, extensive testing is recommended. Nonetheless, the choice of machine learning model may have a higher impact on F1 than the drift handling strategy.
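
The contrast drawn in the abstract between always retraining and training only once can be illustrated with a small toy. This is a minimal sketch under stated assumptions, not the dissertation's experimental setup: the single "score" feature, the threshold classifier, the helper names (`make_batch`, `fit_stump`, `recall`) and the synthetic stream are all hypothetical. The delay boundary shifts over time to imitate a concept drift, and recall is measured on each period using a model fitted on earlier data.

```python
def make_batch(boundary):
    """One period of toy flights as (score, delayed) pairs; delayed iff score > boundary."""
    return [(x, int(x > boundary)) for x in range(0, 100, 5)]


def fit_stump(batch):
    """Learn a threshold halfway between the highest on-time and the lowest delayed score."""
    on_time = [x for x, y in batch if y == 0]
    delayed = [x for x, y in batch if y == 1]
    return (max(on_time) + min(delayed)) / 2


def recall(stream, retrain):
    """Recall over periods 1..n, each predicted with a model fitted on earlier data."""
    threshold = fit_stump(stream[0])          # initial model from the first period
    hits = positives = 0
    for previous, current in zip(stream, stream[1:]):
        if retrain:                           # always-retrain: refit on the latest labelled period
            threshold = fit_stump(previous)
        for x, y in current:
            if y == 1:
                positives += 1
                hits += int(x > threshold)    # delayed flight correctly flagged
    return hits / positives


# Hypothetical drift: the delay boundary drops from 60 to 40 to 20 (assumption).
stream = [make_batch(b) for b in (60, 60, 40, 40, 20, 20)]
recall_train_once = recall(stream, retrain=False)
recall_passive = recall(stream, retrain=True)
print(f"train once: {recall_train_once:.3f}  always retrain: {recall_passive:.3f}")
```

In this toy the retrained model loses recall only in the period right after each shift, while the model trained once keeps missing the newly delayed flights. Real data is noisy, so the gap there need not be this clear.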

Dissertation defense (29/01/2021): Leandro Maia Gonçalves

Student: Leandro Maia Gonçalves

Title: Imputação Hot-Deck: uma revisão sistemática da literatura (Hot-Deck Imputation: a systematic literature review)

Advisor: Jorge de Abreu Soares

Committee: Jorge de Abreu Soares (chair), Eduardo Soares Ogasawara (CEFET/RJ), and José Maria da Silva Monteiro Filho (UFC)

Date/time: 29 January 2021, at 10:00.

Remote room: https://meet.google.com/mkz-opya-skv

Qualifying exam defense (20/08/2020): Lucas Giusti Tavares

Student: Lucas Giusti Tavares

Title: Flight Delay Prediction with Concept Drift: A Study of the Brazilian Flight Systems

Advisors: Jorge de Abreu Soares (advisor) and Eduardo Soares Ogasawara (CEFET/RJ) (co-advisor).

Committee: Jorge de Abreu Soares (chair), Eduardo Soares Ogasawara (CEFET/RJ), Rafaelli de Carvalho Coutinho (CEFET/RJ), and Antônio Tadeu Azevedo Gomes (LNCC)

Date/time: 20 August 2020, at 10:00.

Remote room: meet.google.com/jbq-cdip-syi

Abstract:

Flight delays pose challenges that impact any flight transportation system. Predicting flight delays may be an important tool for dealing effectively with this problem. However, the behavior of flight delay systems varies over time, a phenomenon known as concept drift. The objective of this work is to analyze concept drift in flight delay prediction for the Brazilian flight system, evaluated under different scales and time intervals. Several drift handling techniques and classifier models were studied. It was possible to observe that the variance method may be less sensitive to drifts. Moreover, the passive method showed slightly better results than the active drift handling methods.
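
An active method, in contrast to a passive one, reacts only when a drift is signalled. The sketch below is an illustrative assumption, not one of the detectors evaluated in this work: the hypothetical `WindowDriftDetector` compares the error rate in a recent window with the error rate observed right after the last (re)training, and raises a signal when the gap exceeds a margin. The window size, margin, and error sequence are made up for the example.

```python
from collections import deque


class WindowDriftDetector:
    """Signals a drift when the recent error rate exceeds the reference rate by `margin`."""

    def __init__(self, window=20, margin=0.2):
        self.window = window
        self.margin = margin
        self.reference = []                  # first `window` errors after (re)training
        self.recent = deque(maxlen=window)   # sliding window of the latest errors

    def update(self, error):
        """Feed one 0/1 prediction error; return True if a drift is signalled."""
        if len(self.reference) < self.window:
            self.reference.append(error)
            return False
        self.recent.append(error)
        if len(self.recent) < self.window:
            return False
        reference_rate = sum(self.reference) / self.window
        recent_rate = sum(self.recent) / self.window
        if recent_rate - reference_rate > self.margin:
            self.reset()                     # start a fresh reference after the signal
            return True
        return False

    def reset(self):
        self.reference = []
        self.recent.clear()


# Hypothetical error stream: the model is always right for 60 flights,
# then wrong on every other flight.
errors = [0] * 60 + [1, 0] * 30
detector = WindowDriftDetector(window=20, margin=0.2)
drift_points = [i for i, e in enumerate(errors) if detector.update(e)]
print(drift_points)  # an active strategy would retrain the model at these positions
```

The signal arrives a few observations after the change begins, because the recent window must fill with enough errors first. That delay is the usual trade-off against false alarms when tuning the window and margin.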

Dissertation defense (14/08/2020): Thiago da Silva Pereira

Student: Thiago da Silva Pereira

Title: Imputação de dados hot-deck: uma comparação entre comitês de regressão (Hot-Deck Data Imputation: a comparison among ensemble methods)

Advisors: Jorge de Abreu Soares (advisor) and Eduardo Bezerra da Silva (CEFET/RJ) (co-advisor).

Committee: Jorge de Abreu Soares (chair), Eduardo Bezerra da Silva (CEFET/RJ), Diego Nunes Brandão (CEFET/RJ), and Ronaldo Ribeiro Goldschmidt (IME)

Date/time: 14 August 2020, at 15:00.

Remote room: meet.google.com/mtr-vmkq-wrw

Summary:

Missing data in datasets is a relevant problem. Among the ways of dealing with it, replacing the missing value with another one (also called data imputation) produces a substantial gain in subsequent machine learning. Several machine learning algorithms have been studied for data imputation, but few studies use ensemble methods to generate the value to be imputed. This work compares several ensemble methods (Bagging, AdaBoost, Gradient Boosting, and Stacked Generalization) for data imputation, running the simulations on three different datasets (AIDS Deaths – National Health and Family Planning Commission of China, Breast Cancer, and Photometric redshift estimation) with 10%, 20%, and 30% of missing data, and combining the clustering and dimensionality reduction tasks, with reduction percentages of 10%, 20%, and 30%, before imputation.

Abstract:

Data preprocessing faces an important question: how to deal with missing data. A possible solution to this challenge is hot-deck imputation. This technique has two steps: grouping similar records and performing the imputation. Selecting the best algorithm for imputation is a challenge. Several machine learning algorithms have been studied for data imputation; however, few studies compare ensemble methods for the imputation stage. This study proposes a solution based on hot-deck imputation, comparing four ensemble regressors: Bagging, AdaBoost, Gradient Boosting, and Stacked Generalization. To ascertain effectiveness, we used three datasets, varying missing rates from 10% to 30%. Results measuring the precision of imputed data by both techniques indicate that Gradient Boosting achieves better precision in reasonable processing time.
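
The two hot-deck steps described above (group similar records, then impute within each group) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: grouping is a fixed bucketing of a hypothetical field `a` standing in for clustering, the regressor is a hand-rolled bagging of least-squares lines on a hypothetical field `b`, and the records are invented. Every group is assumed to contain at least one complete (donor) record.

```python
import random
from collections import defaultdict
from statistics import mean


def fit_line(points):
    """Least-squares line through (x, y) points; returns (slope, intercept)."""
    mean_x = mean(x for x, _ in points)
    mean_y = mean(y for _, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:                      # degenerate bootstrap sample: fall back to the mean
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    return slope, mean_y - slope * mean_x


def bagged_predict(donors, x, n_models, rng):
    """Bagging: average the predictions of lines fitted on bootstrap samples of the donors."""
    predictions = []
    for _ in range(n_models):
        sample = [rng.choice(donors) for _ in donors]   # sample with replacement
        slope, intercept = fit_line(sample)
        predictions.append(slope * x + intercept)
    return mean(predictions)


def hot_deck_impute(records, n_models=25, seed=0):
    """Step 1: group similar records. Step 2: impute missing `y` from donors in the same group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[rec["a"] // 10].append(rec)              # toy grouping rule (assumption)
    imputed = []
    for rec in records:
        if rec["y"] is None:
            donors = [(d["b"], d["y"]) for d in groups[rec["a"] // 10] if d["y"] is not None]
            rec = {**rec, "y": bagged_predict(donors, rec["b"], n_models, rng)}
        imputed.append(rec)
    return imputed


# Invented records: y rises with b in the first group and falls with b in the second.
records = (
    [{"a": i, "b": i, "y": None if i == 3 else 2 * i + 1} for i in range(1, 6)]
    + [{"a": 10 + i, "b": i, "y": None if i == 3 else 50 - i} for i in range(1, 6)]
)
imputed = hot_deck_impute(records)
print(imputed[2]["y"], imputed[7]["y"])
```

Because the two groups follow opposite trends, a single regressor fitted on all records would impute both gaps poorly, which is the motivation for grouping before imputing.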