Dissertation defense (August 23, 2022): Thiago Soares de Paula
Student: Thiago Soares de Paula
Title: Classificação de Notícias de Fraude e Corrupção em Português para Instauração de Processo Investigativo (Classification of Fraud and Corruption News in Portuguese for Opening an Investigative Process)
Advisor: Gustavo Paiva Guedes e Silva
Date: August 23, 2022
Abstract: Fraud scandals can cause immeasurable economic and reputational damage. When a fraud is discovered, the facts are usually made public through the media, which produces a very large negative impact. Companies concerned about their image have invested increasing effort in minimizing or mitigating the effects of fraud. One task that serves this goal is media monitoring for fraud and corruption. This task is fundamental to assessing and monitoring business risk in the corporate world, since facts that can harm a company and its counterparties arise at all times. Once a fraud scandal is reported on news sites, it can damage the corporate image. This information therefore needs to be collected, analyzed and, if necessary, forwarded to an investigative process. However, the large volume of news published each day makes daily manual assessment infeasible. This work presents an approach that aims to automate this process. It comprises collecting news from the main Brazilian media outlets with web crawlers, building an annotated corpus in Portuguese on fraud and corruption, and creating a machine learning model that classifies news as relevant or not for opening an investigation.
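The abstract does not say which learning algorithm the dissertation uses. Purely as an illustration of the final step (classifying a news item as relevant or not), the sketch below trains a multinomial Naive Bayes classifier on a handful of invented headlines. The choice of Naive Bayes, the class name `NaiveBayesNewsClassifier`, the label strings, and the toy headlines are all assumptions made for this example and are not taken from the work.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into word tokens (accented letters are kept)."""
    return re.findall(r"\w+", text.lower())


class NaiveBayesNewsClassifier:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokenize(text):
                if token in self.vocab:  # skip words never seen in training
                    score += math.log((counts[token] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy, invented headlines; NOT data from the dissertation's corpus.
train_texts = [
    "Diretor é preso por fraude em licitação",
    "Empresa investigada por corrupção e lavagem de dinheiro",
    "Time vence campeonato estadual de futebol",
    "Previsão do tempo indica chuva no fim de semana",
]
train_labels = ["relevant", "relevant", "not_relevant", "not_relevant"]

model = NaiveBayesNewsClassifier().fit(train_texts, train_labels)
prediction = model.predict("Polícia apura fraude e corrupção em empresa")
print(prediction)
```

In a real setting the training texts would come from the crawled, annotated corpus, and the model would be evaluated on held-out news before being used to flag items for investigation.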