Dissertation (May 12, 2026): Gustavo Melo
Student: Gustavo Melo
Title: Reconhecimento de Entidades Nomeadas em Relatos Criminais Informais com Apoio de Metadados Estruturados (Named Entity Recognition in Informal Crime Reports Supported by Structured Metadata)
Advisors: Eduardo Bezerra da Silva and Karla Figueiredo
Committee: Eduardo Bezerra da Silva (Cefet/RJ), Karla Figueiredo (UERJ), Gustavo Paiva Guedes (Cefet/RJ), Kele Teixeira Belloze (Cefet/RJ) and Ronaldo Ribeiro Goldschmidt (IME/RJ)
Day/Hour: May 12, 2026 / 9 a.m.
Abstract: This work investigates named entity recognition in informal crime reports recorded by the Disque Denúncia service. These reports, often marked by colloquial language, spelling mistakes, and free-text structure, pose significant challenges for traditional Natural Language Processing models. In addition to free text, the reports are accompanied by structured metadata, such as type of occurrence, location, and date, which can provide relevant additional context for the task. In this study, we propose an approach based on fine-tuning large language models on a corpus manually annotated with entities of the types Person, Location, and Organization. To overcome the scarcity of labeled data, the methodology applies pseudo-labeling to a second, significantly larger corpus, thus expanding the training base in a semi-supervised manner. Additionally, the metadata from the reports are incorporated as a source of context both in preprocessing and in the evaluation and refinement of the models. The experiments, conducted with the GLiNER model, indicate that the use of metadata and pseudo-labeling can contribute to improving model performance on informal corpora, with positive impacts on automated information extraction in public security contexts. The results reinforce the potential of hybrid and domain-sensitive approaches for real-world Natural Language Processing applications in environments with scarce labeled data.