Article published in Language Resources and Evaluation – Evaluation of the Brazilian Portuguese version of linguistic inquiry and word count 2015 (BP-LIWC2015)

Published in Language Resources and Evaluation. Authors: Flavio Carvalho, Fabio Paschoal Junior, Eduardo Ogasawara, Lilian Ferrari & Gustavo Guedes

Abstract: Text psycholinguistic features are a valuable source for various research topics, since they are used to obtain psychological, social, and linguistic aspects from written texts using dictionary files. These files are structured in categories, which are defined as groups of dictionary words that tap a particular domain (e.g., negative emotion words). The Linguistic Inquiry and Word Count (LIWC) is a widely used and versatile computer-based language analysis tool designed for text psycholinguistic analysis. The most recent version of the default English dictionary is LIWC2015, released with the 2015 version of the LIWC software. The literature has recently introduced the latest Brazilian Portuguese LIWC dictionary (BP-LIWC2015), developed with the same categories as the LIWC2015 English dictionary. However, the literature has also reported the need to evaluate BP-LIWC2015. In this scenario, this work investigates three questions: (i) Since LIWC2015 shows consistent improvements over the English dictionary developed in 2007 (LIWC2007), does BP-LIWC2015 achieve better text classification results than the older Brazilian Portuguese dictionary (BP-LIWC2007)? (ii) How equivalent are BP-LIWC2015 and BP-LIWC2007 to LIWC2015? (iii) Are there significant differences between the two Brazilian Portuguese dictionaries? To answer these questions, we first conducted text classification experiments with four datasets and seven classification algorithms to compare the two Brazilian Portuguese LIWC dictionaries reported in the literature (i.e., 2007 and 2015). Second, we used a bilingual Portuguese-English scientific news collection to analyze the correlation between LIWC2015 and the Brazilian Portuguese LIWC dictionaries. The results indicate that BP-LIWC2015 outperforms the older version in Brazilian Portuguese text classification. Finally, we found a more significant correlation with the original English dictionary for BP-LIWC2015 than for the older version.
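
As a rough illustration of how a category-based dictionary can turn a text into features, the sketch below counts the share of words that fall into each category. It is only a minimal sketch under stated assumptions: the two categories and their words are a hypothetical toy dictionary, not entries from any LIWC release, and the function `category_percentages` is an illustrative name rather than part of the LIWC software. It also ignores details such as wildcard stems, which real LIWC dictionaries use.

```python
import re
from collections import Counter

# Hypothetical toy dictionary (assumption): category name -> set of words.
# These are NOT actual LIWC or BP-LIWC entries.
TOY_DICTIONARY = {
    "negemo": {"sad", "angry", "hate"},
    "posemo": {"happy", "love", "good"},
}


def category_percentages(text, dictionary):
    """Return, per category, the percentage of tokens found in that category."""
    # Simple lowercase tokenization; a real tool would handle far more cases.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)

    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1

    # Express each count as a percentage of all tokens (0.0 for empty text).
    return {
        category: (100.0 * counts[category] / total if total else 0.0)
        for category in dictionary
    }


if __name__ == "__main__":
    sample = "I love good days but hate sad endings"
    print(category_percentages(sample, TOY_DICTIONARY))
```

The resulting per-category percentages are the kind of numeric features that could then be passed to a text classifier.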

Article published in iSys – Identificação de predadores sexuais brasileiros em conversas textuais na internet por meio de aprendizagem de máquina (Identifying Brazilian sexual predators in online text conversations through machine learning)

Published in iSys – Revista Brasileira de Sistemas de Informação. Authors: L. Santos and G. Guedes.

Abstract: Nowadays, a large number of children and adolescents use social applications. Easily accessible, these applications bring benefits and opportunities. At the same time, however, they expose users to various risks, among them sexual predatory activity. Sexual predatory activity has several purposes, such as obtaining child pornography, extortion, and sexual abuse. This work has three main objectives: (i) to create a dataset of text conversations containing real sexual predatory activity in Brazilian Portuguese; (ii) to perform a statistical analysis of the text conversations in this dataset; (iii) to carry out an experimental evaluation, on the constructed dataset, of the machine learning algorithms most popular in this research domain. This evaluation uses the F1 measure as its basis. The results achieved with contributions (i) and (ii) allow new studies to focus on the problem of identifying sexual predators in text conversations in Brazilian Portuguese. The results obtained with contribution (iii) show that Support Vector Machines performed best, with a result of 89.87%.
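
For readers unfamiliar with the F1 measure used in this evaluation, it is the harmonic mean of precision and recall. The sketch below computes it from raw counts; the function name `f1_score` and the counts in the usage example are hypothetical and are not taken from the paper.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Compute F1 as the harmonic mean of precision and recall."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives

    # Guard against division by zero when a class is never predicted or absent.
    if predicted_positive == 0 or actual_positive == 0:
        return 0.0

    precision = true_positives / predicted_positive
    recall = true_positives / actual_positive

    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts (assumption), chosen only to illustrate the formula.
    print(f1_score(true_positives=8, false_positives=2, false_negatives=2))
```

Because it balances precision and recall, F1 is a common choice when one class, such as predatory conversations, is much rarer than the other.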

Article published in IEEE Latin America Transactions – BRAPT: A New Metric for Translation Evaluation Based on Psycholinguistic Perspectives

Published in IEEE Latin America Transactions. Authors: R. Guimarães, K. Tavares, R. Reis, L. Ferrari, E. Ogasawara, and G. Guedes

Abstract: The literature offers several metrics for evaluating automatic text translations. However, state-of-the-art metrics still have limitations. One is their dependence on an exact and ordered pairing of words when evaluating similarity between texts. Another is that they do not consider the semantics of the text in such comparisons. Previous studies point out the need to analyze the semantics of words in the evaluation of translations. In this scenario, this paper presents a novel metric for evaluating differences in automatic text translations that takes into account the semantics of the words present in the texts. As a proof of concept, we selected ten journalistic texts written in English. These texts were translated into Portuguese by a specialist and by three automatic text translation tools. Experimental results show the potential of the proposed metric in evaluating these translations, indicating that it can perform better than the state-of-the-art metric.