Student: Flavio Matias Damasceno de Carvalho
Title: Development of the Portuguese LIWC 2015 Dictionary
Advisor: Gustavo Paiva Guedes e Silva (CEFET/RJ)
Committee: Gustavo Paiva Guedes e Silva (CEFET/RJ) (chair), Eduardo Soares Ogasawara (CEFET/RJ), Joel André Ferreira dos Santos (CEFET/RJ), Lilian Vieira Ferrari (UFRJ)
Day/Time: March 28 / 10h
Room: Auditorium V
The development and spread of computing devices have led to a large and varied body of text being written and stored in digital form. Useful information can be extracted from this volume of textual data with techniques and methodologies from the field of Text Mining. One such methodology is to analyze texts with Linguistic Inquiry and Word Count (LIWC), a program that has been refined over several versions. Besides its standard dictionary file, the program can load custom dictionaries or dictionaries translated into other languages. Evaluations of the Portuguese dictionary, a translation of the English dictionary from LIWC 2007, reveal weak performance in detecting negative valence, as well as spelling mistakes and miscategorized words, all of which degrade the results obtained. This work was motivated by the increasing use of this resource in academic studies in recent years, evidenced by the growing number of citations of the article that published the Portuguese translation of the dictionary. Since we are not aware of a more recent Portuguese version, and given the need for methods to analyze Portuguese text, we developed a new Portuguese version of the LIWC dictionary. Starting from the word set of the 2015 English version, we produced a new dictionary compatible with the latest available version of the program. To evaluate performance in classification tasks, we carried out experiments to classify (i) the authors of texts and (ii) social network posts according to sentiment polarity. The measures used to evaluate the classification algorithms were higher with the new Portuguese dictionary than with the current one.
These experiments suggest that assigning words to categories that appropriately match their linguistic and psychological characteristics yields better results in Affective Computing and Sentiment Analysis tasks.
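The abstract hinges on how a LIWC-style analysis scores a text: each word is looked up in a dictionary that maps it (or a wildcard stem such as `triste*`) to one or more categories, and each category is reported relative to the total word count. The Python sketch below illustrates that idea under stated assumptions. The miniature dictionary, its category labels, and the functions `parse_dic` and `score` are hypothetical. The file layout is only modeled on LIWC's `.dic` format. The precedence rule (exact entry first, then the longest wildcard stem) is a choice made for this sketch and is not necessarily LIWC's own behavior.

```python
import re
from collections import Counter

# Hypothetical miniature dictionary, loosely modeled on a LIWC-style .dic layout:
# a category header between two '%' lines, then one word per line followed by
# its category IDs. A trailing '*' marks a wildcard stem.
SAMPLE_DIC = """%
1\tposemo
2\tnegemo
3\taffect
%
feliz\t1\t3
triste*\t2\t3
ódio\t2\t3
"""


def parse_dic(text):
    """Parse the sketch's .dic-like text into categories, exact words, and stems."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    end_header = lines.index("%", 1)  # position of the second '%' line
    categories = {}
    for ln in lines[1:end_header]:
        cid, name = ln.split()
        categories[cid] = name
    exact, stems = {}, []
    for ln in lines[end_header + 1:]:
        word, *cids = ln.split()
        names = [categories[c] for c in cids]
        if word.endswith("*"):
            stems.append((word[:-1], names))
        else:
            exact[word] = names
    return categories, exact, stems


def score(text, categories, exact, stems):
    """Return each category as a percentage of all tokens in the text."""
    tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware for str patterns
    counts = Counter()
    for tok in tokens:
        names = exact.get(tok)
        if names is None:
            # Sketch's assumption: fall back to the longest matching wildcard stem.
            matches = [(stem, n) for stem, n in stems if tok.startswith(stem)]
            names = max(matches, key=lambda m: len(m[0]))[1] if matches else []
        counts.update(names)
    total = len(tokens)
    return {name: (100 * counts[name] / total if total else 0.0)
            for name in categories.values()}


if __name__ == "__main__":
    cats, exact, stems = parse_dic(SAMPLE_DIC)
    print(score("Estou feliz, mas a tristeza e o ódio voltam.", cats, exact, stems))
```

Because every score is a simple count over dictionary matches, a misspelled entry never matches any token and a miscategorized word shifts mass into the wrong category. This is why the dictionary's quality, rather than the counting procedure itself, drives the differences the abstract reports.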