Dissertation defense (January 17, 2020): Ribamar Santos Ferreira Matias

Student: Ribamar Santos Ferreira Matias

Title: Data Integration as Support for Whole Cell Modeling of Pseudomonas aeruginosa CCBH4851 Bacteria

Advisor: Kele Teixeira Belloze

Committee: Kele Teixeira Belloze (president), Eduardo Bezerra da Silva (CEFET/RJ), Fabrício Alves Barbosa da Silva (FIOCRUZ)

Day/Time: January 17, 2020, 10:00 h

Room: Auditorium 5

Abstract:

Comparative genome analysis through computational processes is a low-cost approach with promising potential to support researchers. Such analysis benefits from the wide range of data from studies on model organisms available in public databases. This approach was used in the present work to analyze the genome of the Pseudomonas aeruginosa strain CCBH4851. This strain, identified in Brazil in 2008, is being studied by FIOCRUZ and partners because of its association with nosocomial infections and its high degree of resistance, detected in tests with various antibiotics. In this context, surveying essential proteins that may help in the development of new antibiotics against the bacterium becomes relevant. Thus, the objective of this work is to build a database that expands the available knowledge on Pseudomonas aeruginosa CCBH4851, based on data from in-depth studies of other organisms. This database gathers information such as ontology annotations of the bacterium's proteins, homology and orthology data, and indicators of functional semantic similarity between its proteins and those of reference organisms in the study of the species P. aeruginosa. In addition, a machine learning process was designed to infer which of the bacterium's proteins have characteristics of essentiality, since essential proteins are the preferred targets for antibiotic action. To gather this information, strictly computational methods were employed, supported by tools for genomic sequence analysis such as Blast2GO, InterProScan, GOGO, BLASTP and OrthoFinder, referencing sets of proteins from public genomic databases such as UniProt, OGEE, InterPro and KEGG. The machine learning process consisted of running an LSTM neural network. Although less accurate than manual curation, computational methods are continually evolving, and new bioinformatics technologies and tools frequently become available. These resources have promising potential to assist researchers in genome knowledge and decision-making tasks. The Gene Ontology annotations of approximately 60% of the total proteins, the indicators of semantic similarity, and the set of orthologous proteins of the Pseudomonas aeruginosa CCBH4851 strain are available in the database created through comparative processes with reference proteomes. Finally, the project suggests a flow of activities that can be applied as a generic initial approach to new genome studies and that can be enhanced and extended by future work.
