Big Data Management, Integration and Workflows

Increasingly, organizations seek to analyze the growing volume of available data in order to develop actions that bring competitive advantage and prominence in their field of activity. This process ranges from the correct collection and storage of data to their integration with information obtained from the Web. The data are associated with the organization’s planning, management and performance, and can be structured, semi-structured or unstructured. They therefore need to be processed and transformed into information and knowledge.

To support this process, this project pursues several research opportunities. First, it addresses the need to process large volumes of heterogeneous data in a parallel and distributed manner. This is a typical scenario in large projects in different areas of knowledge, such as bioinformatics, astronomy, engineering and medicine, where workflows have been widely adopted. Many of these workflows are large-scale and require high-performance computing environments (such as clusters, supercomputers and computing clouds) and parallelism techniques to run in a feasible time. In addition to these environments, recent years have seen frequent use of Data-Intensive Scalable Computing (DISC) frameworks, such as Apache Spark, which provides efficient in-memory processing. One objective of this project is to develop workflows for large-scale data management and analysis using these frameworks and to optimize their execution in parallel and distributed environments. A further objective is to study conceptual modeling techniques with workflows and ontologies applied to Big Data, as well as pre-processing, indexing and querying in Big Data, including approaches based on distributed storage systems (such as HDFS), object-relational database management systems, NoSQL and NewSQL.

Researchers:

  • Eduardo Ogasawara
  • Jorge Soares
  • Kele Belloze
  • Rafaelli Coutinho (Coordinator)

International partnerships:

  • Esther Pacitti (INRIA)
  • Patrick Valduriez (INRIA)

Financial Information:

  1. FAPERJ Installation Aid call, project “Parallelization of Scientific Workflows to Support e-Science Applications”, in the period 2012-2013, coordinated by Professor Eduardo Ogasawara. Financed amount: R$ 4,650.00.
  2. FAPERJ ARC public call, project “Virtual Machine Dimensioning Infrastructure in Computational Clouds”, in the period 2016-present, coordinated by Professor Rafaelli Coutinho. Financed amount: R$ 9,000.00.
  3. Emergency Support for Graduate Programs in Rio de Janeiro, in the period 2017-2019, coordinated by Professor Eduardo Ogasawara. Financed amount: R$ 45,000.00.
  4. CEFET/RJ APP-CAMPI support call, projects “Provision of an Internal Computer Cloud Infrastructure applied to Engineering Projects”, “Towards a Cloud-based Computational Framework for Data Science in Energy Production” and “Offloading in Fog Computing for Intelligent, Autonomous and IoT Applications”, in the period 2017-present, coordinated by Professor Rafaelli Coutinho. Financed amounts: R$ 31,000.00 (2017), R$ 42,750.00 (2018) and R$ 49,000.00 (2019).
  5. CNPq/MCTIC Call No. 31/2018, Girls in Exact Sciences, Engineering and Computing, project “Girls in Robotics”, in the period 2019-present, coordinated by Professor Rafaelli Coutinho. Financed amount: R$ 90,277.90.
  6. PIBIC Scholarships.

These projects have been under development by the group’s members since 2012 and have secured total funding of approximately R$ 271,677.90.