Data Mining and Data Pre-Processing

Extracting knowledge from data is an important activity, in demand across organizations in business, government, and science. The research activities in this project cover prediction, grouping, pattern identification, anomaly detection, and imputation.

Much of the progress in data mining tasks comes from research in data pre-processing techniques. Data pre-processing is an important step in extracting knowledge from data. It includes cleaning data, removing outliers, selecting attributes, defining samples, and transforming data. Data preparation can take up a considerable share of the computational processing time and of the experiment as a whole. This step can determine whether knowledge is obtained at all and whether it produces added value.
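The pre-processing steps listed above can be sketched as a small pipeline. This is a minimal illustration, not the group's method: the function names (`clean`, `remove_outliers`, `min_max_scale`, `preprocess`), the choice of Tukey's interquartile rule for outliers, min-max scaling as the transformation, and the sample values are all assumptions made for the example.

```python
from statistics import quantiles


def clean(values):
    """Cleaning: drop missing entries, represented here as None."""
    return [v for v in values if v is not None]


def remove_outliers(values, k=1.5):
    """Outlier removal (assumed rule): keep values in [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def min_max_scale(values):
    """Transformation (assumed choice): rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(values):
    """Chain the steps: clean, then remove outliers, then transform."""
    return min_max_scale(remove_outliers(clean(values)))


# Hypothetical readings with two missing entries and one extreme value.
raw = [10.0, 12.0, None, 11.0, 13.0, 250.0, 12.0, None, 14.0]
prepared = preprocess(raw)
print(prepared)
```

In this sketch the order of the steps matters: scaling before removing the extreme value would compress all remaining values toward zero.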

In data mining, several methods are designed assuming stationarity, i.e., that the mean and variance remain constant regardless of the selected sample. In practice, however, a considerable portion of data collections has non-stationary properties, which often compromises data mining methods. Despite the importance of this problem, there is still much room for studying data mining and pre-processing techniques that are resilient to non-stationarity.
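The notion of stationarity described above can be illustrated with a simple heuristic: split a series into consecutive windows and compare the mean and variance of each window. This is only a sketch under stated assumptions. The helper names (`window_stats`, `looks_stationary`, `difference`), the tolerance value, and the toy series are hypothetical, and first-order differencing is shown as one common transformation for removing a linear trend, not as the technique studied in this project.

```python
from statistics import fmean, pvariance


def window_stats(series, window):
    """Return (mean, variance) for each consecutive non-overlapping window."""
    return [
        (fmean(series[i:i + window]), pvariance(series[i:i + window]))
        for i in range(0, len(series) - window + 1, window)
    ]


def looks_stationary(series, window, tol):
    """Heuristic check: window means and variances each vary by at most tol."""
    stats = window_stats(series, window)
    means = [m for m, _ in stats]
    variances = [v for _, v in stats]
    return (max(means) - min(means) <= tol
            and max(variances) - min(variances) <= tol)


def difference(series):
    """First-order differencing: replace each value by its change from the previous one."""
    return [b - a for a, b in zip(series, series[1:])]


# Toy series (assumed data): one oscillating around a fixed level, one with a linear trend.
flat = [1.0, 2.0, 1.0, 2.0] * 3
trend = [float(t) for t in range(12)]

print(looks_stationary(flat, window=4, tol=0.5))
print(looks_stationary(trend, window=4, tol=0.5))
print(looks_stationary(difference(trend), window=4, tol=0.5))
```

A method that assumes stationarity would see very different sample statistics depending on which window of the trending series it is trained on, which is the kind of degradation the paragraph above refers to.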

Researchers:

  • Eduardo Bezerra
  • Eduardo Ogasawara (Coordinator)
  • Jorge Soares

International partnerships:

  • Esther Pacitti (INRIA)
  • Florent Masseglia (INRIA)
  • Reza Akbarinia (INRIA)

Financial Information:

  1. FAPERJ APQ1 public notice, project “Data Transformation Techniques for Time Series Forecasts through Neural Networks”, in the 2013-2014 period, coordinated by professor Eduardo Ogasawara. Funded amount: R$ 13,000.00;
  2. CEFET/RJ Research Group public notice, project “Research Group in Data Mining”, in the 2016-current period, coordinated by professors Eduardo Ogasawara and Joel Santos. Funded amounts: R$ 6,992.00 and R$ 142,337.40;
  3. FAPERJ APQ1 public notice, project “Identification of Motifs in Space-Time Series: Applications & Methods”, in the 2016-current period, coordinated by professor Eduardo Ogasawara. Funded amount: R$ 10,000.00;
  4. CNPq Research Productivity Scholarship public notice, project “Data Science in Space-Time Series: Applications Data Management”, in the 2016-2018 period, awarded to professor Eduardo Ogasawara. Funded amount: R$ 39,600.00;
  5. FAPERJ Young Scientist of Our State (JNCE) public notice, project “Management and Analysis of Space-Time Series: Methods and Applications”, in the 2017-current period, awarded to professor Eduardo Ogasawara. Funded amount: R$ 75,600.00;
  6. CNPq Research Productivity Scholarship public notice, project “Resilient methods to non-stationarity in the context of Data Science”, in the 2019-2021 period, awarded to professor Eduardo Ogasawara. Funded amount: R$ 39,600.00;
  7. Emergency Support for Graduate Programs in Rio de Janeiro, in the 2019-current period, coordinated by professor Eduardo Ogasawara. Funded amount: R$ 39,879.98;
  8. PIBIC scholarships.

These projects are being developed by the group members and have obtained total funding of approximately R$ 367,009.38.