Data Preprocessing

Data preprocessing is an essential step in the process of extracting knowledge from data. It includes cleaning, removal of outliers, selection of attributes, definition of samples, normalization, and transformation. Data preparation can consume a considerable share of the computational processing time of an experiment as a whole, and this step can mean the difference between gaining knowledge and producing added value.

In the specific scenario of time series and spatio-temporal series, the main preprocessing methods are statistical analysis techniques, linear prediction models (ARIMA, regressions), transformations (moving average, differencing, seasonal adjustment), transformation to the frequency domain, and indexing. In fact, several algorithms used in knowledge extraction were designed under the assumption that the data to be analyzed are stationary, i.e., that their mean and variance remain constant over time regardless of the sample selected. However, it is not uncommon to find data collections with non-stationary properties, in which case these techniques cannot be applied directly.

Despite its importance, there is little activity in the scientific community regarding the study of preprocessing techniques applicable to data that have an associated temporal and/or spatial dimension. The forefront of research in preprocessing methods addresses the issue that knowledge extraction tasks can be improved through more appropriate approaches, especially because these tasks depend heavily on the adopted data model. Spatio-temporal data management brings alternatives that affect the design of preprocessing algorithms and methods, aiming at both good accuracy in knowledge extraction models and good scalability when processing large volumes of data.
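To make two of the transformations mentioned above concrete, the following is a minimal sketch (not from the original text, using NumPy on synthetic data) of moving-average smoothing and z-score normalization, two common preprocessing operations for time series:

```python
import numpy as np

rng = np.random.default_rng(0)
series = rng.normal(50.0, 5.0, size=100)  # synthetic series for illustration

def moving_average(x, w):
    """Moving average with window w; returns only fully covered positions."""
    return np.convolve(x, np.ones(w) / w, mode="valid")

smoothed = moving_average(series, 5)

# Z-score normalization: rescale to zero mean and unit variance.
normalized = (series - series.mean()) / series.std()

print(smoothed.size)  # 100 - 5 + 1 = 96 smoothed values
```

The window length (5 here) and the choice of z-score over min-max scaling are arbitrary for the sketch; in practice they depend on the sampling rate and the downstream model.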
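The stationarity assumption discussed above can also be illustrated with a small sketch (assumed example, not from the original text): a series with a linear trend is non-stationary, and first-order differencing, one of the transformations listed, can restore approximately constant mean across the sample:

```python
import numpy as np

rng = np.random.default_rng(42)

# Non-stationary series: linear trend plus noise.
t = np.arange(200)
series = 0.5 * t + rng.normal(0.0, 1.0, size=t.size)

# First-order differencing removes the linear trend.
diffed = np.diff(series)

# Compare the mean of each half of the series: a large gap between
# halves suggests the mean is not constant over time.
h = series.size // 2
hd = diffed.size // 2
print(abs(series[:h].mean() - series[h:].mean()))  # large gap (trend)
print(abs(diffed[:hd].mean() - diffed[hd:].mean()))  # small gap after differencing
```

Comparing summary statistics over sub-samples is only a rough diagnostic; formal tests (e.g., unit-root tests) would be used in a real analysis.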
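Finally, the transformation to the frequency domain mentioned above can be sketched as follows (a hypothetical example with synthetic data, not from the original text): a discrete Fourier transform exposes a seasonal component as a peak in the spectrum, from which the period can be read off:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 240
t = np.arange(n)
# Synthetic series with a period-12 seasonal component plus noise.
series = 3.0 * np.sin(2 * np.pi * t / 12) + rng.normal(0.0, 0.5, size=n)

# Magnitude spectrum of the mean-centered series (real FFT).
spectrum = np.abs(np.fft.rfft(series - series.mean()))
freqs = np.fft.rfftfreq(n, d=1.0)

# The strongest non-zero frequency corresponds to the seasonal period.
peak = np.argmax(spectrum[1:]) + 1  # skip the DC component at index 0
period = 1.0 / freqs[peak]

print(period)  # close to 12.0, the seasonality built into the series
```

This kind of spectral inspection is often used before choosing the seasonal lag for models such as seasonal ARIMA.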