Data-driven Parallel Processing

In the data science process, there is a pressing need for high-performance computing (HPC) to achieve large-scale data analysis. These analyses, commonly modeled as workflows, pose important challenges: workflow activities and data must be dispatched for execution in some HPC environment (e.g., clusters, grids, clouds). Given the diversity of platforms for HPC environments, a major challenge is to establish a representation of these workflows that is agnostic to the environment in which they will be executed while, at the same time, allowing their execution to be optimized for the target environment (a minimal sketch of such a representation appears below).

There are numerous challenges in managing and analyzing large volumes of data. Although the different applications present themselves as applied research, they often provide the opportunity to develop new theoretical frameworks in basic research on data-based parallelism. In particular, the spatial and temporal characteristics of spatiotemporal series raise several important and, at the same time, specific issues that require dedicated algorithms to manage this data.

In the context of spatiotemporal series, it is necessary to explore the latest technological solutions that enable or enhance both the different forms of data organization and storage, including approaches based on distributed file systems such as HDFS, object-relational databases, NoSQL, and NewSQL, and data-based parallelism using MapReduce, Spark, or algebra-based approaches to workflows, as the sketches below illustrate.
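To make the idea of an environment-agnostic workflow representation concrete, the following is a minimal sketch in Python. All names here (Activity, Workflow, the backend strings) are hypothetical illustrations, not an existing API: the point is that the workflow is described once as a DAG of activities and bound to a concrete engine only at submission time.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Activity:
    """One workflow step: a named transformation over its inputs."""
    name: str
    func: Callable                                    # engine-neutral computation
    inputs: List[str] = field(default_factory=list)   # names of upstream activities

@dataclass
class Workflow:
    """An environment-agnostic DAG of activities (hypothetical sketch)."""
    activities: Dict[str, Activity] = field(default_factory=dict)

    def add(self, activity: Activity) -> None:
        self.activities[activity.name] = activity

    def run(self, source, backend: str = "local"):
        """Bind the abstract DAG to a concrete engine only at execution time."""
        if backend == "local":
            return self._run_local(source)
        # Adapters for clusters, grids, or clouds would be dispatched here.
        raise NotImplementedError(f"no adapter for backend {backend!r}")

    def _run_local(self, source):
        # Naive sequential interpreter; assumes activities were added
        # in topological order, each fed the results of its inputs.
        results = {"source": source}
        for name, act in self.activities.items():
            args = [results[i] for i in act.inputs]
            results[name] = act.func(*args)
        return results

# Usage: the same Workflow object could be handed to a Spark or
# cluster adapter; only run()'s backend dispatch would change.
wf = Workflow()
wf.add(Activity("clean", lambda xs: [x for x in xs if x is not None], ["source"]))
wf.add(Activity("total", sum, ["clean"]))
print(wf.run([1, None, 2, 3])["total"])  # -> 6
```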
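The core idea of data-based parallelism, independent of any distributed engine, can also be shown in plain Python: partition the data rather than the algorithm, apply the same function to each partition in parallel (map), and merge the partial results (reduce). The sketch below is illustrative only; the chunk count and the synthetic data are arbitrary choices.

```python
from functools import reduce
from multiprocessing import Pool

def mapper(chunk):
    """Map phase: local partial (sum, count) over one data partition."""
    return sum(chunk), len(chunk)

def reducer(a, b):
    """Reduce phase: merge two partial (sum, count) results."""
    return a[0] + b[0], a[1] + b[1]

if __name__ == "__main__":
    data = list(range(1_000_000))
    # Data-based parallelism: split the data into partitions...
    chunks = [data[i::4] for i in range(4)]
    # ...process each partition in parallel...
    with Pool(4) as pool:
        partials = pool.map(mapper, chunks)
    # ...and combine the partial results into the global answer.
    total, count = reduce(reducer, partials)
    print(total / count)  # mean of 0..999999 -> 499999.5
```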
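Spark expresses the same map-reduce pattern at cluster scale. As a hedged illustration of data-based parallelism over a spatiotemporal series, the PySpark sketch below groups a series by spatial region and by a fixed time window; the column names (region, ts, value) and the input path are assumptions made for the example, not part of any dataset discussed here.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, avg, col

# Spark parallelizes the aggregation across the cluster; the same
# script runs unchanged on a laptop or on YARN/Kubernetes.
spark = SparkSession.builder.appName("spatiotemporal-agg").getOrCreate()

# Assumed schema: region (string), ts (timestamp), value (double).
# The path is a placeholder; it could equally be an HDFS URI
# such as hdfs://namenode:8020/data/series.parquet.
series = spark.read.parquet("/data/series.parquet")

# Spatiotemporal keying: one group per (region, 1-hour window).
# Rows are partitioned by key (map side) and aggregated (reduce side).
hourly = (
    series
    .groupBy(col("region"), window(col("ts"), "1 hour"))
    .agg(avg("value").alias("mean_value"))
)

hourly.show(truncate=False)
spark.stop()
```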