Title: Rumo à Otimização de Operadores sobre UDF no Spark (Towards the Optimization of Operators over UDFs in Spark)
Venue: CSBC 2018 – BreSci 2018
Date: July 2018
Location: Natal, RN – Brazil
Large-scale data analysis has gained importance in the scientific community with the rise of Big Data. In this context, user-defined functions (UDFs) are commonly implemented in frameworks such as Apache Spark to support such analyses. However, UDFs make execution harder to optimize because they are opaque to the optimizer. This work proposes a method for optimizing UDF-based data analysis workflows on Apache Spark. The method builds on Spark SQL's Catalyst API and Scala macros.
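To make the opacity problem concrete, the following is a minimal toy sketch in plain Scala. It does not use Spark or Catalyst, and it is not the method proposed in this work. All names (`Expr`, `Opaque`, `ToyOptimizer`, `Demo`) are hypothetical and chosen only for illustration. It shows that a rewrite rule can simplify an expression whose structure it can inspect, while the same computation wrapped in a black-box function is left as it is.

```scala
// Toy model only: these are NOT Spark's or Catalyst's classes.
sealed trait Expr
final case class Col(name: String) extends Expr
final case class Lit(value: Int) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr
// Stand-in for a UDF: the optimizer sees only a function value, not its body.
final case class Opaque(f: Int => Int, arg: Expr) extends Expr

object ToyOptimizer {
  // Constant folding, applied bottom-up: Add(Lit(a), Lit(b)) becomes Lit(a + b).
  def fold(e: Expr): Expr = e match {
    case Add(l, r) =>
      (fold(l), fold(r)) match {
        case (Lit(a), Lit(b)) => Lit(a + b)
        case (fl, fr)         => Add(fl, fr)
      }
    // The argument can still be optimized, but the body of f cannot be inspected.
    case Opaque(f, arg) => Opaque(f, fold(arg))
    case other          => other
  }
}

object Demo {
  // The same computation, x + (1 + 2), written in two ways.
  val native: Expr = Add(Col("x"), Add(Lit(1), Lit(2)))
  val addThree: Int => Int = v => v + (1 + 2)
  val viaUdf: Expr = Opaque(addThree, Col("x"))

  val nativeOptimized: Expr = ToyOptimizer.fold(native) // inner Add is folded
  val udfOptimized: Expr = ToyOptimizer.fold(viaUdf)    // nothing to rewrite
}
```

In this sketch, `Demo.nativeOptimized` has its constant subexpression folded into a single literal, whereas `Demo.udfOptimized` is structurally identical to its input, because the rule has no access to the body of `addThree`.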