Rumo à Otimização de Operadores sobre UDF no Spark (Towards the Optimization of Operators over UDFs in Spark)

Venue: CSBC 2018 – BreSci 2018

Date: July 2018

Location: Natal, RN – Brazil


Large-scale data analysis has become increasingly important in the scientific community with the rise of Big Data. In this context, user-defined functions (UDFs) are commonly implemented in frameworks such as Apache Spark to express custom analysis logic. However, UDFs are opaque to the framework, which makes their execution difficult to optimize. This work proposes a method for optimizing UDF-based data analysis workflows on Apache Spark. The method builds on Spark SQL’s Catalyst API and Scala macros.


