Rumo à Otimização de Operadores sobre UDF no Spark

Title: Rumo à Otimização de Operadores sobre UDF no Spark (Towards the Optimization of Operators over UDFs in Spark)

Venue: CSBC 2018 – BreSci 2018

Date: July 2018

Location: Natal, RN, Brazil


Large-scale data analysis has gained importance in the scientific community due to the Big Data phenomenon. In this context, user-defined functions (UDFs) are commonly implemented in frameworks such as Apache Spark to enable large-scale data analysis. However, UDFs make execution harder to optimize because they are opaque: the optimizer cannot see what they compute. This work proposes a method for optimizing UDF-based data analysis workflows on Apache Spark. The method builds on Spark SQL's Catalyst API and on Scala macros.
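The abstract does not show the method itself, so the following is only a toy, hypothetical illustration of why opacity blocks optimization. It models a plan as a list of operators and applies one rewrite rule, pushing a filter below a UDF map, only when the columns the UDF writes are known. The names `MapUDF`, `Filter`, `writes`, `reads`, `push_filters_down` and `execute` are assumptions of this sketch; it is neither Spark's Catalyst API nor the authors' implementation. In the paper's setting, such column information would presumably be extracted from the UDF itself (the abstract mentions Scala macros), which is an assumption here.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Union

Row = dict


@dataclass
class MapUDF:
    """Applies a user-defined function to every row (hypothetical operator)."""
    fn: Callable[[Row], Row]
    # Columns the UDF writes; None models an opaque UDF (nothing is known).
    writes: Optional[FrozenSet[str]] = None


@dataclass
class Filter:
    """Keeps rows for which the predicate holds (hypothetical operator)."""
    pred: Callable[[Row], bool]
    # Columns the predicate reads.
    reads: FrozenSet[str]


Op = Union[MapUDF, Filter]


def push_filters_down(plan: List[Op]) -> List[Op]:
    """Swap MapUDF -> Filter into Filter -> MapUDF whenever it is provably safe,
    i.e. the UDF's written columns are known and disjoint from the filter's reads.
    Each swap moves a filter earlier, so the loop terminates."""
    plan = list(plan)
    changed = True
    while changed:
        changed = False
        for i in range(len(plan) - 1):
            a, b = plan[i], plan[i + 1]
            if (isinstance(a, MapUDF) and isinstance(b, Filter)
                    and a.writes is not None and not (a.writes & b.reads)):
                plan[i], plan[i + 1] = b, a
                changed = True
    return plan


def execute(plan: List[Op], rows: List[Row]) -> List[Row]:
    """Run the operators in order over an in-memory list of rows."""
    for op in plan:
        if isinstance(op, MapUDF):
            rows = [op.fn(r) for r in rows]
        else:
            rows = [r for r in rows if op.pred(r)]
    return rows
```

With an opaque UDF (`writes=None`) the rule cannot fire and the UDF runs on every row. If the UDF is known to write only column `y`, a filter on `id` can safely run first, so the UDF is invoked on fewer rows while the result stays the same.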


About Eduardo Ogasawara
I have been a Professor in the Computer Science Department of the Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ) since 2010. I hold a PhD in Systems Engineering and Computer Science from COPPE/UFRJ. Between 2000 and 2007 I worked in Information Technology (IT), where I gained extensive experience in workflows and project management. I have a solid background in databases, and my primary interest is data science. I currently study spatio-temporal series, parallel and distributed processing, and data preprocessing methods. I am a member of the IEEE, ACM, INNS, and SBC. Throughout my career I have maintained a consistent record of published articles and of projects approved by funding agencies such as CNPq and FAPERJ. I also review for several international journals, such as the VLDB Journal, IEEE Transactions on Services Computing, and The Journal of Systems and Software. I currently head the Post-Graduate Program in Computer Science (PPCIC) of CEFET/RJ.
