Spark-based Cloud Data Analytics using Multi-Objective Optimization - Laboratoire d'informatique de l'X (LIX)
Conference paper. Year: 2021

Spark-based Cloud Data Analytics using Multi-Objective Optimization

Fei Song
  • Role: Author
  • PersonId: 1041225
Khaled Zaouk
  • Role: Author
  • PersonId: 1023606
Chenghao Lyu
  • Role: Author
  • PersonId: 1052429
Arnab Sinha
  • Role: Author
  • PersonId: 950504
Qi Fan
  • Role: Author
  • PersonId: 1090908
Yanlei Diao
  • Role: Author
  • PersonId: 1052430
Prashant Shenoy
  • Role: Author
  • PersonId: 1018176

Abstract

Data analytics in the cloud has become an integral part of enterprise businesses. Big data analytics systems, however, still lack the ability to take user performance goals and budgetary constraints for a task, collectively referred to as task objectives, and automatically configure an analytic job to achieve these objectives. This paper presents a data analytics optimizer that can automatically determine a cluster configuration with a suitable number of cores, as well as other system parameters, that best meets the task objectives. At the core of our work is a principled multi-objective optimization (MOO) approach that computes a Pareto optimal set of job configurations to reveal tradeoffs between different user objectives, recommends a new job configuration that best explores such tradeoffs, and employs novel optimizations to enable such recommendations within a few seconds. We present efficient incremental algorithms based on the notion of a Progressive Frontier for realizing our MOO approach and implement them in a Spark-based prototype. Detailed experiments using benchmark workloads show that our MOO techniques provide a 2-50x speedup over existing MOO methods, while offering good coverage of the Pareto frontier. When compared to OtterTune, a state-of-the-art performance tuning system, our approach recommends configurations that yield a 26%-49% reduction in the running time of the TPCx-BB benchmark while adapting to different application preferences on multiple objectives.
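To make the notion of a Pareto optimal set of job configurations concrete, here is a minimal Python sketch. It is not the paper's Progressive Frontier algorithm: it is a brute-force dominance filter over a handful of candidate configurations, followed by one simple way to pick a recommendation (the frontier point closest to the ideal point after normalization). The `Config` type, the candidate values, the two objectives (latency and cost), and the weighting scheme are all assumptions made up for illustration.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    """Hypothetical job configuration with two objectives to minimize."""
    cores: int
    latency_s: float  # predicted running time in seconds (made-up value)
    cost: float       # predicted cost in arbitrary units (made-up value)


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both objectives and better on one."""
    no_worse = a.latency_s <= b.latency_s and a.cost <= b.cost
    strictly_better = a.latency_s < b.latency_s or a.cost < b.cost
    return no_worse and strictly_better


def pareto_set(configs: List[Config]) -> List[Config]:
    """Brute-force O(n^2) filter: keep configurations nothing else dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


def recommend(frontier: List[Config],
              w_latency: float = 0.5,
              w_cost: float = 0.5) -> Config:
    """Pick the frontier point nearest the ideal point (assumed heuristic).

    Objectives are min-max normalized to [0, 1] over the frontier, so the
    ideal point is (0, 0); the weights express the user's preference.
    """
    lats = [c.latency_s for c in frontier]
    costs = [c.cost for c in frontier]

    def norm(x: float, lo: float, hi: float) -> float:
        return 0.0 if hi == lo else (x - lo) / (hi - lo)

    def score(c: Config) -> float:
        d_lat = w_latency * norm(c.latency_s, min(lats), max(lats))
        d_cost = w_cost * norm(c.cost, min(costs), max(costs))
        return d_lat ** 2 + d_cost ** 2  # squared weighted distance

    return min(frontier, key=score)


# Illustrative candidates only; these numbers do not come from the paper.
CANDIDATES = [
    Config(cores=4, latency_s=120.0, cost=4.0),
    Config(cores=8, latency_s=70.0, cost=5.6),
    Config(cores=16, latency_s=45.0, cost=7.2),
    Config(cores=16, latency_s=80.0, cost=9.0),   # dominated by the 8-core run
    Config(cores=32, latency_s=40.0, cost=12.8),
    Config(cores=32, latency_s=60.0, cost=14.0),  # dominated by the 16-core run
]

FRONTIER = pareto_set(CANDIDATES)
BALANCED = recommend(FRONTIER)
COST_SENSITIVE = recommend(FRONTIER, w_latency=0.1, w_cost=0.9)
```

With these made-up numbers, the frontier keeps the 4-, 8-, 16- and 32-core configurations that trade latency against cost, and shifting the weights toward cost moves the recommendation from the 16-core to the 4-core configuration. A quadratic filter like this is only practical for small candidate sets; the abstract's point is precisely that recommendations over large configuration spaces need to be produced within a few seconds.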
Main file
ICDE21_Boosting_Analytics.pdf (3.85 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02549758, version 1 (14-02-2021)

Identifiers

  • HAL Id: hal-02549758, version 1

Cite

Fei Song, Khaled Zaouk, Chenghao Lyu, Arnab Sinha, Qi Fan, et al. Spark-based Cloud Data Analytics using Multi-Objective Optimization. ICDE - 37th IEEE International Conference on Data Engineering, Apr 2021, Chania, Greece. ⟨hal-02549758⟩
