Skip to Main content Skip to Navigation

Resource management in computer clusters : algorithm design and performance analysis

Abstract : The growing demand for cloud-based services encourages operators to maximize resource efficiency within computer clusters. This motivates the development of new technologies that make resource management more flexible. However, exploiting this flexibility to reduce the number of computers also requires efficient resource-management algorithms that have a predictable performance under stochastic demand. In this thesis, we design and analyze such algorithms using the framework of queueing theory.Our abstraction of the problem is a multi-server queue with several customer classes. Servers have heterogeneous capacities and the customers of each class enter the queue according to an independent Poisson process. Each customer can be processed in parallel by several servers, depending on compatibility constraints described by a bipartite graph between classes and servers, and each server applies first-come-first-served policy to its compatible customers. We first prove that, if the service requirements are independent and exponentially distributed with unit mean, this simple policy yields the same average performance as balanced fairness, an extension to processor-sharing known to be insensitive to the distribution of the service requirements. A more general form of this result, relating order-independent queues to Whittle networks, is also proved. Lastly, we derive new formulas to compute performance metrics.These theoretical results are then put into practice. We first propose a scheduling algorithm that extends the principle of round-robin to a cluster where each incoming job is assigned to a pool of computers by which it can subsequently be processed in parallel. Our second proposal is a load-balancing algorithm based on tokens for clusters where jobs have assignment constraints. Both algorithms are approximately insensitive to the job size distribution and adapt dynamically to demand. Their performance can be predicted by applying the formulas derived for the multi-server queue.
Complete list of metadata

Cited literature [125 references]  Display  Hide  Download
Contributor : ABES STAR :  Contact
Submitted on : Monday, December 16, 2019 - 11:22:07 AM
Last modification on : Wednesday, June 15, 2022 - 9:03:55 PM
Long-term archiving on: : Tuesday, March 17, 2020 - 8:39:09 PM


Version validated by the jury (STAR)


  • HAL Id : tel-02413496, version 1


Céline Comte. Resource management in computer clusters : algorithm design and performance analysis. Networking and Internet Architecture [cs.NI]. Institut Polytechnique de Paris, 2019. English. ⟨NNT : 2019IPPAT001⟩. ⟨tel-02413496⟩



Record views


Files downloads