The advent of Big Data has brought new processing and storage challenges, which are often addressed through distributed processing.
Distributed systems are inherently dynamic and unstable, so it is realistic to expect that some resources will fail during use. Load balancing and task scheduling are important factors in determining the performance of parallel applications, hence the need to design load balancing algorithms adapted to grid computing.
In this paper, we propose a dynamic, hierarchical load balancing strategy that operates at two levels: intra-scheduler load balancing, which avoids use of the large-scale communication network, and inter-scheduler load balancing, which regulates the load across the whole system. The strategy improves the average response time of CLOAK-Reduce application tasks with minimal communication.
We first focus on three performance indicators: the response time, process latency, and running time of MapReduce tasks.
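
The two-level idea described above can be illustrated with a minimal sketch. It is not the CLOAK-Reduce implementation: the `Scheduler` class, the `spread` helper, the even-split policy, and the `tolerance` threshold are all hypothetical and chosen only to show the order of operations, namely that each scheduler first balances its own nodes locally, and tasks cross between schedulers only when a scheduler remains clearly overloaded.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Scheduler:
    """Hypothetical scheduler; node_loads[i] is the task count of its i-th node."""
    name: str
    node_loads: list[int] = field(default_factory=list)

    @property
    def total(self) -> int:
        return sum(self.node_loads)


def spread(total: int, n: int) -> list[int]:
    """Split `total` tasks as evenly as possible over `n` nodes."""
    base, extra = divmod(total, n)
    return [base + 1 if i < extra else base for i in range(n)]


def balance_intra(s: Scheduler) -> None:
    """Level 1 (assumed policy): even out load inside one scheduler, local traffic only."""
    if s.node_loads:
        s.node_loads = spread(s.total, len(s.node_loads))


def balance_inter(schedulers: list[Scheduler], tolerance: float = 0.2) -> int:
    """Level 2 (assumed policy): move tasks away from schedulers whose load exceeds
    their fair share by more than `tolerance`. Returns the number of tasks moved."""
    active = [s for s in schedulers if s.node_loads]
    total_nodes = sum(len(s.node_loads) for s in active)
    if total_nodes == 0:
        return 0
    avg = sum(s.total for s in active) / total_nodes  # system-wide load per node
    moved = 0
    for donor in active:
        share = avg * len(donor.node_loads)
        if donor.total <= share * (1 + tolerance):
            continue  # within tolerance: no wide-area communication needed
        surplus = donor.total - round(share)
        # Offer the surplus to the least loaded schedulers first.
        for receiver in sorted(active, key=lambda s: s.total / len(s.node_loads)):
            if surplus <= 0:
                break
            if receiver is donor:
                continue
            deficit = round(avg * len(receiver.node_loads)) - receiver.total
            if deficit <= 0:
                continue
            k = min(surplus, deficit)
            donor.node_loads = spread(donor.total - k, len(donor.node_loads))
            receiver.node_loads = spread(receiver.total + k, len(receiver.node_loads))
            surplus -= k
            moved += k
    return moved


def two_level_balance(schedulers: list[Scheduler]) -> int:
    """Run intra-scheduler balancing first, then inter-scheduler balancing."""
    for s in schedulers:
        balance_intra(s)
    return balance_inter(schedulers)
```

In this sketch, a scheduler with node loads `[10, 2, 0]` paired with one holding `[1, 1]` would first even out locally and then hand over only its surplus, whereas two schedulers whose loads already sit within the tolerance would exchange nothing.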