SLA-based Scheduling of Spark Jobs in Hybrid Cloud Computing Environments

      

ABSTRACT :

Big data frameworks such as Apache Spark are becoming prominent for performing large-scale data analytics jobs. However, local or on-premise computing resources are often not sufficient to run these jobs. Public cloud resources can therefore be rented on a pay-per-use basis from cloud service providers to deploy a Spark cluster entirely in the cloud. Nevertheless, using only cloud resources can be costly. Hence, both local and cloud resources are nowadays used together to deploy a hybrid cloud computing cluster. However, scheduling jobs in a cluster deployed on a hybrid cloud is challenging in the presence of various Service-Level Agreement (SLA) demands, such as cost minimization and job deadline guarantees. Most existing works consider either a public or a locally deployed cluster and mainly focus on improving job performance in the cluster. In this paper, we propose efficient scheduling algorithms that leverage different cost models in a hybrid cloud deployed cluster to minimize the Virtual Machine (VM) usage cost of both local and cloud resources and to maximize the percentage of jobs that meet their deadlines. The results show that our proposed algorithms are highly scalable and reduce the VM usage cost of a hybrid cluster by up to 20%.
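
The two SLA objectives named in the abstract (VM usage cost and deadline meet percentage) can be made concrete with a small evaluation function. This is a minimal sketch, not the paper's cost model: it assumes each VM has a single hourly price (a hypothetical amortized price for local VMs, an on-demand price for cloud VMs), that jobs on a VM run back-to-back from time 0, and that a VM is billed only while busy. The names `VM`, `Job`, and `evaluate_schedule` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    price_per_hour: float  # hypothetical: amortized for local VMs, on-demand for cloud VMs
    is_local: bool


@dataclass
class Job:
    name: str
    runtime_hours: float   # estimated runtime on the VM it is assigned to
    deadline_hours: float  # deadline relative to submission at t = 0


def evaluate_schedule(assignment: dict[str, list[Job]],
                      vms: dict[str, VM]) -> tuple[float, float]:
    """Return (total VM usage cost, deadline meet percentage).

    Assumption: jobs on each VM run sequentially in list order from t = 0,
    and each VM is billed only for the time it is busy.
    """
    total_cost = 0.0
    met = 0
    total_jobs = 0
    for vm_name, jobs in assignment.items():
        clock = 0.0
        for job in jobs:
            clock += job.runtime_hours  # completion time of this job
            total_jobs += 1
            if clock <= job.deadline_hours:
                met += 1
        total_cost += clock * vms[vm_name].price_per_hour
    pct = 100.0 * met / total_jobs if total_jobs else 100.0
    return total_cost, pct
```

A scheduler that moves a job from a cheap local VM to a pricier cloud VM raises the first number but may raise the second; the proposed algorithms trade these off, whereas this sketch only scores a given schedule.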

EXISTING SYSTEM :

• When analyzing big data with Spark in existing environments, it is difficult to provision resources according to the system's changing state and the influence of other users' executions.
• Using cloud technology, however, it is possible to provision resources more effectively for job execution through dynamic resource provisioning methods.
• The Spark framework emerged as a successor to the Hadoop framework, which has been widely studied.
• Like Hadoop, Spark supports distributed processing across several nodes; its major feature is in-memory computing.
• Unlike Hadoop, which performs disk I/O for data processing, Spark provides in-memory computing through a data structure called the RDD (Resilient Distributed Dataset).

DISADVANTAGE :

• We formulate an optimization problem for SLA-based scheduling of Spark jobs in a hybrid cloud.
• The first algorithm is a modified version of the First-Fit (FF) heuristic for solving bin packing problems.
• However, as our main target is cost-effectiveness, this approach cannot be applied to our problem.
• We describe the hybrid cloud model and formulate the problem of dynamic job scheduling between local VMs and cloud VMs.
• In the bin packing problem, items of different volumes must be packed into a finite number of bins (containers), each of a fixed given volume, in a way that minimizes the number of bins used.
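
The First-Fit heuristic mentioned above places each item into the first bin that still has room. The paper's exact modification is not described here, so the following is only a minimal sketch of one plausible cost-aware variant: VMs are the bins (capacity measured in cores), jobs are the items, and bins are tried in ascending price order so that cheaper local VMs fill up before cloud VMs are used. The names `VM` and `cost_aware_first_fit`, and the price-ordering rule itself, are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VM:
    name: str
    cores: int              # bin capacity
    price_per_hour: float   # hypothetical cost model
    free_cores: int = field(init=False)

    def __post_init__(self) -> None:
        self.free_cores = self.cores


def cost_aware_first_fit(job_demands: dict[str, int],
                         vms: list[VM]) -> dict[str, Optional[str]]:
    """First-Fit placement of jobs (core demands) onto VMs.

    Assumed modification: bins are scanned cheapest first, so local VMs
    are filled before more expensive cloud VMs. A job mapped to None did
    not fit anywhere (e.g. a new cloud VM would have to be rented).
    """
    ordered = sorted(vms, key=lambda vm: vm.price_per_hour)
    placement: dict[str, Optional[str]] = {}
    for job, demand in job_demands.items():
        placement[job] = None
        for vm in ordered:
            if vm.free_cores >= demand:  # first bin with enough room wins
                vm.free_cores -= demand
                placement[job] = vm.name
                break
    return placement
```

For example, with a 4-core local VM at 0.05/h and an 8-core cloud VM at 0.20/h, jobs needing 3, 2, 6 and 4 cores land on the local VM, the cloud VM, the cloud VM, and nowhere, respectively. Plain First-Fit minimizes the number of bins, not their price, which is why a cost-oriented ordering (or a different algorithm altogether) is needed when cost-effectiveness is the target.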

PROPOSED SYSTEM :

• In this paper, we propose an auto-scaling method for utilizing the resources of Spark clusters effectively in a cloud computing environment.
• The goal of the proposed auto-scaling method is to meet the user-specified deadline.
• They consider an interference-aware scaling method in the proposed multi-layer node model to reduce the performance degradation caused by contention with other services.
• They propose a method for matching storage servers to parallel data requests, and a plan to improve the performance of interactive data access.
• They propose a cost-efficient scheduling method for executing scientific applications within their deadlines in private and public cloud environments.
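
A deadline-driven auto-scaler has to decide how many executors are enough to finish the remaining work in time. As a minimal sketch, assuming a constant and known per-executor throughput r (tasks per minute), W remaining tasks and T minutes left before the deadline, the required count is n = ceil(W / (r * T)), clamped to the cluster's limits. The function name `executors_needed` and its `min_exec`/`max_exec` bounds are hypothetical, and real workloads would need the throughput estimate to be refreshed as the job progresses.

```python
import math


def executors_needed(remaining_tasks: int,
                     tasks_per_executor_per_min: float,
                     minutes_to_deadline: float,
                     min_exec: int = 1,
                     max_exec: int = 64) -> int:
    """Executors required to finish remaining_tasks before the deadline.

    Assumes every executor processes tasks at the same constant rate.
    """
    if tasks_per_executor_per_min <= 0:
        raise ValueError("throughput must be positive")
    if minutes_to_deadline <= 0:
        # Deadline reached or passed: scale out as far as allowed.
        return max_exec
    capacity_per_executor = tasks_per_executor_per_min * minutes_to_deadline
    needed = math.ceil(remaining_tasks / capacity_per_executor)
    return max(min_exec, min(max_exec, needed))
```

For instance, 1200 remaining tasks at 10 tasks per executor per minute with 30 minutes left gives 1200 / 300 = 4 executors.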

ADVANTAGE :

• The executors of a job are distributed across different nodes in a round-robin fashion to balance the cluster load and improve performance.
• There are numerous works on inter-cluster schedulers that address these challenges from a performance standpoint.
• However, most of these works either consider a single-cluster setup or try to improve job performance.
• A few works have tried to improve different aspects of scheduling for Spark-based jobs. Sparrow tried to improve the performance of the default Spark scheduler by using a decentralized scheduler based on randomized sampling.
• We have developed a prototype system to evaluate the performance of the proposed job scheduling algorithms in a real hybrid cloud setup.
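
The round-robin executor distribution described in the first bullet above can be sketched in a few lines. This is a simplified illustration, not Spark's actual placement code: it assumes every node can host any number of executors (real placement also checks each node's free cores and memory), and the function name `round_robin_placement` is hypothetical.

```python
from itertools import cycle


def round_robin_placement(num_executors: int,
                          nodes: list[str]) -> dict[str, list[int]]:
    """Assign executor ids 0..num_executors-1 to nodes in round-robin order.

    Simplification: node capacity is ignored, so every node is eligible
    for every executor.
    """
    if not nodes:
        raise ValueError("need at least one node")
    placement: dict[str, list[int]] = {node: [] for node in nodes}
    # zip stops once all executor ids are consumed; cycle repeats the node list.
    for executor_id, node in zip(range(num_executors), cycle(nodes)):
        placement[node].append(executor_id)
    return placement
```

With 5 executors and nodes `["n1", "n2", "n3"]`, node n1 receives executors 0 and 3, n2 receives 1 and 4, and n3 receives 2, so no node carries more than one executor above any other.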
