Integrated data placement and task assignment for scientific workflows in clouds

Umit V. Çatalyürek, Kamer Kaya, and Bora Uçar

Abstract. We consider the problem of optimizing the execution of scientific workflows in the Cloud. We address the problem under the following scenario. The tasks of the workflows communicate through files; the output of a task is used by another task as an input file, and if these tasks are assigned to different execution sites, a file transfer is necessary. The output files are to be stored at a site. Each execution site is to be assigned a certain percentage of the files and tasks. These percentages, called target weights, are pre-determined and reflect either user preferences or the storage capacity and computing power of the sites. The aim is to place the data files at, and assign the tasks to, the execution sites so as to reduce the cost associated with the file transfers while complying with the target weights. To do this, we model the workflow as a hypergraph and, using a hypergraph-partitioning-based formulation, propose a heuristic that generates data placement and task assignment schemes simultaneously. We report simulation results on a number of real-life and synthetically generated scientific workflows. Our results show that the proposed heuristic is fast, can find data placements and task assignments that reduce file transfers, and respects the target weights.

Key words. Scientific workflows; cloud computing; data placement; task assignment; hypergraph partitioning
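
To make the objective in the abstract concrete, the following is a minimal Python sketch of how one might score a given data placement and task assignment. It is an illustration only and not the hypergraph-partitioning heuristic itself. The cost model is an assumption: a file is moved once from its producer to its storage site if the two differ, and once to every distinct consumer site other than the storage site. The names transfer_cost and site_shares, and the toy workflow, are hypothetical.

```python
from collections import Counter


def transfer_cost(files, task_site, file_site):
    """Total size moved between sites under the assumed cost model."""
    cost = 0
    for name, (size, producer, consumers) in files.items():
        home = file_site[name]
        # Assumption: storing the file away from its producer costs one transfer.
        if producer is not None and task_site[producer] != home:
            cost += size
        # Assumption: each distinct remote consumer site fetches the file once.
        consumer_sites = {task_site[t] for t in consumers}
        cost += size * len(consumer_sites - {home})
    return cost


def site_shares(assignment, sites):
    """Fraction of items (tasks or files) mapped to each site."""
    counts = Counter(assignment.values())
    total = len(assignment)
    return {s: counts[s] / total for s in sites}


# Hypothetical toy workflow: file name -> (size, producer task, consumer tasks).
files = {
    "f1": (10, "t1", ["t2", "t3"]),
    "f2": (4, "t2", ["t3"]),
}
task_site = {"t1": "A", "t2": "A", "t3": "B"}
file_site = {"f1": "A", "f2": "B"}

cost = transfer_cost(files, task_site, file_site)
task_shares = site_shares(task_site, ["A", "B"])
file_shares = site_shares(file_site, ["A", "B"])
print(cost, task_shares, file_shares)
```

The shares returned by site_shares can be compared against the target weights to check whether a candidate scheme respects them. Putting every task and file on one site gives zero transfer cost but violates any non-trivial target weights, which is why the two criteria have to be handled together.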