On-demand minimum cost benchmarking for intermediate dataset storage in scientific cloud workflow systems
Please use this identifier to cite or link to this item: http://hdl.handle.net/1959.3/94879
- Title
- On-demand minimum cost benchmarking for intermediate dataset storage in scientific cloud workflow systems
- Author(s)
- Yuan, Dong; Yang, Yun; Liu, Xiao; Chen, Jinjun
- Abstract
- Many scientific workflows are data intensive: large volumes of intermediate datasets are generated during their execution. Some valuable intermediate datasets need to be stored for sharing or reuse. Traditionally, they are selectively stored according to the system storage capacity, determined manually. As doing science on clouds has become popular nowadays, more intermediate datasets in scientific cloud workflows can be stored by different storage strategies based on a pay-as-you-go model. In this paper, we build an intermediate data dependency graph (IDG) from the data provenances in scientific workflows. With the IDG, deleted intermediate datasets can be regenerated, and as such we develop a novel algorithm that can find a minimum cost storage strategy for the intermediate datasets in scientific cloud workflow systems. The strategy achieves the best trade-off of computation cost and storage cost by automatically storing the most appropriate intermediate datasets in the cloud storage. This strategy can be utilised on demand as a minimum cost benchmark for all other intermediate dataset storage strategies in the cloud. We utilise Amazon clouds' cost model and apply the algorithm to general random as well as specific astrophysics pulsar searching scientific workflows for evaluation. The results show that benchmarking effectively demonstrates the cost effectiveness over other representative storage strategies.
- Publication type
- Journal article
- Research centre
- Swinburne University of Technology. Faculty of Information and Communication Technologies
- Source
- Journal of Parallel and Distributed Computing, Vol. 71, no. 2, (Feb 2011), pp. 316-332
- Publication year
- 2011
- FOR Code(s)
- 0805 Distributed Computing
- Keyword(s)
- Cloud computing; Cost benchmarking; Dataset storage; Scientific workflow
- Publisher
- Academic Press
- ISSN
- 0743-7315
- Publisher URL
- http://dx.doi.org/10.1016/j.jpdc.2010.09.003
- Copyright
- Copyright © 2011 Elsevier. The accepted manuscript is reproduced in accordance with the copyright policy of the publisher.
- Research Projects
- Novel cloud computing based workflow technology for managing large numbers of process instances, Australian Research Council grant number LP0990393
- Peer reviewed


