A data dependency based strategy for intermediate data storage in scientific cloud workflow systems
Please use this identifier to cite or link to this item: http://hdl.handle.net/1959.3/196653
- Title
- A data dependency based strategy for intermediate data storage in scientific cloud workflow systems
- Author(s)
- Yuan, Dong; Yang, Yun; Liu, Xiao; Zhang, Gaofeng; Chen, Jinjun
- Abstract
- Many scientific workflows are data intensive where large volumes of intermediate data are generated during their execution. Some valuable intermediate data need to be stored for sharing or reuse. Traditionally, they are selectively stored according to the system storage capacity, determined manually. As doing science in the cloud has become popular nowadays, more intermediate data can be stored in scientific cloud workflows based on a pay-for-use model. In this paper, we build an intermediate data dependency graph (IDG) from the data provenance in scientific workflows. With the IDG, deleted intermediate data can be regenerated, and as such we develop a novel intermediate data storage strategy that can reduce the cost of scientific cloud workflow systems by automatically storing appropriate intermediate data sets with one cloud service provider. The strategy has significant research merits, i.e. it achieves a cost-effective trade-off of computation cost and storage cost and is not strongly impacted by the forecasting inaccuracy of data sets' usages. Meanwhile, the strategy also takes the users' tolerance of data accessing delay into consideration. We utilize Amazon's cost model and apply the strategy to general random as well as specific astrophysics pulsar searching scientific workflows for evaluation. The results show that our strategy can reduce the overall cost of scientific cloud workflow execution significantly.
- Publication type
- Journal article
- Research centre
- Swinburne University of Technology. Faculty of Information and Communication Technologies
- Source
- Concurrency and Computation: Practice and Experience, Vol. 24, no. 9 (Jun 2012), pp. 956-976
- Publication year
- 2012
- FOR Code(s)
- 0805 Distributed Computing
- Keyword(s)
- Cloud computing; Data sets storage; Scientific workflow
- Publisher
- John Wiley & Sons
- ISSN
- 1532-0634
- Publisher URL
- http://dx.doi.org/10.1002/cpe.1636
- Copyright
- Copyright © 2010 John Wiley & Sons, Ltd.
- Peer reviewed
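
The storage-versus-computation trade-off described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a simplified linear chain of intermediate data sets and a greedy rule: keep a data set when its daily storage cost is lower than its expected daily regeneration cost. The `DataSet` class, the `decide_storage` function, the field names, and all cost figures are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataSet:
    """Hypothetical record for one intermediate data set in a linear chain."""
    name: str
    compute_cost: float          # cost to generate it from its direct predecessor
    storage_cost_per_day: float  # cost to keep it stored for one day
    uses_per_day: float          # forecast usage frequency


def decide_storage(chain: List[DataSet]) -> List[bool]:
    """Greedy pass over the chain; True means store, False means delete.

    Assumption: regenerating a deleted data set also requires regenerating
    every deleted predecessor back to the nearest stored one, so deleted
    predecessors raise the regeneration cost of their successors.
    """
    decisions = []
    upstream_regen_cost = 0.0  # accumulated cost of deleted predecessors
    for data_set in chain:
        regen_cost = upstream_regen_cost + data_set.compute_cost
        expected_regen_per_day = regen_cost * data_set.uses_per_day
        keep = data_set.storage_cost_per_day < expected_regen_per_day
        decisions.append(keep)
        # A stored data set resets the accumulated cost for its successors.
        upstream_regen_cost = 0.0 if keep else regen_cost
    return decisions


# Illustrative figures only; they are not taken from the paper.
chain = [
    DataSet("raw_beam", 10.0, 0.50, 0.01),
    DataSet("dedispersed", 5.0, 0.12, 0.01),
    DataSet("candidates", 1.0, 0.01, 0.50),
]
print(decide_storage(chain))
```

In this sketch the second data set is stored only because its predecessor was deleted: on its own, its expected regeneration cost (5.0 × 0.01) would be below its storage cost, but with the deleted predecessor included (15.0 × 0.01) it is cheaper to store. This is the kind of dependency effect that a data dependency graph makes visible.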


