Please use this identifier to cite or link to this item: http://hdl.handle.net/1959.3/228687
- Computation and storage trade-off for cost-effectively storing scientific datasets in the cloud
- Yuan, Dong; Yang, Yun; Liu, Xiao; Chen, Jinjun
- Scientific applications are usually data intensive [1, 2], and the datasets they generate are often terabytes or even petabytes in size. As reported by Szalay and Gray, science is in an exponential world, and the amount of scientific data will double every year over the next decade and beyond. Producing scientific datasets involves a large number of computation-intensive tasks, e.g., in scientific workflows, and hence takes a long time to execute. The generated datasets contain important intermediate or final results of the computation and need to be stored as valuable resources, for two reasons: (1) data can be reused: scientists may need to re-analyze the results or apply new analyses to the existing datasets; (2) data can be shared: for collaboration, the computation results may be shared, so the datasets are used by scientists from different institutions. Storing valuable generated datasets saves the cost of regenerating them when they are reused, not to mention the waiting time that regeneration causes. However, the large size of scientific datasets makes storing them a major challenge.
- Publication type
- Book chapter
- Research centre
- Swinburne University of Technology. Faculty of Information and Communication Technologies
- Handbook of data intensive computing / Borko Furht and Armando Escalante (eds.), Chapter 5, pp. 129-153
- Publication year
- Cloud computing; Computation; Data sharing; Data storage; Datasets; Scientific data
- 9781461414148, 1461414148
- Publisher URL
- Copyright © Springer Science+Business Media, LLC 2011.