“Cloud computing approaches are likely to change the nature of our national research computing infrastructure in the coming years,” said Principal Investigator Geoffrey Fox, director of the Digital Science Center and associate dean of research and graduate studies in the IU School of Informatics and Computing. “These technologies hold significant promise in the life sciences and medical sciences as they offer the potential for greater computational power and faster speeds at a lower cost, and in a way that is easier for scientists to use than traditional grid computing approaches.”
Technological advances have made medical and biological research increasingly data-rich in recent years, a trend that scientists believe will continue to accelerate. Processing the extremely large digital data sets that result from gene sequencing and other medical research technologies is a significant challenge that generally cannot be met by a single facility or supercomputer.
The project team is developing a software infrastructure that makes use of the substantial hardware and networking investment made by Indiana University and the National Science Foundation in FutureGrid, a national experimental testbed, and TeraGrid, a national network of high-performance computing resources. The project will also harness commercial cloud computing infrastructure such as Amazon Web Services and Microsoft Azure, along with open source software.
“This research is potentially path-breaking,” said Peter Cherbas, professor of biology and director of the IU Center for Genomics and Bioinformatics. Cherbas and other researchers from the Center will be significant contributing partners in the cloud computing research effort. “Contemporary DNA sequencing machines are churning out data at rates that would have been unimaginable to biologists just a few years ago. To use these data—to turn data into some kind of understanding—will demand good tools for using the Cloud and those tools will impact genomics projects worldwide. We’re very excited to be part of this effort.”
Cloud computing provides a way to outsource computing infrastructure and create virtual supercomputers with greater computational power than any one facility can provide. Clouds also support new data-parallel technologies for processing massive data sets, such as Google’s MapReduce, a software framework that supports distributed computing on clusters.
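As a rough illustration of the MapReduce pattern mentioned above, the following sketch counts short DNA subsequences (k-mers) across a handful of reads. It is not the project's software: the function names (map_read, shuffle, reduce_counts, kmer_counts), the choice of k-mer counting as the task, and the sample reads are all assumptions made for illustration, and everything runs in a single process rather than on a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_read(read, k=3):
    """Map step: emit a (k-mer, 1) pair for every k-length window in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: collapse each key's list of values into a single sum."""
    return {key: sum(values) for key, values in groups.items()}


def kmer_counts(reads, k=3):
    """Run map, shuffle, and reduce in sequence over a list of reads."""
    mapped = chain.from_iterable(map_read(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical sample reads, chosen only to keep the example small.
    sample_reads = ["ACGTAC", "GTACGT"]
    print(kmer_counts(sample_reads))
```

In a distributed MapReduce framework, the map calls would run in parallel on different machines and the shuffle step would move data between them over the network. The sketch only shows how the work is divided so that each step can be parallelized independently.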
Cloud users can access nearly unlimited computational power, created by pooling distributed computational resources, through simple web interfaces. This eliminates the need for users or their institutions to own and maintain large, expensive computational equipment, and for users to have a detailed technical understanding of the computational resources supporting their research. The research team will explore the use of cloud techniques to overcome current obstacles in medical computing, such as long computation times and large memory requirements.
In addition to developing new cloud computing approaches, the research team will partner with several IU life science research teams to apply and test these techniques in their specific areas of life science research. These include projects related to population genomics, an area of science that improves our understanding of evolution and genetic disorders, as well as projects involved in assembling and sequencing gene fragments.
Cloud technologies will also be applied to gene family clustering and to the visualization of gene family structure in three dimensions. The overall goal is to provide a suite of services that will allow the simultaneous processing of many millions of gene samples in the cloud. Thanks to new sequencing technology, the size of the gene samples processed is expected to be one to two orders of magnitude larger than current computational capabilities allow.