INDIANAPOLIS – A bioinformatics research team from the School of Informatics and Computing at IUPUI, led by Executive Associate Dean Mathew Palakal, is one of the winning teams in an international challenge issued by the organizers of the 12th Annual International Conference on Critical Assessment of Massive Data Analysis (CAMDA). The challenge was to find discernible, meaningful patterns in the genomic data of 38 human subjects (7 terabytes of genome data) from the Korean Personal Genome Project.
Other members of the team are students Deepali Jhamb, Akshay Desai and Premkumar Duraiswamy, and research scientist Meeta Pradhan.
The team developed an innovative systems biology pipeline for analyzing next-generation sequencing data, focusing on the interrelationships among genes with rare variants. Using their big data analytics methods, they found that these genes were prevalent in two major domains: neurodegenerative diseases and tumor-related genes. This observation suggests that the Korean population might be more susceptible to these two major classes of complex diseases than other populations are.
The project required some 100 terabytes of storage and 100 gigabytes of RAM, provided by IU’s National Center for Genome Analysis Support, directed by William Barnett.
As one of the winning teams, the group has been invited to present its findings at the CAMDA conference in Berlin, July 19 and 20.
The Big Data explosion is one of the grand challenges in the modern life sciences. Analyzing large data sets is emerging as one of the key scientific techniques of the post-genomic era. Still, the data-analysis bottleneck prevents new biotechnologies from delivering medical and biological insights on a larger scale. The growing need to analyze massive data is driven mainly by high-throughput sequencing technologies and the increasing size of biomedical studies.
Critical Assessment of Massive Data Analysis focuses on the analysis of massive data in the life sciences. It introduces and evaluates new approaches and solutions to the Big Data problem. The conference presents new techniques in bioinformatics, data analysis and statistics for handling and processing large data sets, combining multiple data sources and performing computational inference.