Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services.

Abstract

Advancements in human genomics have generated a surge of available data, fueling the growth and accessibility of databases for more comprehensive, in-depth genetic studies. We provide a straightforward and innovative methodology to optimize cloud configuration in order to conduct genome-wide association studies. We utilized Spark clusters on both Google Cloud Platform and Amazon Web Services, as well as Hail for analysis and exploration of genomic variant datasets. Comparative evaluation of numerous cloud-based cluster configurations demonstrates a successful and unprecedented compromise between speed and cost for performing genome-wide association studies on four distinct whole-genome sequencing datasets.

Publication
In Journal of the American Medical Informatics Association

A presentation of the paper is available on Next Platform TV at https://www.nextplatform.com/2020/08/13/next-platform-tv-for-august-13-2020/ (starting at 33'33'').

Related