Advancements in human genomics have generated a surge of available data, fueling the growth and accessibility of databases for more comprehensive, in-depth genetic studies. We provide a straightforward and innovative methodology for optimizing cloud configuration to conduct genome-wide association studies. We used Spark clusters on both Google Cloud Platform and Amazon Web Services, together with Hail, to analyze and explore genomic variant datasets. A comparative evaluation of numerous cloud-based cluster configurations demonstrates a successful and unprecedented compromise between speed and cost for performing genome-wide association studies on four distinct whole-genome sequencing datasets.
A presentation of the paper is available on Next Platform TV at https://www.nextplatform.com/2020/08/13/next-platform-tv-for-august-13-2020/ (starting at 33:33).