Publication 9497

Title:	Critical assessment of on-premise approaches to scalable genome analysis
Journal:	BMC Bioinformatics
Published:	21 Sep 2023
Pubmed:	https://pubmed.ncbi.nlm.nih.gov/37735350/
DOI:	https://doi.org/10.1186/s12859-023-05470-2
URL:	https://bmcbioinformatics.biomedcentral.com/counter/pdf/10.1186/s12859-023-05470-2
Citations:	1 (1 in last 2 years) as of 8 Aug 2024

Abstract

Background: Plummeting DNA sequencing cost in recent years has enabled genome sequencing projects to scale up by several orders of magnitude, which is transforming genomics into a highly data-intensive field of research. This development provides the much needed statistical power required for genotype-phenotype predictions in complex diseases.

Methods: In order to efficiently leverage the wealth of information, we here assessed several genomic data science tools. The rationale to focus on on-premise installations is to cope with situations where data confidentiality and compliance regulations etc. rule out cloud based solutions. We established a comprehensive qualitative and quantitative comparison between BCFtools, SnpSift, Hail, GEMINI, and OpenCGA. The tools were compared in terms of data storage technology, query speed, scalability, annotation, data manipulation, visualization, data output representation, and availability.

Results: Tools that leverage sophisticated data structures are noted as the most suitable for large-scale projects in varying degrees of scalability in comparison to flat-file manipulation (e.g., BCFtools, and SnpSift). Remarkably, for small to mid-size projects, even lightweight relational database.

Conclusion: The assessment criteria provide insights into the typical questions posed in scalable genomics and serve as guidance for the development of scalable computational infrastructure in genomics.

5 Keywords

Chromosome Mapping
Data Science
Databases, Factual
Genomics
Sequence Analysis, DNA

5 Authors

Amira Al-Aamri
Syafiq Kamarul Azman
Gihan Daw Elbait
Habiba Alsafar
Andreas Henschel

1 Application

Application ID	Title
64823	A Genomic Data Science Framework for the 1000 Arab genome project
