Publication 12480

Title:	FAIRly big: A framework for computationally reproducible processing of large-scale data
Journal:	Scientific Data
Published:	11 Mar 2022
Pubmed:	https://pubmed.ncbi.nlm.nih.gov/35277501/
DOI:	https://doi.org/10.1038/s41597-022-01163-2
URL:	https://www.nature.com/articles/s41597-022-01163-2.pdf
Citations:	17 (13 in last 2 years) as of 8 Aug 2024

Abstract

Large-scale datasets present unique opportunities to perform scientific investigations with unprecedented breadth. However, they also pose considerable challenges for the findability, accessibility, interoperability, and reusability (FAIR) of research outcomes due to infrastructure limitations, data usage constraints, or software license restrictions. Here we introduce a DataLad-based, domain-agnostic framework suitable for reproducible data processing in compliance with open science mandates. The framework attempts to minimize platform idiosyncrasies and performance-related complexities. It affords the capture of machine-actionable computational provenance records that can be used to retrace and verify the origins of research outcomes, as well as be re-executed independent of the original computing infrastructure. We demonstrate the framework's performance using two showcases: one highlighting data sharing and transparency (using the studyforrest.org dataset) and another highlighting scalability (using the largest public brain imaging dataset available: the UK Biobank dataset).

8 Authors

Adina S. Wagner
Laura K. Waite
Małgorzata Wierzba
Felix Hoffstaedter
Alexander Q. Waite
Benjamin Poldrack
Simon B. Eickhoff
Michael Hanke

1 Application

Application ID	Title
41655	Characterizing brain networks and their inter-individual variability by high-throughput imaging and computational modelling
