Problem
The success of the SPARC program relies on the quality and availability of the datasets produced by SPARC investigators. To ensure that SPARC datasets adhere to the FAIR principles (Findable, Accessible, Interoperable, and Reusable), they are all submitted to the open-access SPARC portal in a standard organizational scheme called the SPARC Dataset Structure (SDS). The SDS provides standard file and metadata structures and naming conventions, along with data-type-specific minimal requirements, and ensures that diverse types of data are organized and described in a consistent manner. This is essential to the mission of SPARC, as our research encompasses many diverse biomedical research fields, and the resulting datasets cover an array of topics including organ-specific circuitry, structural or functional connectivity, mapping of cell types, computational modeling, and molecular profiling. The SDS was recently endorsed by the International Neuroinformatics Coordinating Facility (INCF), establishing it as a community standard that will maximize the reuse and impact of data.
A widely used desktop application, SODA, has been developed within SPARC to guide users through the organization and submission of their data following the SDS. For those looking to embed support for the SDS into their existing tools or workflows, however, there was no easy-to-use way to programmatically access metadata fields in SDS datasets or to create new datasets. This limited the ways in which researchers could interact with SPARC datasets and slowed the development of new tools. To maximize the reuse and impact of SPARC datasets stored in the SDS, in keeping with FAIR principles, a programmatic solution was needed.
Solution
A solution to this problem was developed as the winning entry in the 2022 SPARC FAIR Codeathon. The team, led by Thiranja Prasad Babarenda Gamage, built the SPARC Metadata Editor (SPARC-me) in just over two and a half days. SPARC-me is a Python tool designed to help investigators explore, enhance, and expand SPARC datasets and their descriptions in accordance with FAIR principles. The tool, available on the SPARC portal, allows users to access curated data and metadata, create and validate new datasets, convert between different dataset structures, and enhance dataset descriptions through schema extensions.
Impact
Following the team's codeathon success, Babarenda Gamage has employed SPARC-me in his own work in the Clinical Translational Technologies Group at the Auckland Bioengineering Institute (ABI), where he is developing a Digital Twin Platform as part of the 12 LABOURS project. This platform aims to provide common infrastructure to support the generation of integrated digital twins (that is, virtual representations of individual patients) for clinical and home-based healthcare applications, and the demonstration of their efficacy in clinical trials. A key component of the Digital Twin Platform involves storing data using the SPARC Dataset Structure (SDS). Babarenda Gamage reports that the SDS has provided a robust mechanism to standardize the group's data management practices and maximize the reuse and impact of data generated from its research. Over 30 researchers who are part of 12 LABOURS exemplar projects have started to store their data in SDS format in diverse computational physiology applications, including the development of novel biomarkers for pulmonary hypertension, rehabilitation of upper limb disorders, control of organ function by the autonomic nervous system in the uterus and stomach, and support for breast cancer diagnosis and treatment. These efforts are aimed at creating an ecosystem to make data and research outcomes FAIR, enable reproducible science, meet Aotearoa New Zealand’s data sovereignty requirements, support clinical translation of computational physiology workflows and digital twins, and provide a foundation for integrating and supporting research developments across ABI and its institutional, national, and international collaborations.