Sharing is Caring - Making SPARC data FAIR with SODA


Simplifying the curation and sharing of SPARC datasets through open-source software that combines an intuitive user interface with automation

Problem

As scientists, we want our data to be built upon. A major aim of the SPARC program is to optimize the reuse of peripheral nervous system (PNS)-related data in order to maximize future discoveries. To achieve this, specific data curation and sharing guidelines have been developed for all SPARC submissions [1]. These requirements align with the Findable, Accessible, Interoperable, Reusable (FAIR) data principles, a widely adopted set of high-level guidelines for optimizing the reuse of data by humans and artificial intelligence systems [2].

The importance of the data uniformity these guidelines promote cannot be overstated. However, the guidelines can be difficult to understand and time-consuming to implement. We experienced these challenges firsthand while preparing and sharing data from our SPARC research award for the Fecobionics device [3].

Solution

Our solution was to develop a computer tool that guides researchers step by step through making their data compliant with the SPARC guidelines. We envisioned this tool as analogous to tax-filing software, but for "filing" SPARC data.

The first prototype of this tool was developed during the 2018 SPARC Codeathon, where it received the People's Choice Award. SPARC subsequently supported the continued development of this tool, which we named Software for Organizing Data Automatically (SODA) [4,5,6].

SODA combines an intuitive user interface with automation to streamline the curation and sharing of SPARC datasets (Fig. 1). The software asks users for simple actions and inputs, then transforms their experimental and computational data into a SPARC-compliant dataset. Whenever possible, tasks that a computer can accomplish are automated, such as creating metadata files in the SPARC-mandated format.

Figure 1. Screenshot of the starting user interface of SODA.

Impact

Easier and faster dissemination of FAIR SPARC data

SODA integrates with the various tools that are integral to the SPARC data curation process, enabling researchers to meet all requirements of a SPARC dataset simply by using SODA (Fig. 2). These include tools and resources developed by the SPARC Data and Resource Center, such as Pennsieve, SciCrunch, and the SDS validator, as well as external tools like Protocols.io and the NCBI Taxonomy.

Since January 2021, SODA has been used to "file" about 11 TB of SPARC data, or nearly 140k individual data files. Evaluations comparing SODA with the equivalent manual process have shown that our software reduces data curation and sharing time by 70% on average, eliminates most human error in the process, and considerably reduces its complexity. With the recent introduction of our Guided Mode, in which all steps are logically ordered and interconnected, we expect to improve this performance further.

We believe that the rapid dissemination of well-curated SPARC data through SODA will continue to advance SPARC's mission of accelerating the development of neuromodulation devices. In addition, as the SPARC Data and Resource guidelines are adopted outside of SPARC, we believe our software will benefit those new projects as well.

Figure 2. Illustration of some of the current and expected integrations of SODA with the SPARC tools and resources.

Expanding SODA’s concept beyond SPARC

The positive impact SODA has had on the SPARC program drove us to extend our idea of software-assisted curation and sharing of FAIR data to other research programs. With support from the National Institute of Allergy and Infectious Diseases (NIAID), we are developing FAIRshare, an open-source, cross-platform desktop software similar to SODA that supports the curation and sharing of COVID-19-related data and software [7]. We also recently received funding from the new NIH Common Fund's Bridge to Artificial Intelligence (Bridge2AI) program. With this support, we will co-lead the development of fairhub.io, a cloud platform for managing, curating, and sharing FAIR diabetes-related research data [8].

We hope to continue developing many more tools that make sharing FAIR biomedical research data easier. Because, as we learned in kindergarten, sharing is caring!

AUTHOR
Bhavesh Patel, Ph.D.

PUBLISHED DATE
January 26, 2023

TEAM MEMBERS

The FAIR Data Innovations Hub team at the California Medical Innovations Institute:

Christopher Marroquin ORCID iD: 0000-0002-3399-1731

Jacob Clark ORCID iD: 0000-0001-8134-6481

Dorian Portillo ORCID iD: 0000-0002-4306-4464

Sanjay Soundarajan ORCID iD: 0000-0003-2829-8032

Bhavesh Patel ORCID iD: 0000-0002-0307-262X


SUPPORTING INFORMATION

[1] Bandrowski, Anita, Jeffrey S. Grethe, Anna Pilko, Thomas H. Gillespie, Gabi Pine, Bhavesh Patel, Monique Surles-Zeigler, and Maryann E. Martone. "SPARC Data Structure: Rationale and design of a FAIR standard for biomedical research data." bioRxiv (2021). https://doi.org/10.1101/2021.02.10.430563

[2] Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. "The FAIR Guiding Principles for scientific data management and stewardship." Scientific Data 3, no. 1 (2016): 1-9. https://doi.org/10.1038/sdata.2016.18

[3] Fecobionics grant number: NIH SPARC 3OT2OD025308

[4] SODA grant number: NIH SPARC OT2OD030213

[5] About SODA: https://fairdataihub.org/sodaforsparc

[6] SODA source code: https://github.com/fairdataihub/SODA-for-SPARC

[7] About FAIRshare: https://fairdataihub.org/fairshare

[8] About fairhub.io: https://fairdataihub.org/blog/bridge2AI-fairdataihub

