Problem
As scientists, we want our data to be built upon. A major aim of the SPARC program is to optimize the reuse of peripheral nervous system (PNS)-related data in order to maximize future discoveries. To achieve this, specific data curation and sharing guidelines have been developed for all SPARC submissions [1]. These requirements align with the Findable, Accessible, Interoperable, Reusable (FAIR) data principles, a set of widely adopted high-level guidelines for optimizing the reuse of data by humans and Artificial Intelligence systems [2].
The importance of the data uniformity that these guidelines promote cannot be overstated. However, the guidelines can be difficult to understand and time-consuming to implement. These are challenges we experienced firsthand while preparing and sharing data from our SPARC research award for the Fecobionics device [3].
Solution
Our solution was to develop a software tool that guides researchers step by step through making their data compliant with the SPARC guidelines. We envisioned this tool as analogous to tax filing software, but for “filing” SPARC data.
The first prototype of this tool was developed during the 2018 SPARC Codeathon, where it received the People’s Choice Award. We subsequently received support from SPARC for the continued development of this tool, which we named Software for Organizing Data Automatically (SODA) [4,5,6].
SODA combines an intuitive user interface with automation to streamline the curation and sharing of SPARC datasets (Fig. 1). The software requests simple actions and inputs from users, then transforms their experimental and computational data into a SPARC-compliant dataset. Wherever possible, tasks that can be accomplished by a computer are automated, such as creating metadata files in the SPARC-mandated format.
Figure 1. Screenshot of the starting user interface of SODA.
Impact
Easier and faster dissemination of FAIR SPARC data
SODA integrates with various tools that are central to the SPARC data curation process. This enables researchers to meet all requirements of a SPARC dataset simply by using SODA (Fig. 2). The software integrates with tools and resources developed by the SPARC Data and Resource Center (such as Pennsieve, SciCrunch, and the SDS validator) as well as external tools like Protocols.io and the NCBI Taxonomy.
Since January 2021, SODA has been used to “file” about 11 TB of SPARC data, or nearly 140,000 individual data files. Evaluations of SODA’s performance against the equivalent manual process have shown that our software reduces data curation and sharing time by 70% on average, eliminates most of the human error in the process, and considerably reduces its complexity. With the recent introduction of our Guided Mode, where all the steps are logically aligned and interconnected, we expect to improve on these results further.
We believe that the rapid dissemination of well-curated SPARC data through SODA will continue to advance SPARC’s mission of accelerating the development of neuromodulation devices. In addition, as the SPARC Data and Resource guidelines are adopted outside of SPARC, we believe our software will similarly benefit new projects.
Figure 2. Illustration of some of the current and expected integrations of SODA with the SPARC tools and resources.
Expanding SODA’s concept beyond SPARC
The positive impact SODA has had on the SPARC program drove us to expand our idea of software-assisted curation and sharing of FAIR data to other research programs. To that end, we are developing FAIRshare, an open-source, cross-platform desktop application similar to SODA that supports the curation and sharing of COVID-19-related data and software. This work is supported by the National Institute of Allergy and Infectious Diseases (NIAID) [7]. More recently, we received funding from the NIH Common Fund’s new Bridge to Artificial Intelligence (Bridge2AI) program. With this support, we will co-lead the development of fairhub.io, a cloud platform for managing, curating, and sharing FAIR diabetes-related research data [8].
We hope to continue developing many more tools that make sharing FAIR biomedical research data easier. After all, as we learned in kindergarten, sharing is caring!