SODA - Desktop Software to Enhance the SPARC Data Curation Workflow

FOCUS: Tools and Technologies
PRINCIPAL INVESTIGATOR(S): Bhavesh Patel
INSTITUTION(S): California Medical Innovations Institute
FUNDING PROGRAM(S): SPARC
NIH AWARD: OT2OD030213

As science becomes more data-intensive and collaborative, sharing and reusing data are becoming increasingly important for rapid discovery and innovation. While sharing takes only a few clicks in this era of digital data, reuse is difficult unless strict data curation standards are followed. To address this issue, all SPARC-funded researchers are given curation guidelines for organizing and sharing their datasets so that researchers who were not directly involved in data collection and analysis can rapidly understand and work with the data, as promoted by the FAIR principles. Complying with these guidelines, however, requires additional time from researchers and is subject to human error. The data curation process could thus become time-consuming and overwhelming as more data are generated and could eventually steer the focus away from data collection and analysis. The objective of this project is to continue the development of SODA (Software for Organizing Data Automatically), the software being developed to assist SPARC-funded researchers in curating their datasets. Distributed as an open-source, cross-platform desktop application, SODA aims to bridge a long-standing, overlooked gap between comprehensive data standards and their convenient application by researchers. It does so through an interactive interface that, without requiring any coding knowledge, walks SPARC researchers step by step through the data curation process while automating repetitive, complex, and time-consuming tasks. During this phase of development, six new features will be added to SODA: 1) a full file organization interface with lower-level folder support; 2) collaborative data curation; 3) in-app updating; 4) enhanced support for generating metadata files; 5) an enhanced dataset validator; and 6) file-level curation support.
The proposed project is significant for its potential to simplify and enhance the SPARC data curation process. Computer-assisted data curation will not only reduce the time researchers need to organize their data but also minimize, if not eliminate, human error. The resulting rapid dissemination of SPARC data will advance SPARC’s mission of accelerating the development of therapeutic devices that modulate electrical activity in nerves to improve organ function.