This documentation applies only to investigators who are funded through the NIH SPARC effort. Each investigator has user credentials for the SPARC Consortium group on the DAT-Core website. All SPARC-related datasets should be submitted to this account, even if the investigator has a separate, unaffiliated, private Blackfynn account. Dataset submission is required within 30 days of completing a project milestone (according to the SPARC Material Sharing Policy).
Table of Contents
- Stages of Data Submission
- Using Dataset Status flags in Blackfynn
- Steps for Submitting a Dataset
Stages of Data Submission
- Create dataset in Blackfynn
- Set permissions
- Upload data according to SPARC Dataset Structure (Templates)
- Create protocol in protocols.io
- Submit for Curation
- Integrity and metadata checks
- Image segmentation and annotation
- Scaffold registration
- PI shares with SPARC Consortium by releasing to embargo
- PI publishes dataset to public portal
Using Dataset Status flags in Blackfynn
The progress of a dataset through this process is communicated in Blackfynn using status flags. When a step is completed, the investigator (blue) or curator (tan) changes the status flag, and the dataset is passed to the next step. Three teams perform curation, depending on the data type: UCSD, MBF, and ABI.
Steps for Submitting a Dataset
Detailed instructions for how to submit a dataset are provided in the following sections.
- Creating a dataset
- Work with Data Curation Team to map metadata to standards
- Sharing datasets with the SPARC Consortium as Embargoed dataset
- Publishing datasets
1. Creating a dataset
There are a couple of easy steps to submit a dataset:
- Create a dataset on the DAT-Core (SPARC Consortium account on Blackfynn) by clicking the New Dataset button in the top-right corner of the web application, and provide a Dataset Name and a short Dataset Subtitle describing the dataset. Then, click on
Naming guidelines: The “dataset name” is equivalent to the title of your dataset publication. This is the public field that promotes your team's dataset on the SPARC Portal and helps other researchers in the SPARC community decide whether they want to learn more about your research. Please make sure that your dataset title is informative and differs from your other dataset titles. Please keep either the URL or the complete title in your records, because the title is the only field that is searchable in Blackfynn.
The dataset subtitle will be visible, but not searchable, on the Blackfynn platform, so it is useful to write two or three sentences that further define your dataset and differentiate it from other datasets. This field will become the short description immediately under the title of your dataset once it is published.
You have now created a private dataset that only you, as the dataset owner, can see.
In the top left corner of the dataset page there will be a status list with the 12 status options that each SPARC Dataset will go through during the submission and curation process. The status of a dataset can be changed by anyone with edit permissions, and will be used by both teams to communicate the dataset’s progress through the data submission and curation process. Each status indicates which team is responsible for the step, until the dataset is published at step 12.
An initial dataset status is set automatically for each new dataset.
- Change the ownership of the dataset to the PI of the lab. This is a SPARC requirement and ensures that the PI of the lab is the only person who can publish the dataset. To do this, click on the Permissions tab on the left side bar, and add the PI to your dataset as a manager. Then click on the Manager label next to their name and select Make Owner. You will no longer be the owner of the dataset, but you will still have Manager permissions.
- Add permissions for your award team (contact the DAT-Core if you need help adding people to your award team) and the SPARC Data Curation Team. Select your award team from the dropdown menu, add it with the appropriate permissions, and add the SPARC Data Curation Team with Manager permissions.
You have now allowed your award team and the curation team to see and edit the dataset.
Upload files to the dataset according to the SPARC guidelines. The SPARC Dataset Structure (version 1.2.3) may be downloaded as a zip file or you may create it on your own. For help with working with the SPARC Dataset Structure, which is based on the BIDS specification, contact firstname.lastname@example.org. More information on how to upload files can be found here.
Set dataset status to
Complete the metadata templates that are included in the downloaded SPARC Dataset Structure zip file. Make sure you are always using the most recent template version. Experimental metadata is specified by the SPARC Data Standards Committee based on the Minimal Information for a Neuroscience Dataset (MINDS) specification and is captured in the following files: 1) submission.xlsx, 2) dataset_description.xlsx, 3) subjects.xlsx, and 4) samples.xlsx. An annotated list of these fields can be found here.
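As a quick pre-upload check, the presence of the four metadata files named above can be verified locally. The following is a minimal sketch, not an official SPARC tool: the function name `missing_metadata_files` is hypothetical, and it assumes that all four files sit directly in the top-level dataset folder.

```python
from pathlib import Path

# File names are taken from the text above; the flat, top-level
# location of the files is an assumption made for this sketch.
REQUIRED_METADATA_FILES = (
    "submission.xlsx",
    "dataset_description.xlsx",
    "subjects.xlsx",
    "samples.xlsx",
)


def missing_metadata_files(dataset_root):
    """Return the required metadata file names not found directly under dataset_root."""
    root = Path(dataset_root)
    return [name for name in REQUIRED_METADATA_FILES if not (root / name).is_file()]
```

An empty list means all four files were found. The sketch only checks that the files exist; it does not inspect their contents or the template version.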
Upload the protocol(s) used to generate the SPARC dataset to protocols.io/groups/sparc. After creating your protocols.io account, make sure to join the SPARC group. The group can be found through the search bar at the top of the protocols.io webpage (also here). Upload the protocol within the SPARC group (this option is free to investigators). More specific instructions can be found here. Make sure to include a link to your protocol(s) within the dataset_description file. For the curation team to access the protocol for annotation, the submitter needs to ensure that: 1) the protocol is added to the SPARC group, and 2) the URL to the protocol is included in the dataset_description.xlsx file.
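Before pasting a protocol link into dataset_description.xlsx, it can help to confirm that the link is well formed. The sketch below is illustrative only: the function name `looks_like_protocols_io_url` is hypothetical, and it assumes the link points at the protocols.io site itself. A link in another form, such as a DOI resolver address, would be rejected by this check even though it might be acceptable to the curation team.

```python
from urllib.parse import urlparse


def looks_like_protocols_io_url(url):
    """Return True if url is an http(s) URL whose host is protocols.io or a subdomain of it."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    # Assumption: only links hosted on protocols.io count as protocol links here.
    return parsed.scheme in ("http", "https") and (
        host == "protocols.io" or host.endswith(".protocols.io")
    )
```

The check is purely syntactic. It does not confirm that the protocol exists or that it has been added to the SPARC group.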
Once you have completed your data uploads, please select step 3. Ready for Curation (Investigator) to have your dataset submitted to the curation queue. Please note that the curation team will not look into your dataset until you change the status to ready for curation.
Set dataset status to 3. Ready for Curation (Investigator).
Wait for the Curation Team to process your dataset. You can monitor the different stages of the process on the Blackfynn platform, as seen in the box below.
2. Work with Data Curation Team to map metadata to standards
Below are the steps and statuses listed for the curation cycle:
Once you mark the dataset as ready for curation, our team will switch the status to curation in progress and start curating your dataset: checking the integrity of the data, validating values, working with image segmentation, and creating maps. During this phase, you can monitor where your dataset is in the curation queue by looking at the status bar. The Curation Team will create a tracked ticket and will reach out to SPARC investigators to provide curation review results and to help address any errors.
Datasets that include microscopy image data are encouraged to pass through the image segmentation portion of the protocol, where SPARC investigators use MBF Bioscience software (MBF, MAP-Core) to create FAIR segmentations that can be retrieved by ABI for organ scaffold representations. For a detailed look at the MAP-Core SPARC Image Segmentation Workflow please refer to the following Google document.
For datasets that include image segmentation, the MBF Curation Team will reach out to SPARC investigators to provide curation review results and to help address any errors in the segmentation. To initiate the image segmentation workflow, the MBF Curation Team will provide investigators with access to MBF Bioscience segmentation software for FAIR neural, vascular, and anatomical reconstruction. Investigators can request a license of MBF Bioscience software by emailing SPARC@mbfbioscience.com.
As SPARC investigators use MBF Bioscience software to segment images within their dataset(s), they will send completed segmentation files directly to an assigned MBF SPARC segmentation assistant for curation. Files can be shared with MBF via MBF Bioscience’s file sharing mechanism or Blackfynn. The MBF SPARC segmentation assistant will review each file and communicate with the investigator directly via email or Slack if files need revision (e.g., the investigator needs to include subject metadata, annotate additional fiducials, and/or address inaccuracies or incompleteness).
In this step, the MBF Curation Team finalizes with the researcher all necessary edits to segmentation file(s) and images so the files can be uploaded by the researcher into the “derivative” folder of the BIDS format for the respective dataset on the Blackfynn platform. Once all files within the dataset are curated, segmentation files and image files can be used as staging for scaffold building and other portal representations/simulations. Image files are converted to include minimum metadata and written in the standard JPEG 2000 file format, which permits efficient viewing on the Portal.
Auckland Bioengineering Institute (ABI) downloads the segmentation files (in MBF format) from Blackfynn and configures 3D organ scaffolds for species and organ Physiome Model Repository (PMR). ABI utilizes non-image data from Blackfynn with annotations and registers embedded data to geometric scaffolds and flatmaps.
ABI uploads the transformation matrix for each set of registered data to Blackfynn as an annotation. ABI also uploads the Uniform Resource Identifier (URI) for the average scaffold in PMR and for the specific scaffold with registered data to Blackfynn as derived data with one or more parent Blackfynn IDs.
Anytime your input is needed during the curation process, you will receive email communication from the respective curation group. Please respond to the Curation Team’s inquiries in a timely manner. Your dataset may go through multiple iterations before it is ready for publishing.
Review the curation feedback letter that you received from the Curation Team. Work with the Curation Team to implement all necessary changes to the dataset that were listed in the feedback letter. Provide missing information and/or files. Once you have uploaded all changes to the Blackfynn platform, please change the dataset status back to “Ready for Curation” so the Curation Team can pick up the dataset for curation again.
Note: Your dataset can iterate between the status “Ready for Curation” and “Needs Attention” multiple times, until all SPARC mandated requirements are met.
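The review loop described in this section can be pictured as a small state machine. The sketch below is illustrative only and is not part of Blackfynn: it covers just the three statuses named in the text, the transition table is an assumption inferred from the description, and the names `ALLOWED_TRANSITIONS` and `next_status` are hypothetical. The remaining statuses, including the final sign-off status, are deliberately left out.

```python
# Status names as quoted in the text above; the other statuses are omitted.
READY = "Ready for Curation"
IN_PROGRESS = "Curation in Progress"
NEEDS_ATTENTION = "Needs Attention"

# Hypothetical transition table for the review loop only (an assumption,
# not a rule enforced by the Blackfynn platform).
ALLOWED_TRANSITIONS = {
    READY: {IN_PROGRESS},            # curator picks up the dataset
    IN_PROGRESS: {NEEDS_ATTENTION},  # curator requests changes
    NEEDS_ATTENTION: {READY},        # investigator resubmits after fixes
}


def next_status(current, requested):
    """Return requested if it is an allowed follow-up to current, else raise ValueError."""
    if requested not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move from {current!r} to {requested!r}")
    return requested
```

Walking the table from “Ready for Curation” through “Needs Attention” and back reproduces one iteration of the cycle mentioned in the note.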
When the curation process is finished you will receive an email from the Curation Team with their final signoff. The dataset status will then be switched to
Work with the Curation Team on reviewing and approving all edits and changes that were implemented during data alignment, annotation, and visualization. At this time the SPARC Data Curation team will work with you to finalize the dataset within Blackfynn, adding the finalized description and authors, selecting the license and provisioning a DOI.
Please verify the detailed description of your dataset that the Curation Team entered on your behalf using the description editor. This description will be highly visible once your dataset is published.
Upload a banner image for your dataset on Blackfynn. This can be done by clicking Upload Banner Image in the Overview page. This image should have a minimum resolution of 512px and will be associated with the dataset and used as a thumbnail once the dataset is published.
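The size requirement can be checked before uploading. The following sketch reads the width and height from the header of a PNG file using only the Python standard library. It rests on two assumptions that the text does not state: that the banner is a PNG file, and that the 512px minimum applies to both width and height. The function names are hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_BANNER_PX = 512  # minimum taken from the text; applied to both sides by assumption


def png_dimensions(data):
    """Return (width, height) read from the IHDR chunk at the start of PNG bytes."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are stored as two big-endian 32-bit unsigned integers.
    return struct.unpack(">II", data[16:24])


def banner_is_large_enough(data, minimum=MIN_BANNER_PX):
    """Return True if both the width and the height are at least minimum pixels."""
    width, height = png_dimensions(data)
    return width >= minimum and height >= minimum
```

A file can be checked with `banner_is_large_enough(open("banner.png", "rb").read())`, where `banner.png` is a placeholder path.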
The Curation Team will assign your dataset the Creative Commons Attribution (CC-BY) license. You may also select this option yourself using the dropdown menu in the Dataset Settings page.
The Curation Team will add dataset contributors in the order in which they appear in the dataset_description file you have uploaded. Contributors are listed on the public dataset landing page (SPARC Portal and Discover) in the same order in which they are added. If you need to make changes, you can add contributors by selecting names from a drop-down menu. More information on how to add contributors can be found here.
3. Sharing datasets with the SPARC Consortium as Embargoed dataset
CONGRATULATIONS! Now your dataset is ready to be shared with the SPARC Consortium. You can share your dataset with the SPARC Embargoed Data Sharing Group with Viewer permissions. This allows any SPARC investigator who has signed the SPARC non-disclosure form to see your data.
Change the dataset status to:
4. Publishing datasets
One year after the initial upload of your dataset, you must publish your dataset to Blackfynn Discover, which populates the SPARC Portal. To do this: 1) navigate to the Publishing left-hand menu in Blackfynn, 2) click Submit Dataset for Review, 3) select the dataset to be submitted for review, 4) click the appropriate checkbox (the second checkbox is only available when releasing a revision or new version of a dataset), and 5) hit Submit. Once submitted, your dataset will be locked, moved to the Pending Review section, and sent to the Publisher Team for review before it is ultimately accepted or rejected.
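To keep track of the publication date, the one-year interval can be computed from the initial upload date. This is a minimal sketch under stated assumptions: "one year" is read as the same calendar date in the following year, an upload on 29 February falls back to 28 February, and the function name `publication_due_date` is hypothetical. The SPARC policy itself remains the authority on the exact date.

```python
from datetime import date


def publication_due_date(initial_upload):
    """Return the date one calendar year after initial_upload."""
    try:
        return initial_upload.replace(year=initial_upload.year + 1)
    except ValueError:
        # 29 February has no counterpart in a non-leap year;
        # falling back to 28 February is a choice made for this sketch.
        return initial_upload.replace(year=initial_upload.year + 1, day=28)
```

For example, `publication_due_date(date(2020, 3, 15))` returns `date(2021, 3, 15)`.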
Change the dataset status to:
This document outlined the steps required to submit and publish a SPARC dataset. Please feel free to reach out to the DAT-Core or Curation Team with specific questions about the workflow.