Opscientia is a Web 3.0 organization building the decentralized science (DeSci) stack: tools that let researchers all over the world break down data silos, manage their intellectual property, and receive funding for working in the open.

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/bbc0e8c8-4bed-4d0f-a2ad-95ab68287868/Slide11.png

The Open Science Data Wallet proof-of-concept is a first step toward establishing the self-sovereign research data management block of the DeSci stack. Researchers can publish data to IPFS/Filecoin through Textile, assign permissions signed with MetaMask, pay participants for contributing to science, and securely share data with scientists at other institutions.

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/9da65b78-e0ed-47df-918e-81f63f00b48f/Opsci_Diagram.png

However, a remaining challenge for neuroscientists is archiving and sharing massive versioned datasets, particularly those composed of many sub-datasets served by multiple groups across a variety of storage locations (institutional computing centers, cloud service providers, individual lab systems). Opscientia's team is currently exploring how well decentralized file storage on Filecoin can serve massive (250TB+) multi-scale images of the brain.

Project Summary

In this project, we will perform use-case testing of Filecoin as a file storage protocol for large datasets. The datasets will be indexed by the metadata aggregator DataLad, archived on Filecoin, and later visualized by researchers through an HTTPS gateway using in-browser rendering applications (e.g., Neuroglancer).

Research Questions

Large neuroscience datasets require specialized workflows for breaking datasets down into storage-efficient chunks, ensuring persistent availability, validating interoperability specifications, and managing metadata for linked resources.
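
As an illustration of the chunking and metadata steps, here is a minimal Python sketch that splits a byte payload into fixed-size chunks and records a SHA-256 digest for each one in a manifest. It is a sketch under stated assumptions, not the project's pipeline. The function names (iter_chunks, build_manifest, verify_chunk), the manifest layout, and the fixed chunk size are all hypothetical. Filecoin, IPFS, and DataLad each have their own chunking and content-addressing schemes, which this example does not reproduce.

```python
import hashlib
from typing import Iterator


def iter_chunks(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive fixed-size slices of data (the last may be shorter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


def build_manifest(name: str, data: bytes, chunk_size: int) -> dict:
    """Describe a payload as a list of chunks with offsets, sizes, and digests.

    The manifest layout is hypothetical and only illustrates the kind of
    metadata needed to validate and reassemble a chunked dataset.
    """
    chunks = []
    offset = 0
    for chunk in iter_chunks(data, chunk_size):
        chunks.append({
            "offset": offset,
            "size": len(chunk),
            "sha256": hashlib.sha256(chunk).hexdigest(),
        })
        offset += len(chunk)
    return {
        "name": name,
        "total_size": len(data),
        "chunk_size": chunk_size,
        "chunks": chunks,
    }


def verify_chunk(manifest: dict, index: int, chunk: bytes) -> bool:
    """Check a retrieved chunk against the digest recorded in the manifest."""
    expected = manifest["chunks"][index]["sha256"]
    return hashlib.sha256(chunk).hexdigest() == expected
```

A retrieval test could then fetch each chunk from its storage location and call verify_chunk before reassembling the file, so that corrupted or missing pieces are detected per chunk rather than per dataset.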

The Opscientia team will investigate the following questions, starting with small datasets for unit tests and then scaling up to 284TB of mixed-content datasets indexed by the metadata aggregator DataLad.

Storage design and optimization

Hybrid storage

Retrieval capabilities