Meta

Status

2024-11-05 - This was moved to GitHub and will no longer be authored here. See https://github.com/filecoin-project/service-classes/blob/main/service-level-indicators/spark-retrieval-success-rate.md instead. As a result, this page has been locked 🔒.
2024-11-01 - Another full round of review of edits by Patrick, Steve, and others.
2024-10-22 - Second draft from Patrick Woodhead
2024-10-16 - In progress: receiving and incorporating feedback
2024-10-14 - First draft from Patrick Woodhead

Document Purpose

This document is intended to become the canonical resource referenced in the Storage Providers Market Dashboard wherever the “Spark Retrievability” graphs are shown. A reader of those graphs should be able to use this document to understand the “Spark Retrievability SLO”. The goal of this document is to explain “the rules of the game” fully and clearly. By making these rules explicit, we seek to empower market participants - onramps, aggregators and Storage Providers (SPs) - to “decide how they want to play the game”.

TL;DR

FIL Spark is a proof-of-retrievability protocol for verifying that unsealed data stored with Filecoin Storage Providers can be retrieved, for the case where that data is intended to be publicly and globally accessible. Put simply, Spark randomly samples CIDs stored on Filecoin and then attempts to retrieve them. The outcome of each retrieval attempt is recorded, and these outcomes are aggregated to calculate Spark retrieval success rate (RSR) scores.
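The sample-retrieve-record loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Spark's implementation: the `EligibleDeal` and `Measurement` types, the `sample_and_retrieve` function and the stubbed `try_retrieve` callback are all hypothetical, the CID strings are placeholders rather than real CIDs, and real Spark checkers perform actual network retrievals instead of calling a local function.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EligibleDeal:
    # Hypothetical record: a payload CID paired with the Storage Provider expected to serve it.
    payload_cid: str
    provider_id: str


@dataclass(frozen=True)
class Measurement:
    # Hypothetical record of one retrieval attempt and whether it succeeded.
    deal: EligibleDeal
    retrieved: bool


def sample_and_retrieve(
    deals: List[EligibleDeal],
    sample_size: int,
    try_retrieve: Callable[[EligibleDeal], bool],
    rng: random.Random,
) -> List[Measurement]:
    """Randomly sample deals (without replacement) and record each retrieval outcome."""
    sample = rng.sample(deals, k=min(sample_size, len(deals)))
    return [Measurement(deal=d, retrieved=try_retrieve(d)) for d in sample]


if __name__ == "__main__":
    # Placeholder CIDs spread across three hypothetical providers.
    deals = [EligibleDeal(f"bafy-example-{i}", f"f0{1000 + i % 3}") for i in range(10)]
    # Stub retrieval: pretend only provider f01000 serves its data.
    results = sample_and_retrieve(deals, 4, lambda d: d.provider_id == "f01000", random.Random(42))
    for m in results:
        print(m.deal.payload_cid, m.deal.provider_id, m.retrieved)
```

Passing an explicit `random.Random` instance keeps the sketch reproducible in tests; it says nothing about how Spark itself seeds or schedules its sampling.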

At the network-wide level, the Spark RSR score is simply the percentage of valid retrieval attempts that succeeded. The same data can also be aggregated by Storage Provider, by Allocator or by Client to show the Spark RSR score over the files linked to each of these entities.
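As arithmetic, the RSR is successful valid attempts divided by all valid attempts, optionally grouped by entity. The Python sketch below assumes each measurement is a dict already tagged with hypothetical `valid` and `retrieved` flags plus an entity field such as `provider`; the field names and functions are illustrative only, and how Spark decides whether an attempt is valid is outside the scope of this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Optional


def rsr(measurements: Iterable[Mapping]) -> Optional[float]:
    """Network-wide RSR: percentage of valid attempts that succeeded (None if there are none)."""
    valid = [m for m in measurements if m["valid"]]
    if not valid:
        return None
    return 100.0 * sum(1 for m in valid if m["retrieved"]) / len(valid)


def rsr_by(measurements: Iterable[Mapping], key: str) -> Dict[str, float]:
    """RSR per entity, e.g. key="provider", "allocator" or "client" (hypothetical field names)."""
    totals: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for m in measurements:
        if not m["valid"]:
            continue  # invalid attempts count toward neither the numerator nor the denominator
        totals[m[key]] += 1
        if m["retrieved"]:
            successes[m[key]] += 1
    return {entity: 100.0 * successes[entity] / n for entity, n in totals.items()}


if __name__ == "__main__":
    measurements = [
        {"valid": True, "retrieved": True, "provider": "f01000"},
        {"valid": True, "retrieved": False, "provider": "f01000"},
        {"valid": True, "retrieved": True, "provider": "f01001"},
        {"valid": False, "retrieved": False, "provider": "f01001"},
    ]
    print(round(rsr(measurements), 1))  # 66.7
    print(rsr_by(measurements, "provider"))  # {'f01000': 50.0, 'f01001': 100.0}
```

Grouping by Allocator or Client works the same way, provided each measurement carries the corresponding field.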

We will now go through the Spark protocol in more depth to show exactly how the Spark RSR scores are created. Where needed, we will provide links to more in-depth descriptions, issues and discussions.

Spark Protocol

Deal Ingestion

The first step in the Spark protocol is to build a list of all files that should be available for “fast” retrieval. By “fast”, we mean that the file is stored unsealed, so that it can be retrieved without first unsealing the data.

As of October 2024, the Spark team at Space Meridian, the independent team that is building Spark, runs a weekly manual deal ingestion process (GitHub). This process scans all recently made storage deals in the f05 storage market actor and stores them as Eligible Deals in an off-chain Spark database hosted by Space Meridian. An Eligible Deal is the tuple (CID, Storage Provider), where the CID is a payload CID, as opposed to a piece CID or a deal CID. A payload CID is the root CID of some data, such as a file. An Eligible Deal indicates that the Storage Provider should be able to serve a fast retrieval for that payload CID.
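Because an Eligible Deal is just a (payload CID, Storage Provider) pair, turning a batch of scanned deals into Eligible Deals amounts to projecting and deduplicating. The Python sketch below illustrates only that step and is not the Space Meridian ingestion code: the `MarketDeal` type, its fields and the `to_eligible_deals` helper are hypothetical, the sketch does not read from the f05 storage market actor, and it assumes payload CIDs have already been resolved (deals without a known payload CID are simply skipped here).

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple

# An Eligible Deal as described above: (payload CID, Storage Provider).
EligibleDeal = Tuple[str, str]


@dataclass(frozen=True)
class MarketDeal:
    # Hypothetical, simplified view of a storage deal; real f05 deals carry many more fields.
    deal_id: int
    provider_id: str
    piece_cid: str
    payload_cid: Optional[str]  # assumed to be resolved beforehand; None if unknown


def to_eligible_deals(deals: Iterable[MarketDeal]) -> Set[EligibleDeal]:
    """Project each deal to (payload CID, provider), dropping duplicates."""
    eligible: Set[EligibleDeal] = set()
    for deal in deals:
        if deal.payload_cid is None:
            continue  # this sketch cannot sample a deal without a known payload CID
        # The piece CID and deal ID are deliberately not part of the Eligible Deal.
        eligible.add((deal.payload_cid, deal.provider_id))
    return eligible


if __name__ == "__main__":
    deals = [
        MarketDeal(1, "f01000", "piece-a", "payload-x"),
        MarketDeal(2, "f01000", "piece-b", "payload-x"),  # same payload, same SP: deduplicated
        MarketDeal(3, "f01001", "piece-c", "payload-x"),  # same payload, different SP: kept
        MarketDeal(4, "f01001", "piece-d", None),  # unknown payload CID: skipped
    ]
    print(sorted(to_eligible_deals(deals)))
    # [('payload-x', 'f01000'), ('payload-x', 'f01001')]
```

Keying on the Storage Provider as well as the payload CID means the same file stored with two providers yields two Eligible Deals, so each provider is checked independently.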