Overview

Data providers, such as miners, will have local indices of their content. They want to advertise the availability of this content so that any consumers can easily find it. Data providers will also want to revoke advertisements of content that they no longer provide.

Data providers may advertise different kinds of indices from an enumerated set, e.g. semantic or full indices.

Indexer nodes want to discover data providers and to track updates provided by the data providers. They also want to handle incoming content requests and resolve them to providers who have that content. Indexers may reroute client requests to other indexers if they do not handle that content.

Indexer clients want to issue 'findProvider' style requests for content to an indexer node and receive a set of providers that have that content. The response should include any other information that helps the client choose between providers, and possibly information that the provider has asked clients to present to it to authenticate the request (e.g. deal ID).
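The 'findProvider' flow can be sketched as a lookup from a multihash to a set of provider records. This is a minimal in-memory sketch, not a proposed API: the names ProviderInfo, Indexer, advertise, revoke and find_providers, and the shape of the metadata field, are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderInfo:
    """One provider returned for a multihash (field names are hypothetical)."""
    provider_id: str
    # Extra data that may help a client choose between providers or
    # authenticate its request, e.g. a deal ID. Contents are an assumption.
    metadata: dict = field(default_factory=dict)


class Indexer:
    """Toy in-memory indexer that maps a multihash to its providers."""

    def __init__(self):
        # multihash -> {provider_id: ProviderInfo}
        self._index = {}

    def advertise(self, multihash, info):
        """Record that a provider has the content identified by multihash."""
        self._index.setdefault(multihash, {})[info.provider_id] = info

    def revoke(self, multihash, provider_id):
        """Remove a provider's advertisement for a multihash, if present."""
        providers = self._index.get(multihash, {})
        providers.pop(provider_id, None)
        if not providers:
            self._index.pop(multihash, None)

    def find_providers(self, multihash):
        """Return all known providers for multihash (empty list if none)."""
        return list(self._index.get(multihash, {}).values())
```

A real indexer would also need to handle the case where it does not cover the requested content and reroute the request to another indexer; that is left out of this sketch.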

Indexer nodes discover new data providers and learn of content updates by receiving notifications published on a known gossip pub-sub topic. Indexers can also discover data providers by looking at on-chain miner activity. The notifications let indexers know that new data is available and that index entries can be fetched from an existing or a new provider. The indexer decides if and when to pull the actual index data, and which kind of index, from a provider, based on its own policies pertaining to that data provider.
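The split between being notified and deciding to pull can be sketched as follows. This is only an illustration under stated assumptions: the Notification fields, the Policy class and its allow-list rule, and the handle_notifications function are hypothetical, and the pub-sub transport itself is replaced by a plain list of messages.

```python
from dataclasses import dataclass


@dataclass
class Notification:
    """A hypothetical pub-sub message announcing new index data."""
    provider_id: str
    index_id: str
    index_kind: str  # e.g. "full" or "semantic"


@dataclass
class Policy:
    """A hypothetical per-indexer policy: allow-list plus preferred kinds."""
    allowed_providers: set
    preferred_kinds: tuple = ("full",)

    def should_fetch(self, note):
        return (note.provider_id in self.allowed_providers
                and note.index_kind in self.preferred_kinds)


def handle_notifications(notes, policy):
    """Return (providers discovered, list of (provider_id, index_id) to pull).

    Every notification counts as discovery of its provider, but index data
    is only scheduled for fetching when the policy allows it.
    """
    known_providers = set()
    to_fetch = []
    for note in notes:
        known_providers.add(note.provider_id)
        if policy.should_fetch(note):
            to_fetch.append((note.provider_id, note.index_id))
    return known_providers, to_fetch
```

Keeping the policy check separate from message handling means an indexer can learn about a provider from a notification without being obliged to pull its data.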

To handle any amount of index data from any number of providers, the indexer nodes are capable of scaling horizontally to distribute workload over a pool of indexers. The indexer pool size can change while the pool remains online.
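This document does not specify how work is divided across the pool. As one possible approach, offered purely as an assumption, rendezvous (highest-random-weight) hashing assigns each multihash to a pool member so that adding a member only moves the multihashes that the new member takes over. The function names and node labels below are hypothetical.

```python
import hashlib


def _score(node, multihash):
    """Deterministic pseudo-random weight for a (node, multihash) pair."""
    digest = hashlib.sha256(f"{node}:{multihash}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(nodes, multihash):
    """Pick the pool member responsible for a multihash.

    Rendezvous hashing: every node gets a score for the multihash and the
    highest score wins. No node's score depends on which other nodes are
    in the pool, so membership changes only affect the keys won or lost
    by the node that joined or left.
    """
    if not nodes:
        raise ValueError("indexer pool is empty")
    return max(nodes, key=lambda node: _score(node, multihash))
```

With this scheme, growing the pool from three to four indexers leaves every multihash either with its previous owner or with the new indexer, which fits the requirement that the pool size can change while the pool remains online.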

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/d0a55510-fa1d-4850-a024-be2f95f3d5a5/indexer_ecosys.png

Key Design Decisions

Data Provider Interface

Miners and other data providers will have local indices of their content. Abstractly, we can think of the resources exposed by a data provider as a tree of the following:

Catalog (List of Indexes)
Catalog/<id> (Individual Index)
Catalog/<id>/multihashes (List of multihashes in the Index)
Catalog/<id>/TBD (semantic for selection)
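The resource tree above can be sketched as a small path resolver over nested dictionaries. This is a minimal sketch under assumptions: the resolve function and the dictionary layout are hypothetical, and the still-undefined TBD branch is deliberately not modelled.

```python
def resolve(catalog, path):
    """Resolve a resource path against a provider's catalog.

    catalog is assumed to be a dict of the form
        {index_id: {"multihashes": [...]}}
    and path is one of:
        "Catalog"                   -> sorted list of index IDs
        "Catalog/<id>"              -> the individual index
        "Catalog/<id>/multihashes"  -> the multihashes in that index
    Any other path raises KeyError.
    """
    parts = path.strip("/").split("/")
    if parts[0] != "Catalog":
        raise KeyError(path)
    if len(parts) == 1:
        return sorted(catalog)
    index = catalog[parts[1]]  # KeyError if the index ID is unknown
    if len(parts) == 2:
        return index
    if len(parts) == 3 and parts[2] == "multihashes":
        return index["multihashes"]
    raise KeyError(path)
```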

A provider has a Catalog of one or more indices, where the current index is the set of all multihashes known to the provider, and the other indices are past versions of it, each representing the multihashes at some previous time. Semantically, changes made by the provider to the catalog of indices are seen as happening in a globally ordered log of additions and revocations, each referencing the previous action. Indexers are able to track these changes to keep their view of a provider's index current.
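The log-of-changes view can be sketched as a backward-linked chain of entries that an indexer replays to rebuild a provider's index. This is a sketch under assumptions: it supposes that each entry adds or revokes a batch of multihashes, and the LogEntry fields and the replay function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    """One action in a provider's log (field names are hypothetical)."""
    entry_id: str
    prev_id: Optional[str]   # reference to the previous action, None at the start
    action: str              # "add" or "revoke"
    multihashes: tuple


def replay(entries, head_id):
    """Rebuild the set of multihashes as of the entry head_id.

    entries maps entry_id -> LogEntry. The chain is walked backwards from
    head_id through prev_id links, then applied oldest-first.
    """
    chain = []
    cursor = head_id
    while cursor is not None:
        entry = entries[cursor]
        chain.append(entry)
        cursor = entry.prev_id

    current = set()
    for entry in reversed(chain):
        if entry.action == "add":
            current.update(entry.multihashes)
        elif entry.action == "revoke":
            current.difference_update(entry.multihashes)
        else:
            raise ValueError(f"unknown action: {entry.action}")
    return current
```

Replaying up to the latest entry yields the current index, while replaying up to an earlier entry yields one of the past versions described above.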