https://www.youtube.com/watch?v=bXaL64fp55c

Technical Design

Motivation

Large content providers don’t publish Provider Records to the DHT because the process is too resource-intensive at large scale. As a result, their content isn’t discoverable through the DHT at all and must be found through Bitswap broadcast, which is undesirable for many reasons. Enabling large Content Providers to publish their content to the DHT is a prerequisite for turning off Bitswap broadcast; otherwise some content could no longer be found. Turning off Bitswap broadcast is a major milestone toward making IPFS more resource efficient. It will help all IPFS peers, and especially large content providers, reduce their bandwidth bill, as they will no longer be spammed by Bitswap. Hence, the whole IPFS ecosystem would benefit from large content providers publishing to the DHT.

One easy improvement to make the DHT Provide operation MUCH cheaper is to add a ProvideMultiple RPC.

Current (Re)Provide

go-ipfs-provider encapsulates some of the logic. kubo then periodically calls the Provide method exposed by go-libp2p-routing-helpers.

Proposed change for reprovide

Move the reprovide logic for the DHT to go-libp2p-kad-dht. kad-dht should expose an interface that lets kubo add keys to reprovide and remove keys that no longer need to be reprovided. kubo doesn’t need to manage republishing itself; it can pass parameters describing the reprovide strategy to kad-dht.

IPNI doesn’t need reprovide by design. Hence the reprovide strategy should be specific to each Content Router, and managed by that Content Router.

go-libp2p-routing-helpers must expose a Reprovide API (e.g. adding keys to and removing keys from a reprovide tracker). The Content Routers should manage reprovides themselves, and possibly accept a reprovide strategy passed down from kubo.

What kubo should do

Create the DHT as a content router.

When a new CID is added to kubo and should be provided, call contentrouter.StartProviding(CID).

The DHT manages all the rest.

When kubo wants to stop providing some content, it calls contentrouter.StopProviding(CID).
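
The division of responsibilities above can be sketched as a small Go interface. This is only an illustration under stated assumptions: StartProviding and StopProviding are the calls named in this document, while the ReprovideTracker and memTracker names, the Tracked helper, and the use of plain strings in place of real CID types are hypothetical and are not existing kubo or go-libp2p-kad-dht APIs.

```go
package main

import "sync"

// ReprovideTracker is a hypothetical name for what a content router could
// expose to kubo. Only the two method names come from this document.
type ReprovideTracker interface {
	StartProviding(cid string) // begin providing and periodically reproviding cid
	StopProviding(cid string)  // stop reproviding cid
}

// memTracker is an illustrative in-memory implementation. A real content
// router would also schedule the actual reprovide operations.
type memTracker struct {
	mu   sync.Mutex
	keys map[string]struct{}
}

func newMemTracker() *memTracker {
	return &memTracker{keys: make(map[string]struct{})}
}

// StartProviding records cid; adding the same cid twice has no extra effect.
func (t *memTracker) StartProviding(cid string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.keys[cid] = struct{}{}
}

// StopProviding forgets cid, so it is no longer reprovided.
func (t *memTracker) StopProviding(cid string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.keys, cid)
}

// Tracked returns the number of CIDs currently scheduled for reprovide.
func (t *memTracker) Tracked() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return len(t.keys)
}
```

With an interface of this shape, kubo only signals intent (start or stop), and the reprovide schedule stays entirely inside the content router.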

DHT ReprovideSweep Design

All keys located in the same keyspace region are reprovided at once. Since some large Content Providers publish more CIDs than there are DHT Servers, by the pigeonhole principle some DHT Servers must be allocated more than one Provider Record from the same Content Provider. The core idea is to (Re)Provide all Provider Records allocated to the same DHT Server at once.

Sending multiple Provider Records in a single message requires a new RPC, which is a breaking change, so it isn’t trivial to send all Provider Records exactly at once. However, the most expensive parts of a (Re)Provide operation are the DHT walk to discover the right DHT Servers to store the Provider Records on, and opening new connections to these peers. Once these peers are known and a connection is open, the Content Provider can simply reuse that connection to send multiple individual Provide requests.
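
The cost argument above can be sketched in Go: group keys by keyspace region, perform one lookup per region, then send one individual Provide request per key to each peer found. This is a simplified sketch, not the go-libp2p-kad-dht implementation. It assumes the Kademlia identifier is the SHA-256 of the key, uses strings in place of CIDs and peer IDs, and treats every key in a region as sharing the same set of closest peers. The lookup and send callbacks and all function names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
)

// kadID maps a key to a 256-bit Kademlia identifier.
// Assumption: SHA-256 of the raw key bytes; a string stands in for a CID.
func kadID(key string) [32]byte {
	return sha256.Sum256([]byte(key))
}

// regionPrefix returns the first `bits` bits of id as an integer.
// bits must be between 1 and 32.
func regionPrefix(id [32]byte, bits uint) uint32 {
	return binary.BigEndian.Uint32(id[:4]) >> (32 - bits)
}

// groupByRegion buckets keys by the leading bits of their Kademlia identifier.
func groupByRegion(keys []string, bits uint) map[uint32][]string {
	groups := make(map[uint32][]string)
	for _, k := range keys {
		p := regionPrefix(kadID(k), bits)
		groups[p] = append(groups[p], k)
	}
	return groups
}

// reprovideAll runs one lookup per region, then one send per (peer, key) pair.
// It returns the number of lookups performed. lookup stands in for the DHT
// walk and send for an individual Provide request on an open connection.
func reprovideAll(
	keys []string,
	bits uint,
	lookup func(prefix uint32) []string,
	send func(peer, key string),
) int {
	lookups := 0
	for prefix, regionKeys := range groupByRegion(keys, bits) {
		peers := lookup(prefix) // expensive: performed once per region
		lookups++
		for _, peer := range peers {
			for _, k := range regionKeys {
				send(peer, k) // cheap: peer is known and the connection is reused
			}
		}
	}
	return lookups
}
```

The number of lookups is bounded by the number of non-empty regions rather than by the number of keys, which is where the savings for large Content Providers would come from.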

The DHT implementation go-libp2p-kad-dht must keep track of the CIDs that must be republished every Interval (let’s assume that all Provider Records are republished at the same frequency). The Kademlia identifiers of the CIDs to republish must be arranged in a binary trie to allow faster access. As each Provider Record is replicated on 20 different DHT Servers, 20 DHT Servers in close proximity are expected to store the same Provider Records (in reality not exactly, see Advanced Design for a precise explanation). The Content Provider will continuously look up keys in the keyspace, from left to right, hence sweeping the keyspace. For each requested key, it will find the 20 closest peers, and look up in its CIDs Republish Binary Trie all Provider Records that would belong