Multi-copy Synapse Upload

Goals

A simple upload() API that accepts either a byte array (Uint8Array) or a stream (ReadableStream or similar) of up to 1 GiB
Default: one copy with a trusted/endorsed storage provider (SP), and one copy with any other approved SP
Best-effort reliability, achieved either through a network reliable enough to need little SDK-side mitigation, or through SDK functionality that mitigates downstream unreliability
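As a minimal sketch of the input side of these goals, the 1 GiB limit can be enforced without buffering by counting bytes as they flow. UploadInput, MAX_UPLOAD_SIZE and enforceSizeLimit() are hypothetical names, and the "or similar" stream case is modelled here as AsyncIterable<Uint8Array>, which a ReadableStream satisfies where it is async-iterable (as in Node.js).

```typescript
// Hypothetical input type: a byte array, or any async-iterable byte source.
type UploadInput = Uint8Array | AsyncIterable<Uint8Array>;

// 1 GiB = 2^30 bytes, the upper bound stated in the goals.
const MAX_UPLOAD_SIZE = 1024 * 1024 * 1024;

function isByteArray(input: UploadInput): input is Uint8Array {
  return input instanceof Uint8Array;
}

// Yields the input's chunks unchanged and throws as soon as the running total
// exceeds the limit. A stream cannot report its size up front, so the check
// happens while data flows; no chunk is retained after it has been yielded.
async function* enforceSizeLimit(
  input: UploadInput,
  limit: number = MAX_UPLOAD_SIZE,
): AsyncGenerator<Uint8Array> {
  if (isByteArray(input)) {
    if (input.byteLength > limit) {
      throw new RangeError(`upload of ${input.byteLength} bytes exceeds ${limit}`);
    }
    yield input;
    return;
  }
  let total = 0;
  for await (const chunk of input) {
    total += chunk.byteLength;
    if (total > limit) {
      throw new RangeError(`stream exceeded ${limit} bytes`);
    }
    yield chunk;
  }
}
```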

Challenges

The API must be intuitive and must not place an excessive burden on the developer to use it correctly, i.e. it must strike the right balance between the SDK’s responsibilities and the SDK user’s.
Reliability presents usability challenges when error conditions are not rare, i.e. the SDK must either help you deal with error conditions or take on the burden of dealing with them itself.
Streaming APIs exist to minimise the memory burden in constrained environments (primarily browsers). Retries handled inside the SDK become difficult if we can’t retain (buffer) data to re-send, yet buffering nullifies the benefits of providing a streaming API.
- Some callers can only provide a one-off stream. For example, filecoin-pin can efficiently produce a non-buffering stream that combines filesystem reads, UnixFS packing and CAR encoding, but restarting that stream is probably not a good idea: if the underlying filesystem changes between attempts, the result could have a different CommP.
Our current stack is not very stable, and reliability metrics for our current pool of SPs are not high.
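The retry problem described under streaming can be illustrated with a small sketch. uploadWithRetry(), Source and the Attempt callback are hypothetical, not Synapse APIs: a Uint8Array can be replayed for every attempt, but a one-shot stream can only be tried once unless it is buffered first.

```typescript
// Hypothetical types. An Attempt consumes the chunks it is given and resolves
// when one upload attempt has succeeded.
type Source = Uint8Array | AsyncIterable<Uint8Array>;
type Attempt = (chunks: AsyncIterable<Uint8Array>) => Promise<void>;

// Wraps a byte array as a fresh single-chunk iterable; safe to call repeatedly.
async function* replay(bytes: Uint8Array): AsyncGenerator<Uint8Array> {
  yield bytes;
}

// Retries only when the source can be replayed. A Uint8Array is already in
// memory, so each attempt gets a fresh iterable over it. A one-shot stream is
// consumed by the first attempt, so retrying it would require buffering the
// whole stream up front, which defeats the purpose of streaming.
async function uploadWithRetry(
  source: Source,
  attempt: Attempt,
  maxAttempts: number,
): Promise<number> {
  const attemptsAllowed = source instanceof Uint8Array ? maxAttempts : 1;
  let lastError: unknown = new Error("no attempts made");
  for (let i = 1; i <= attemptsAllowed; i++) {
    try {
      await attempt(source instanceof Uint8Array ? replay(source) : source);
      return i; // number of attempts it took
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```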

Current Approach

Synapse’s StorageContext object represents a client’s relationship with a provider and a data set. Previously, a context was either created automatically and internally by synapse.storage.upload(), or explicitly with synapse.storage.createContext().
synapse.storage.createContexts() was introduced to create multiple contexts at once, including selecting an endorsed SP plus an additional SP for them.
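A sketch of the default selection that createContexts() performs, under stated assumptions: the Provider record and selectProviders() are hypothetical, and the only rule taken from this document is "one endorsed SP plus one other approved SP". The random source is injected so the choice is testable.

```typescript
// Hypothetical provider record; the SDK's real provider type may differ.
interface Provider {
  id: number;
  endorsed: boolean;
  approved: boolean;
}

// Picks one endorsed provider, then one different approved provider.
// This sketch picks uniformly at random within each group.
function selectProviders(
  providers: readonly Provider[],
  random: () => number = Math.random,
): [Provider, Provider] {
  const endorsed = providers.filter((p) => p.endorsed);
  if (endorsed.length === 0) throw new Error("no endorsed provider available");
  const primary = endorsed[Math.floor(random() * endorsed.length)];

  const others = providers.filter((p) => p.approved && p.id !== primary.id);
  if (others.length === 0) throw new Error("no second approved provider available");
  const secondary = others[Math.floor(random() * others.length)];

  return [primary, secondary];
}
```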
synapse.storage.upload() either takes a list of contexts you’ve previously created or, if you don’t supply any, creates them for you; it then uploads to each of them. The uploads are performed asynchronously, in parallel.
CommP is calculated once and shared amongst the parallel uploads.
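The fan-out in upload() can be sketched as follows, assuming hypothetical Context, UploadResult and computeCommP shapes (the real StorageContext upload signature may differ). CommP is computed once, every upload is started before any is awaited so they proceed in parallel, and the results are gathered with Promise.allSettled() as in the original API.

```typescript
interface UploadResult {
  providerId: number;
  commp: string;
}

// Hypothetical stand-in for a StorageContext bound to one provider/data set.
interface Context {
  providerId: number;
  upload(data: Uint8Array, commp: string): Promise<UploadResult>;
}

async function uploadToAll(
  data: Uint8Array,
  contexts: readonly Context[],
  computeCommP: (data: Uint8Array) => Promise<string>,
): Promise<PromiseSettledResult<UploadResult>[]> {
  // The piece commitment is computed once, not once per provider.
  const commp = await computeCommP(data);
  // map() starts every upload immediately; allSettled waits for all of them.
  return Promise.allSettled(contexts.map((ctx) => ctx.upload(data, commp)));
}
```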
Parallel streaming is not yet implemented (this work was put on hold due to the uncertainty around this API), so multi-context uploads only accept Uint8Array.
The original upload() API used Promise.allSettled() internally so that its behaviour wasn’t all-or-nothing (Promise.all() rejects if even one of its promises rejects, making it all-or-nothing). The return type of the function was therefore Promise<PromiseSettledResult<UploadResult>[]>, where each PromiseSettledResult (see type) records whether the individual upload resolved or rejected.
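The difference between the two aggregation strategies is easy to demonstrate with two stand-in uploads, one that resolves and one that rejects (ok(), fail() and the UploadResult shape are hypothetical):

```typescript
interface UploadResult {
  providerId: number;
}

const ok = (): Promise<UploadResult> => Promise.resolve({ providerId: 1 });
const fail = (): Promise<UploadResult> =>
  Promise.reject(new Error("provider 2 unreachable"));

// Promise.all: all-or-nothing. The single rejection discards the success.
async function allOrNothing(): Promise<UploadResult[]> {
  return Promise.all([ok(), fail()]);
}

// Promise.allSettled: never rejects; each entry reports its own outcome.
async function settled(): Promise<PromiseSettledResult<UploadResult>[]> {
  return Promise.allSettled([ok(), fail()]);
}
```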
The caller of upload() was left to handle the array of PromiseSettledResults: decode the values and decide what action to take for each permutation of successes and failures.
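A minimal sketch of the decoding work this pushes onto the caller, assuming a hypothetical classify() helper and the two-copy target from the goals:

```typescript
interface UploadResult {
  providerId: number;
}

type Outcome =
  | { kind: "complete"; results: UploadResult[] }
  | { kind: "partial"; results: UploadResult[]; errors: unknown[] }
  | { kind: "failed"; errors: unknown[] };

// Splits settled results into successes and failures, then classifies them
// against the required number of copies.
function classify(
  settled: PromiseSettledResult<UploadResult>[],
  requiredCopies = 2,
): Outcome {
  const results: UploadResult[] = [];
  const errors: unknown[] = [];
  for (const s of settled) {
    if (s.status === "fulfilled") results.push(s.value);
    else errors.push(s.reason);
  }
  if (results.length === 0) return { kind: "failed", errors };
  if (results.length >= requiredCopies) return { kind: "complete", results };
  // Some data is stored (and being paid for) but the copy target is not met.
  return { kind: "partial", results, errors };
}
```

A "rejected" entry only means the SDK did not report success; it does not prove that the deal failed to land on chain.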
Failure handling was entirely the user’s concern; the SDK offered no affordances to mitigate error cases. One success and one failure likely means that the user is paying for one of the deals but not the other: the data is stored successfully, but the two-copy minimum is not met. A reported failure may not indicate an actual failure to land on chain; it may instead be caused by network issues (as experienced at FDS-7 on bad wifi), where deals concluded but this was not successfully communicated back through the SDK.
A temporary workaround was released during the November soft launch to mitigate API complexity, a mismatch with the documentation, and unreliability caused primarily by unreliable SPs (either Curio-induced or caused by hosting conditions):
- Reduce the default number of contexts created by synapse.storage.upload() to 1
- Use Promise.all() internally so that the API returns a single Promise<UploadResult> rather than an array of settled results.
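The workaround amounts to something like the following sketch; the Context shape and uploadSingle() are hypothetical, not the SDK’s code. With exactly one context, Promise.all() either resolves with a one-element array or rejects, so the result can be unwrapped into a single UploadResult and any error propagates straight to the caller.

```typescript
interface UploadResult {
  providerId: number;
}

// Hypothetical stand-in for a StorageContext.
interface Context {
  upload(data: Uint8Array): Promise<UploadResult>;
}

async function uploadSingle(
  data: Uint8Array,
  contexts: readonly Context[],
): Promise<UploadResult> {
  if (contexts.length !== 1) {
    throw new Error("this sketch assumes exactly one context");
  }
  // Rejects if the upload fails; otherwise unwraps the one-element array.
  const [result] = await Promise.all(contexts.map((ctx) => ctx.upload(data)));
  return result;
}
```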

Proposals