My thoughts on Block Limits. Feel free to add questions/comments
This document is currently public and tries to cover most of the space here, although things will undoubtedly be missing.
Note that some discussion is happening in https://discuss.ipfs.tech/t/supporting-large-ipld-blocks/15093 around how to support large blocks when needed.
An IPLD block is a sequence of bytes that can be referenced by a CID.
Current IPFS implementations recommend that users create blocks ≤ 1 MiB, and that implementations be able to accept and transfer blocks ≤ 2 MiB.
There is no hard block limit in IPLD, nor one specified across all IPFS implementations, although the above are generally good guidelines for the ecosystem.
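To make the two guideline numbers concrete, here is a minimal sketch of how a tool might classify a block by size. The constant names, the function name, and the three labels are made up for illustration and are not taken from any IPFS implementation.

```python
MIB = 1024 * 1024

# Guideline values from the text above; the names are hypothetical.
RECOMMENDED_MAX_BLOCK_SIZE = 1 * MIB  # producers are advised to stay at or under this
MAX_TRANSFER_BLOCK_SIZE = 2 * MIB     # implementations should still accept and transfer this


def classify_block_size(block: bytes) -> str:
    """Return a label describing where a block falls relative to the guidelines."""
    size = len(block)
    if size <= RECOMMENDED_MAX_BLOCK_SIZE:
        return "recommended"
    if size <= MAX_TRANSFER_BLOCK_SIZE:
        return "transferable"
    return "too-large"
```

The gap between the two numbers leaves headroom: a producer that slightly overshoots 1 MiB still creates blocks that other peers will move around.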
Why are there block limits anyway?
There are a number of things that become difficult without block limits, but one of the biggest reasons is that they reduce the risk of DoS attacks in peer-to-peer networks.
- In peer to peer networks trust is hard to come by. For the most part we lean on cryptographic guarantees for security. This includes the hashes (and CIDs) which let us ask a totally untrusted peer on the internet to give us some bytes and yet still have confidence that that data is exactly what we expected.
- In the general case, verifying that a block has a given hash requires downloading the entire block. What we are looking for is a way to download data that is incrementally verifiable.
- It’d be quite sad if you got tricked into downloading 100 GB of data and only once you finished getting all 100 GB could you determine that this peer tricked you and you have to throw out the data. Sure, you could try and ask someone else and never talk to that other peer again, but now your ISP has decided you’re out of bandwidth for the month, so that doesn’t really help you.
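The points above can be sketched as a receiver that caps how much it will read from an untrusted peer before it gives up. This is only an illustration: it compares a raw SHA-256 digest rather than parsing a real CID, and `fetch_block`, `BlockRejected`, and the 2 MiB default are hypothetical names and values, not the API of any IPFS implementation.

```python
import hashlib
from typing import Iterable

MAX_BLOCK_SIZE = 2 * 1024 * 1024  # assumed limit, matching the 2 MiB guideline


class BlockRejected(Exception):
    """Raised when a peer's response cannot be accepted."""


def fetch_block(chunks: Iterable[bytes], expected_sha256: bytes,
                max_size: int = MAX_BLOCK_SIZE) -> bytes:
    """Read a block from an untrusted stream of chunks and verify it.

    The hash can only be checked once every byte has arrived, so the
    size limit is what bounds the bandwidth a lying peer can waste.
    """
    hasher = hashlib.sha256()
    received = bytearray()
    for chunk in chunks:
        # Stop reading as soon as the peer goes over the limit.
        if len(received) + len(chunk) > max_size:
            raise BlockRejected("peer exceeded the block size limit")
        received.extend(chunk)
        hasher.update(chunk)
    # Only now, with the whole block in hand, can the hash be compared.
    if hasher.digest() != expected_sha256:
        raise BlockRejected("hash mismatch")
    return bytes(received)
```

With a limit in place, the worst case for a single bad response is `max_size` bytes of wasted download rather than 100 GB.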
What are some other reasons why having multiple smaller blocks is better than having a single large block?
- With smaller blocks it’s easier to parallelize downloads. Since we have to download the full block in order to verify it, downloading parts of a single block from different peers is problematic: if the block fails to validate, it’s unclear which peer lied about the data it sent you.
- There are more possibilities for deduplication with smaller blocks than with larger ones.
- It is possible to reference and download a subcomponent of some data if that’s all you need.
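A toy sketch of the deduplication and subcomponent points: data is split into fixed-size pieces, each piece is stored under its own SHA-256 digest, and identical pieces are stored once. Fixed-size splitting, the hex-digest keys, and the `chunk` / `add_to_store` helpers are assumptions made for the example, not how any particular IPFS chunker or blockstore works.

```python
import hashlib
from typing import Dict, List


def chunk(data: bytes, chunk_size: int) -> List[bytes]:
    """Split data into consecutive pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def add_to_store(store: Dict[str, bytes], data: bytes, chunk_size: int) -> List[str]:
    """Store each piece under its SHA-256 hex digest and return the ordered keys."""
    keys = []
    for piece in chunk(data, chunk_size):
        key = hashlib.sha256(piece).hexdigest()
        store.setdefault(key, piece)  # an identical piece is stored only once
        keys.append(key)
    return keys


store: Dict[str, bytes] = {}
file_a = b"AAAA" + b"BBBB" + b"CCCC"
file_b = b"AAAA" + b"BBBB" + b"DDDD"  # shares its first two pieces with file_a

keys_a = add_to_store(store, file_a, chunk_size=4)
keys_b = add_to_store(store, file_b, chunk_size=4)

# A subcomponent can be fetched on its own, without the rest of the file.
middle_of_a = store[keys_a[1]]
```

Here the two files produce six pieces in total but the store ends up holding only four, and `middle_of_a` is retrieved by key alone. Stored as two single large blocks, the files would share nothing.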
Why it’s sad that we have block limits 😢
- Backwards compatibility with existing hashes of files
- People have been using hashes as checksums for Git commits, ISO downloads, torrents, antivirus checks, package managers, etc. for a while now, and many of those hashes are of blocks of data larger than a couple of MiB. This means you can’t reasonably do ipfs://SomeSHA256OfAnUbuntuISOFromTheWebsite and have it just work.
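A small sketch of why the published checksum doesn't line up with chunked data: once a file is split into blocks, neither the block hashes nor a root hash built from them equals the SHA-256 of the whole file. The root construction here (hashing the concatenated child digests) is a deliberately simplified stand-in and is not the UnixFS or any other real IPLD layout.

```python
import hashlib
from typing import List


def whole_file_digest(data: bytes) -> bytes:
    """The checksum a download page would typically publish."""
    return hashlib.sha256(data).digest()


def child_digests(data: bytes, chunk_size: int) -> List[bytes]:
    """SHA-256 digest of each fixed-size piece of the file."""
    return [hashlib.sha256(data[i:i + chunk_size]).digest()
            for i in range(0, len(data), chunk_size)]


def chunked_root_digest(data: bytes, chunk_size: int) -> bytes:
    """Hypothetical root: the hash of the concatenated child digests."""
    return hashlib.sha256(b"".join(child_digests(data, chunk_size))).digest()


iso = bytes(range(256)) * 64  # stand-in for a large file
published = whole_file_digest(iso)
root = chunked_root_digest(iso, chunk_size=1024)
```

`published` matches neither `root` nor any entry of `child_digests(iso, 1024)`, so a peer holding only the chunked form has nothing indexed under the checksum from the website.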