Explaining DAGs, Roots and UnixFS to newbies

Practical, show-me examples work best for many people, since we’re talking about very abstract concepts
Use existing, real-world data and tools to (a) make it real and (b) keep it quick and efficient, so you don’t have to build a corpus
Our auto-publishing websites already have a lot of what we need in place, particularly the ones that auto-publish pull requests to IPFS via Fleek, so most of the hassle of creating and publishing data is taken care of

ipld.io as an IPFS DAG

ipld.io is managed at https://github.com/ipld/ipld and is published to https://ipld.io by https://fleek.co/, an excellent IPFS-based web publishing service.

Every commit to the master branch results in a new version being published and pinned on IPFS, and associated with the ipld.io name using IPNS / DNSLink
Every pull request to the ipld/ipld repository is built and published by Fleek on a fleek.co subdomain with the root CID in the URL. Updating a pull request will publish a new version to a new subdomain.

Exploring the ipld.io website DAG

Find the current root CID of the site. This changes with every new commit to master, so it’s likely different to what’s listed below (but you could go back in time with this CID!). Use the ipfs command to resolve the name via IPNS, or look directly at the DNSLink record using dig. Either way, we’ll find a CID; in this case, Qmb2TK3N6M2SQj3JaLJhGWPcpmtyvuHhZdSMADMGrLnpnQ.

$ ipfs resolve /ipns/ipld.io
/ipfs/Qmb2TK3N6M2SQj3JaLJhGWPcpmtyvuHhZdSMADMGrLnpnQ

$ dig TXT _dnslink.ipld.io
...
_dnslink.ipld-website.on.fleek.co. 2007 IN TXT  "dnslink=/ipfs/Qmb2TK3N6M2SQj3JaLJhGWPcpmtyvuHhZdSMADMGrLnpnQ"
...
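As a rough illustration of what the DNSLink record carries, here is a minimal Python sketch that pulls the CID out of a TXT value like the one dig printed above. The function name cid_from_dnslink is made up for this example, and it assumes the record uses the dnslink=/ipfs/<cid> form shown in the output; it does not perform any DNS lookup itself.

```python
def cid_from_dnslink(txt_value):
    """Extract the CID from a DNSLink TXT value such as 'dnslink=/ipfs/<cid>'.

    Hypothetical helper for illustration; only the /ipfs/ form is handled.
    """
    # dig prints TXT values wrapped in double quotes, so strip those first.
    value = txt_value.strip().strip('"')
    prefix = "dnslink=/ipfs/"
    if not value.startswith(prefix):
        raise ValueError("not an /ipfs/ DNSLink record: %r" % txt_value)
    # Drop any sub-path that might follow the CID.
    return value[len(prefix):].split("/", 1)[0]


record = '"dnslink=/ipfs/Qmb2TK3N6M2SQj3JaLJhGWPcpmtyvuHhZdSMADMGrLnpnQ"'
print(cid_from_dnslink(record))
```

The printed CID should match the one returned by ipfs resolve /ipns/ipld.io, since both are reading the same DNSLink record.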

Inspect the root of the site. We’ll use the ipfs dag get command to fetch the block we want and have it printed in DAG-JSON format. Note that the original block is actually in DAG-PB format, which is used to construct UnixFS file data in IPFS. If we switch to the ipfs block get command and provide the CID then we’ll get the raw DAG-PB bytes, but they’re not as easy to read! Use the jq command to format the JSON document for readability. DAG-PB has two top-level properties: Data and Links. We’re interested in the DAG structure, so the Links array is the most useful. Each link in DAG-PB has a name (Name), a CID (Hash) and a size (Tsize). Note that in DAG-JSON output, CIDs look like { "/": "Qm...." }; don’t be distracted by the {"/":..} wrapping, it just indicates that this is a CID. Our root block has one named link for each top-level file and directory.

$ ipfs dag get Qmb2TK3N6M2SQj3JaLJhGWPcpmtyvuHhZdSMADMGrLnpnQ | jq
{
  "Data": {
    "/": {
      "bytes": "CAE"
    }
  },
  "Links": [
    {
      "Hash": {
        "/": "QmSkLT7wKos2sqEzb5rrRZYPihhfivuoGw9oqj9fsLYm6g"
      },
      "Name": "FAQ",
      "Tsize": 14503
    },
    {
      "Hash": {
        "/": "QmVWdcKiK2mWpDXQj9DtHsrBqzA4hSLxov5juvaS4wrCZD"
      },
      "Name": "css",
      "Tsize": 85864
    },
...
    {
      "Hash": {
        "/": "QmXbobtcHFExm3XF73rqBjhzrZ5USeu3rpHzm8DDoKL5cp"
      },
      "Name": "docs",
      "Tsize": 1152555
    },
...
    {
      "Hash": {
        "/": "Qme24pFfR4bhBUkZuUdAGjC1onUnZyZHdPGXu2bLDg7Uxe"
      },
      "Name": "index.html",
      "Tsize": 26464
    },
...
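To make the shape of that output concrete, the following Python sketch reads a DAG-JSON document and lists each link’s name, CID and size. It is only an illustration: the sample contains just the first two links shown above (the elided entries are left out), and list_links is a hypothetical helper name, not part of any IPFS tooling.

```python
import json

# A trimmed copy of the DAG-JSON printed by `ipfs dag get`; only the first two
# links from the output above are included.
root_json = """
{
  "Data": {"/": {"bytes": "CAE"}},
  "Links": [
    {"Hash": {"/": "QmSkLT7wKos2sqEzb5rrRZYPihhfivuoGw9oqj9fsLYm6g"},
     "Name": "FAQ", "Tsize": 14503},
    {"Hash": {"/": "QmVWdcKiK2mWpDXQj9DtHsrBqzA4hSLxov5juvaS4wrCZD"},
     "Name": "css", "Tsize": 85864}
  ]
}
"""


def list_links(node):
    """Return (name, cid, tsize) for every entry in a DAG-PB node's Links array."""
    # In DAG-JSON a CID is wrapped as {"/": "<cid string>"}, so unwrap it here.
    return [(link["Name"], link["Hash"]["/"], link["Tsize"]) for link in node["Links"]]


root = json.loads(root_json)
for name, cid, tsize in list_links(root):
    print(name, cid, tsize)
```

Piping the real ipfs dag get output into a script like this (instead of the embedded sample) would give a directory-style listing of the whole site root.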

Load a page. When a client (for a website, that’s a browser, via an IPFS gateway or an in-built IPFS client) loads a root like this without a path to a specific page, it will look for an index.html link, which ipld.io has. In the index.html block, the Links field is empty, but the Data field contains a lot of bytes. In DAG-JSON, a byte array is represented as { "/": { "bytes": "<base64 unpadded byte string>" } }, so the string shown is the unpadded Base64 encoding of the block’s Data bytes.

$ ipfs dag get Qmb2TK3N6M2SQj3JaLJhGWPcpmtyvuHhZdSMADMGrLnpnQ/index.html | jq
{
  "Data": {
    "/": {
      "bytes": "CAIS0s4BPCFET0NUWVBF...
...
    }
  },
  "Links": []
}
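The Data bytes are not the bare HTML: they are a UnixFS protobuf message wrapped around the file contents. The Python sketch below peeks at the first few bytes to show this, using only the visible prefix of the truncated string above plus the root block’s "CAE" value. It is a simplified reader, not a full protobuf parser: it assumes the UnixFS Type field comes first and the Data field second, and the helper names are made up for this example.

```python
import base64

# UnixFS DataType enum values.
UNIXFS_TYPES = {0: "Raw", 1: "Directory", 2: "File",
                3: "Metadata", 4: "Symlink", 5: "HAMTShard"}


def b64_unpadded(text):
    """Decode Base64 that has had its trailing '=' padding removed."""
    return base64.b64decode(text + "=" * (-len(text) % 4))


def read_varint(buf, pos):
    """Read a protobuf varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def peek_unixfs(data):
    """Return (type name, declared content length or None, content bytes seen so far)."""
    if data[0] != 0x08:  # tag for field 1 (Type), wire type varint
        raise ValueError("expected the UnixFS Type field first")
    type_id, pos = read_varint(data, 1)
    if pos >= len(data) or data[pos] != 0x12:  # tag for field 2 (Data), length-delimited
        return UNIXFS_TYPES[type_id], None, b""
    length, pos = read_varint(data, pos + 1)
    return UNIXFS_TYPES[type_id], length, data[pos:]


print(peek_unixfs(b64_unpadded("CAE")))                    # the root block's Data
print(peek_unixfs(b64_unpadded("CAIS0s4BPCFET0NUWVBF")))   # start of index.html's Data
```

The root’s Data decodes to a Directory marker with no content, while the index.html prefix decodes to a File whose content begins with <!DOCTYPE, which is the HTML we would expect.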

IPFS vs IPLD pathing. Looking back at the root block contents we printed when inspecting the root of the site, we can see that the index.html page has its own CID, Qme24pFfR4bhBUkZuUdAGjC1onUnZyZHdPGXu2bLDg7Uxe. So we are already navigating a basic DAG: Qmb2TK3N6M2SQj3JaLJhGWPcpmtyvuHhZdSMADMGrLnpnQ (ipld.io ROOT) → Qme24pFfR4bhBUkZuUdAGjC1onUnZyZHdPGXu2bLDg7Uxe (index.html). When we run ipfs dag get Qmb2TK3N6M2SQj3JaLJhGWPcpmtyvuHhZdSMADMGrLnpnQ/index.html, we are asking ipfs to navigate to a named link, index.html, for us. Internally it reads the DAG-PB block, looks for a Links array element with a Name field of index.html, and loads the block with the CID in the Hash field of that link. We can also fetch the block directly by using the CID of the index.html block. Note that the result is exactly the same as pathing via the root CID with an /index.html path attached to it.

$ ipfs dag get Qmb2TK3N6M2SQj3JaLJhGWPcpmtyvuHhZdSMADMGrLnpnQ | jq
...
    {
      "Hash": {
        "/": "Qme24pFfR4bhBUkZuUdAGjC1onUnZyZHdPGXu2bLDg7Uxe"
      },
      "Name": "index.html",
      "Tsize": 26464
    },
...
$ ipfs dag get Qme24pFfR4bhBUkZuUdAGjC1onUnZyZHdPGXu2bLDg7Uxe | jq
{
  "Data": {
    "/": {
      "bytes": "CAIS0s4BPCFET0NUWVBF...
...
    }
  },
  "Links": []
}
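The named-link lookup described above can be sketched in a few lines of Python. This is an assumption-laden illustration of the idea (scan Links for a matching Name, return the CID in Hash), not how ipfs is actually implemented; resolve_name is a hypothetical helper and the sample root holds only two of the links shown earlier.

```python
# A trimmed DAG-JSON view of the root block; the other links are left out.
root = {
    "Data": {"/": {"bytes": "CAE"}},
    "Links": [
        {"Hash": {"/": "QmXbobtcHFExm3XF73rqBjhzrZ5USeu3rpHzm8DDoKL5cp"},
         "Name": "docs", "Tsize": 1152555},
        {"Hash": {"/": "Qme24pFfR4bhBUkZuUdAGjC1onUnZyZHdPGXu2bLDg7Uxe"},
         "Name": "index.html", "Tsize": 26464},
    ],
}


def resolve_name(node, name):
    """Return the CID of the link called `name` in a DAG-PB node."""
    for link in node["Links"]:
        if link["Name"] == name:
            # The CID sits inside the DAG-JSON {"/": ...} wrapper.
            return link["Hash"]["/"]
    raise KeyError("no link named %r" % name)


# The CID found here is the one we could pass straight to `ipfs dag get`.
print(resolve_name(root, "index.html"))
```

This is why both commands above print the same block: the path form simply performs this lookup before fetching.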

Navigate with IPLD pathing. DAG-PB blocks are a special case within ipfs: when we supply a path attached to a CID, it will interpret the blocks and look for named links for us. This isn’t the case for blocks of any other codec (e.g. DAG-CBOR). We can switch out of this special-case mode and explicitly say that we want to use raw IPLD pathing by prefixing our root block CID with /ipld/. This tells ipfs not to do any special interpretation of a DAG-PB block using named links, and lets us path through the DAG-PB block’s properties ourselves. To path within a block, we supply the keys of the maps that we want to traverse into, and for arrays we supply the index of the element we want to traverse into. For the index.html element of the root block’s Links array, we can see from the original ipfs dag get output that it’s the 8th element, so we address it with a path of /Links/7 (indexes start at 0). In this case we’re simply plucking a small element out of the root block, but we can see that it’s the element in the Links array that we care about.

$ ipfs dag get /ipld/Qmb2TK3N6M2SQj3JaLJhGWPcpmtyvuHhZdSMADMGrLnpnQ/Links/7 | jq
{
  "Hash": {
    "/": "Qme24pFfR4bhBUkZuUdAGjC1onUnZyZHdPGXu2bLDg7Uxe"
  },
  "Name": "index.html",
  "Tsize": 26464
}
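The rule for pathing within a single block (map keys by name, array elements by index) can be sketched as below. This is a minimal illustration in Python, not ipfs’s implementation: resolve_path is a hypothetical helper, it does not follow links to other blocks, and the sample root contains only the four links visible in the earlier output, so index.html sits at index 3 here rather than at 7 as in the real block.

```python
# Trimmed root block: only the four links visible in the earlier output.
root = {
    "Data": {"/": {"bytes": "CAE"}},
    "Links": [
        {"Hash": {"/": "QmSkLT7wKos2sqEzb5rrRZYPihhfivuoGw9oqj9fsLYm6g"},
         "Name": "FAQ", "Tsize": 14503},
        {"Hash": {"/": "QmVWdcKiK2mWpDXQj9DtHsrBqzA4hSLxov5juvaS4wrCZD"},
         "Name": "css", "Tsize": 85864},
        {"Hash": {"/": "QmXbobtcHFExm3XF73rqBjhzrZ5USeu3rpHzm8DDoKL5cp"},
         "Name": "docs", "Tsize": 1152555},
        {"Hash": {"/": "Qme24pFfR4bhBUkZuUdAGjC1onUnZyZHdPGXu2bLDg7Uxe"},
         "Name": "index.html", "Tsize": 26464},
    ],
}


def resolve_path(node, path):
    """Walk `path` through one decoded block: keys for maps, indexes for arrays."""
    current = node
    for segment in filter(None, path.split("/")):
        if isinstance(current, list):
            current = current[int(segment)]   # arrays are addressed by position
        elif isinstance(current, dict):
            current = current[segment]        # maps are addressed by key
        else:
            raise TypeError("cannot path into %r with %r" % (current, segment))
    return current


print(resolve_path(root, "/Links/3"))       # the whole index.html link entry
print(resolve_path(root, "/Links/3/Name"))  # just its name
```

With the full root block loaded, the same function called with /Links/7 would correspond to the ipfs dag get command above.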

Navigating deeper with IPLD pathing. When a path hits a link (CID) in a block, ipfs will transparently load that link and traverse it as if it were part of the original block. This happens recursively down a DAG as deep as your path happens to go, with ipfs fetching, decoding and traversing each block in the path until it reaches the end of the path. In this case, we’re going to navigate into the Hash, which is a link, so ipfs will load the block with that CID, and then into the Data field of that block.